Big data
Luca Massaron, Bojan Tunguz, Konrad Banachewicz, Anthony...
Kaggle has become the proving ground for millions of data enthusiasts worldwide, offering what no classroom tutorial can match: battle-tested skills built through real-world challenges and the hands-on experience that employers seek. Every competition sharpens your data analysis skills, expands your network within the data scientist community, and gives compelling proof of expertise to unlock career opportunities.The first book of its kind, The Kaggle Book brings together everything you need to excel in competitions, data science projects, and beyond. This new edition includes fresh content and new chapters on Kaggle Models, time series, and Generative AI competitions, with three Kaggle Grandmasters guiding you through modeling strategies and sharing hard-earned insights accumulated over years of competition.The book extends far past competition tactics, revealing techniques for tackling image, tabular, and textual data as well as reinforcement learning tasks. You’ll also discover tips for designing better validation schemes and working confidently with both standard and unconventional evaluation metrics.Whether you want to climb the Kaggle leaderboard, accelerate your data science career, or improve the accuracy of your models, this book is for you.Join our Discord community of over 1,000 members to learn, share, and grow together!
Konrad Banachewicz, Luca Massaron
More than 80,000 Kaggle novices currently participate in Kaggle competitions. To help them navigate the often-overwhelming world of Kaggle, two Grandmasters put their heads together to write The Kaggle Book, which made plenty of waves in the community. Now, they’ve come back with an even more practical approach based on hands-on exercises that can help you start thinking like an experienced data scientist.In this book, you’ll get up close and personal with four extensive case studies based on past Kaggle competitions. You’ll learn how bright minds predicted which drivers would likely avoid filing insurance claims in Brazil and see how expert Kagglers used gradient-boosting methods to model Walmart unit sales time-series data. Get into computer vision by discovering different solutions for identifying the type of disease present on cassava leaves. And see how the Kaggle community created predictive algorithms to solve the natural language processing problem of subjective question-answering.You can use this workbook as a supplement alongside The Kaggle Book or on its own alongside resources available on the Kaggle website and other online communities. Whatever path you choose, this workbook will help make you a formidable Kaggle competitor.
David Ping
David Ping, Head of GenAI and ML Solution Architecture for global industries at AWS, provides expert insights and practical examples to help you become a proficient ML solutions architect, linking technical architecture to business-related skills.You'll learn about ML algorithms, cloud infrastructure, system design, MLOps , and how to apply ML to solve real-world business problems. David explains the generative AI project lifecycle and examines Retrieval Augmented Generation (RAG), an effective architecture pattern for generative AI applications. You’ll also learn about open-source technologies, such as Kubernetes/Kubeflow, for building a data science environment and ML pipelines before building an enterprise ML architecture using AWS. As well as ML risk management and the different stages of AI/ML adoption, the biggest new addition to the handbook is the deep exploration of generative AI.By the end of this book , you’ll have gained a comprehensive understanding of AI/ML across all key aspects, including business use cases, data science, real-world solution architecture, risk management, and governance. You’ll possess the skills to design and construct ML solutions that effectively cater to common use cases and follow established ML architecture patterns, enabling you to excel as a true professional in the field.
Hyatt Saleh
Machine learning algorithms are an integral part of almost all modern applications. To make the learning process faster and more accurate, you need a tool flexible and powerful enough to help you build machine learning algorithms quickly and easily. With The Machine Learning Workshop, you'll master the scikit-learn library and become proficient in developing clever machine learning algorithms.The Machine Learning Workshop begins by demonstrating how unsupervised and supervised learning algorithms work by analyzing a real-world dataset of wholesale customers. Once you've got to grips with the basics, you'll develop an artificial neural network using scikit-learn and then improve its performance by fine-tuning hyperparameters. Towards the end of the workshop, you'll study the dataset of a bank's marketing activities and build machine learning models that can list clients who are likely to subscribe to a term deposit. You'll also learn how to compare these models and select the optimal one.By the end of The Machine Learning Workshop, you'll not only have learned the difference between supervised and unsupervised models and their applications in the real world, but you'll also have developed the skills required to get started with programming your very own machine learning algorithms.
Rohan Chopra, Aniruddha M. Godbole, Nipun Sadvilkar,...
Do you want to learn how to communicate with computer systems using Natural Language Processing (NLP) techniques, or make a machine understand human sentiments? Do you want to build applications like Siri, Alexa, or chatbots, even if you’ve never done it before?With The Natural Language Processing Workshop, you can expect to make consistent progress as a beginner, and get up to speed in an interactive way, with the help of hands-on activities and fun exercises.The book starts with an introduction to NLP. You’ll study different approaches to NLP tasks, and perform exercises in Python to understand the process of preparing datasets for NLP models. Next, you’ll use advanced NLP algorithms and visualization techniques to collect datasets from open websites, and to summarize and generate random text from a document. In the final chapters, you’ll use NLP to create a chatbot that detects positive or negative sentiment in text documents such as movie reviews.By the end of this book, you’ll be equipped with the essential NLP tools and techniques you need to solve common business problems that involve processing text.
Blaine Bateman, Saikat Basak, Thomas V. Joseph,...
The Pandas Workshop will teach you how to be more productive with data and generate real business insights to inform your decision-making. You will be guided through real-world data science problems and shown how to apply key techniques in the context of realistic examples and exercises. Engaging activities will then challenge you to apply your new skills in a way that prepares you for real data science projects.You’ll see how experienced data scientists tackle a wide range of problems using data analysis with pandas. Unlike other Python books, which focus on theory and spend too long on dry, technical explanations, this workshop is designed to quickly get you to write clean code and build your understanding through hands-on practice. As you work through this Python pandas book, you’ll tackle various real-world scenarios, such as using an air quality dataset to understand the pattern of nitrogen dioxide emissions in a city, as well as analyzing transportation data to improve bus transportation services.By the end of this data analytics book, you’ll have the knowledge, skills, and confidence you need to solve your own challenging data science problems with pandas.
Liu Peng
The Statistics and Machine Learning with R Workshop is a comprehensive resource packed with insights into statistics and machine learning, along with a deep dive into R libraries. The learning experience is further enhanced by practical examples and hands-on exercises that provide explanations of key concepts.Starting with the fundamentals, you’ll explore the complete model development process, covering everything from data pre-processing to model development. In addition to machine learning, you’ll also delve into R's statistical capabilities, learning to manipulate various data types and tackle complex mathematical challenges from algebra and calculus to probability and Bayesian statistics. You’ll discover linear regression techniques and more advanced statistical methodologies to hone your skills and advance your career.By the end of this book, you'll have a robust foundational understanding of statistics and machine learning. You’ll also be proficient in using R's extensive libraries for tasks such as data processing and model training and be well-equipped to leverage the full potential of R in your future projects.
Blaine Bateman, Ashish Ranjan Jha, Benjamin Johnston,...
Would you like to understand how and why machine learning techniques and data analytics are spearheading enterprises globally? From analyzing bioinformatics to predicting climate change, machine learning plays an increasingly pivotal role in our society.Although the real-world applications may seem complex, this book simplifies supervised learning for beginners with a step-by-step interactive approach. Working with real-time datasets, you’ll learn how supervised learning, when used with Python, can produce efficient predictive models.Starting with the fundamentals of supervised learning, you’ll quickly move to understand how to automate manual tasks and the process of assessing date using Jupyter and Python libraries like pandas. Next, you’ll use data exploration and visualization techniques to develop powerful supervised learning models, before understanding how to distinguish variables and represent their relationships using scatter plots, heatmaps, and box plots. After using regression and classification models on real-time datasets to predict future outcomes, you’ll grasp advanced ensemble techniques such as boosting and random forests. Finally, you’ll learn the importance of model evaluation in supervised learning and study metrics to evaluate regression and classification tasks.By the end of this book, you’ll have the skills you need to work on your real-life supervised learning Python projects.