Publisher: Packt Publishing
Stephen Klosterman
If data is the new oil, then machine learning is the drill. As companies gain access to ever-increasing quantities of raw data, the ability to deliver state-of-the-art predictive models that support business decision-making becomes more and more valuable. In this book, you’ll work on an end-to-end project based around a realistic data set and split up into bite-sized practical exercises. This creates a case-study approach that simulates the working conditions you’ll experience in real-world data science projects. You’ll learn how to use key Python packages, including pandas, Matplotlib, and scikit-learn, and master the process of data exploration and data processing, before moving on to fitting, evaluating, and tuning algorithms such as regularized logistic regression and random forest. Now in its second edition, this book will take you through the end-to-end process of exploring data and delivering machine learning models. Updated for 2021, this edition includes brand new content on XGBoost, SHAP values, algorithmic fairness, and the ethical concerns of deploying a model in the real world. By the end of this data science book, you’ll have the skills, understanding, and confidence to build your own machine learning models and gain insights from real data.
Stephen Klosterman
Data Science Projects with Python is designed to give you practical guidance on industry-standard data analysis and machine learning tools, by applying them to realistic data problems. You will learn how to use pandas and Matplotlib to critically examine datasets with summary statistics and graphs, and extract the insights you seek to derive. You will build your knowledge as you prepare data using the scikit-learn package and feed it to machine learning algorithms such as regularized logistic regression and random forest. You’ll discover how to tune algorithms to provide the most accurate predictions on new and unseen data. As you progress, you’ll gain insights into the working and output of these algorithms, building your understanding of both the predictive capabilities of the models and why they make these predictions. By the end of this book, you will have the necessary skills to confidently use machine learning algorithms to perform detailed data analysis and extract meaningful insights from unstructured data.
Matt Eland
As the fields of data science, machine learning, and artificial intelligence rapidly evolve, .NET developers are eager to leverage their expertise to dive into these exciting domains but are often unsure of how to do so. Data Science in .NET with Polyglot Notebooks is the practical guide you need to seamlessly bring your .NET skills into the world of analytics and AI. With Microsoft’s .NET platform now robustly supporting machine learning and AI tasks, the introduction of tools such as .NET Interactive kernels and Polyglot Notebooks has opened up a world of possibilities for .NET developers. This book empowers you to harness the full potential of these cutting-edge technologies, guiding you through hands-on experiments that illustrate key concepts and principles. Through a series of interactive notebooks, you’ll not only master technical processes but also discover how to integrate these new skills into your current role or pivot to exciting opportunities in the data science field. By the end of the book, you’ll have acquired the necessary knowledge and confidence to apply cutting-edge data science techniques and deliver impactful solutions within the .NET ecosystem.
Data Science with SQL Server Quick Start Guide. Integrate SQL Server with data science
Dejan Sarka
SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and interested in using SQL Server and Machine Learning (ML) Services for your projects, then this is the ideal book for you. This book is the ideal introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from business and data understanding, through data overview, data preparation, modeling and using algorithms, model evaluation, and deployment. You will learn to use the engines and languages that come with SQL Server, including ML Services with R and Python languages and Transact-SQL. You will also learn how to choose which algorithm to use for which task, and learn how each algorithm works.
Data Stewardship in Action. A roadmap to data value realization and measurable business outcomes
Pui Shing Lee, Dr. Toa Charm
In the competitive data-centric world, mastering data stewardship is not just a requirement but the key to organizational success. Unlock strategic excellence with Data Stewardship in Action, your guide to exploring the intricacies of data stewardship and its implementation for maximum efficiency. From business strategy to data strategy, and then to data stewardship, this book shows you how to strategically deploy your workforce, processes, and technology for efficient data processing. You’ll gain mastery over the fundamentals of data stewardship, from understanding the different roles and responsibilities to implementing best practices for data governance. You’ll elevate your data management skills by exploring the technologies and tools for effective data handling. As you progress through the chapters, you’ll realize that this book not only helps you develop the foundational skills to become a successful data steward but also introduces innovative approaches, including leveraging AI and GPT, for enhanced data stewardship. By the end of this book, you’ll be able to build a robust data governance framework by developing policies and procedures, establishing a dedicated data governance team, and creating a data governance roadmap that ensures your organization thrives in the dynamic landscape of data management.
Sireesha Pulipati
Presenting data visually makes it easier for organizations and individuals to interpret and analyze information. Looker Studio is an easy-to-use, collaborative tool that enables you to transform your data into engaging visualizations. This allows you to build and share dashboards that help monitor key performance indicators, identify patterns, and generate insights to ultimately drive decisions and actions. Data Storytelling with Looker Studio begins by laying out the foundational design principles and guidelines that are essential to creating accurate, effective, and compelling data visualizations. Next, you’ll delve into features and capabilities of Looker Studio, from basic to advanced, and explore their application with examples. The subsequent chapters walk you through building dashboards with a structured three-stage process called the 3D approach using real-world examples that’ll help you understand the various design and implementation considerations. This approach involves determining the objectives and needs of the dashboard, designing its key components and layout, and developing each element of the dashboard. By the end of this book, you will have a solid understanding of the storytelling approach and be able to create data stories of your own using Looker Studio.
Data Structures and Algorithms with the C++ STL. A guide for modern C++ practitioners
John Farrier
While the Standard Template Library (STL) offers a rich set of tools for data structures and algorithms, navigating its intricacies can be daunting for intermediate C++ developers without expert guidance. This book offers a thorough exploration of the STL’s components, covering fundamental data structures, advanced algorithms, and concurrency features. Starting with an in-depth analysis of the std::vector, this book highlights its pivotal role in the STL, progressing toward building your proficiency in utilizing vectors, managing memory, and leveraging iterators. The book then advances to STL’s data structures, including sequence containers, associative containers, and unordered containers, simplifying the concepts of container adaptors and views to enhance your knowledge of modern STL programming. Shifting the focus to STL algorithms, you’ll get to grips with sorting, searching, and transformations and develop the skills to implement and modify algorithms with best practices. Advanced sections cover extending the STL with custom types and algorithms, as well as concurrency features, exception safety, and parallel algorithms. By the end of this book, you’ll have transformed into a proficient STL practitioner ready to tackle real-world challenges and build efficient and scalable C++ applications.
Data Visualization: a successful design process
Andy Kirk
Do you want to create more attractive charts? Or do you have huge data sets and need to unearth the key insights in a visual manner? Data visualization is the representation and presentation of data, using proven design techniques to bring alive the patterns, stories and key insights locked away. Data Visualization: a Successful Design Process explores the unique fusion of art and science that is data visualization; a discipline for which instinct alone is insufficient for you to succeed in enabling audiences to discover key trends, insights and discoveries from your data. This book will equip you with the key techniques required to overcome contemporary data visualization challenges. You'll discover a proven design methodology that helps you develop invaluable knowledge and practical capabilities. You'll never again settle for a default Excel chart or resort to fancy-looking graphs. You will be able to work from the starting point of acquiring, preparing and familiarizing yourself with your data, right through to concept design. Choose your killer visual representation to engage and inform your audience. Data Visualization: a Successful Design Process will inspire you to relish any visualization project with greater confidence and bullish know-how, turning challenges into exciting design opportunities.
Aendrew Rininsland, Andy Kirk, Swizec Teller,...
Do you want to create more attractive charts? Or do you have huge data sets and need to unearth the key insights in a visual manner? Data visualization is the representation and presentation of data, using proven design techniques to bring alive the patterns, stories, and key insights that are locked away. This learning path is divided into three modules. The first module will equip you with the key techniques required to overcome contemporary data visualization challenges. The second module, Social Data Visualization with HTML5 and JavaScript, teaches you how to leverage HTML5 techniques through JavaScript to build visualizations. The third module, Learning d3.js Data Visualization, leads you to D3, which has emerged as one of the leading platforms to develop beautiful, interactive visualizations over the web. By the end of this course, you will have unlocked the mystery behind successful data visualizations. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:
• Data Visualization: a successful design process by Andy Kirk
• Social Data Visualization with HTML5 and JavaScript by Simon Timms
• Learning d3.js Data Visualization, Second Edition by Ændrew Rininsland and Swizec Teller
Nick Zhu
Master D3.js and create amazing visualizations with the Data Visualization with D3 4.x Cookbook. Written by professional data engineer Nick Zhu, this D3.js cookbook features over 65 recipes.
• Solve real-world visualization problems using D3.js practical recipes
• Understand D3 fundamentals
• Includes illustrations, ready-to-go code samples and pre-built chart recipes
Nick Zhu
D3.js is a JavaScript library designed to display digital data in dynamic graphical form. It helps you bring data to life using HTML, SVG, and CSS. D3 allows great control over the final visual result, and it is the hottest and most powerful web-based data visualization technology on the market today. Data Visualization with D3.js Cookbook is packed with practical recipes to help you learn every aspect of data visualization with D3. It is designed to provide you with all the guidance you need to get to grips with data visualization with D3. With this book, you will create breathtaking data visualizations with professional efficiency and precision with the help of practical recipes, illustrations, and code samples. Data Visualization with D3.js Cookbook starts off by touching upon data visualization and D3 basics before gradually taking you through a number of practical recipes covering a wide range of topics you need to know about D3. You will learn the fundamental concepts of data visualization, functional JavaScript, and D3 fundamentals including element selection, data binding, animation, and SVG generation. You will also learn how to leverage more advanced techniques such as custom interpolators, custom tweening, timers, the layout manager, force manipulation, and so on. This book also provides a number of pre-built chart recipes with ready-to-go sample code to help you bootstrap quickly.
Data Wrangling on AWS. Clean and organize complex data for analysis
Navnit Shukla, Sankar M, Sam Palani
Data wrangling is the process of cleaning, transforming, and organizing raw, messy, or unstructured data into a structured format. It involves processes such as data cleaning, data integration, data transformation, and data enrichment to ensure that the data is accurate, consistent, and suitable for analysis. Data Wrangling on AWS equips you with the knowledge to reap the full potential of AWS data wrangling tools. First, you’ll be introduced to data wrangling on AWS and will be familiarized with data wrangling services available in AWS. You’ll understand how to work with AWS Glue DataBrew, AWS Data Wrangler, and AWS SageMaker. Next, you’ll discover other AWS services like Amazon S3, Redshift, Athena, and QuickSight. Additionally, you’ll explore advanced topics such as performing pandas data operations with AWS Data Wrangler, optimizing ML data with AWS SageMaker, and building the data warehouse with Glue DataBrew, along with security and monitoring aspects. By the end of this book, you’ll be well-equipped to perform data wrangling using AWS services.
Data Wrangling with Python. Creating actionable data from raw sources
Dr. Tirthajyoti Sarkar, Shubhadeep Roychowdhury
For data to be useful and meaningful, it must be curated and refined. Data Wrangling with Python teaches you the core ideas behind these processes and equips you with knowledge of the most popular tools and techniques in the domain. The book starts with the absolute basics of Python, focusing mainly on data structures. It then delves into the fundamental tools of data wrangling like NumPy and Pandas libraries. You'll explore useful insights into why you should stay away from traditional ways of data cleaning, as done in other languages, and take advantage of the specialized pre-built routines in Python. This combination of Python tips and tricks will also demonstrate how to use the same Python backend and extract/transform data from an array of sources including the Internet, large database vaults, and Excel financial tables. To help you prepare for more challenging scenarios, you'll cover how to handle missing or wrong data, and reformat it based on the requirements from the downstream analytics tool. The book will further help you grasp concepts through real-world examples and datasets. By the end of this book, you will be confident in using a diverse array of sources to extract, clean, transform, and format your data efficiently.
Gustavo Santos
In this information era, where large volumes of data are being generated every day, companies want to get a better grip on it to perform more efficiently than before. This is where skillful data analysts and data scientists come into play, wrangling and exploring data to generate valuable business insights. In order to do that, you’ll need plenty of tools that enable you to extract the most useful knowledge from data. Data Wrangling with R will help you to gain a deep understanding of ways to wrangle and prepare datasets for exploration, analysis, and modeling. This data book enables you to get your data ready for more optimized analyses, develop your first data model, and perform effective data visualization. The book begins by teaching you how to load and explore datasets. Then, you’ll get to grips with the modern concepts and tools of data wrangling. As data wrangling and visualization are intrinsically connected, you’ll go over best practices to plot data and extract insights from it. The chapters are designed in a way to help you learn all about modeling, as you will go through the construction of a data science project from end to end, and become familiar with the built-in RStudio, including an application built with Shiny dashboards. By the end of this book, you’ll have learned how to create your first data model and build an application with Shiny in R.
Data Wrangling with SQL. A hands-on guide to manipulating, wrangling, and engineering data using SQL
Raghav Kandarpa, Shivangi Saxena
The amount of data generated continues to grow rapidly, making it increasingly important for businesses to be able to wrangle this data and understand it quickly and efficiently. Although data wrangling can be challenging, with the right tools and techniques you can efficiently handle enormous amounts of unstructured data. The book starts by introducing you to the basics of SQL, focusing on the core principles and techniques of data wrangling. You’ll then explore advanced SQL concepts like aggregate functions, window functions, CTEs, and subqueries that are very popular in the business world. The next set of chapters will walk you through different functions within SQL queries that cause delays in data transformation and help you figure out the difference between a good query and a bad one. You’ll also learn how data wrangling and data science go hand in hand. The book is filled with datasets and practical examples to help you understand the concepts thoroughly, along with best practices to guide you at every stage of data wrangling. By the end of this book, you’ll be equipped with essential techniques and best practices for data wrangling, and will have learned how to use clean and standardized data models to make informed decisions, helping businesses avoid costly mistakes.