Publisher: K-i-s-publishing
Kirk Munroe
Tableau is unlike most other BI platforms that have a single data modeling tool and enterprise data model (for example, LookML from Google’s Looker). That doesn’t mean Tableau doesn’t have enterprise data governance; it is both robust and highly flexible. This book will help you effectively use Tableau governance models to build a data-driven organization. Data Modeling with Tableau is an extensive guide, complete with step-by-step explanations of essential concepts, practical examples, and hands-on exercises. As you progress through the chapters, you’ll learn the role that Tableau Prep Builder and Tableau Desktop each play in data modeling. You’ll also explore the components of Tableau Server and Tableau Cloud that make data modeling more robust, secure, and performant. Moreover, by extending data models for Ask and Explain Data, you’ll gain the knowledge required to extend analytics to more people in your organization, leading to better data-driven decisions. Finally, this book will guide you through the entire Tableau stack and the techniques required to build the right level of governance into Tableau data models for the correct use cases. By the end of this Tableau book, you’ll have a firm understanding of how to leverage data modeling in Tableau to benefit your organization.
Michele Pinto, Sammy El Khammal
In the age of information, strategic management of data is critical to organizational success. The constant challenge lies in maintaining data accuracy and preventing data pipelines from breaking. Data Observability for Data Engineering is your definitive guide to implementing data observability successfully in your organization. This book unveils the power of data observability, a fusion of techniques and methods that allow you to monitor and validate the health of your data. You’ll see how it builds on data quality monitoring and understand its significance from the data engineering perspective. Once you're familiar with the techniques and elements of data observability, you'll get hands-on with a practical Python project to reinforce what you've learned. Toward the end of the book, you’ll apply your expertise to explore diverse use cases and experiment with projects to seamlessly implement data observability in your organization. Equipped with the mastery of data observability intricacies, you’ll be able to make your organization future-ready and resilient and never worry about the quality of your data pipelines again.
Dr. Argenis Leon, Luis Aguirre
Optimus is a Python library that works as a unified API for cleaning, processing, and merging data. It can be used for handling small and big data on your local laptop or on remote clusters using CPUs or GPUs. The book begins by covering the internals of Optimus and how it works in tandem with the existing technologies to serve your data processing needs. You'll then learn how to use Optimus for loading and saving data from text data formats such as CSV and JSON files, exploring binary files such as Excel, and for columnar data processing with Parquet, Avro, and ORC. Next, you'll get to grips with the profiler and its data types - a unique feature of Optimus Dataframe that assists with data quality. You'll see how to use the plots available in Optimus such as histogram, frequency charts, and scatter and box plots, and understand how Optimus lets you connect to libraries such as Plotly and Altair. You'll also delve into advanced applications such as feature engineering, machine learning, cross-validation, and natural language processing functions and explore the advancements in Optimus. Finally, you'll learn how to create data cleaning and transformation functions and add a hypothetical new data processing engine with Optimus. By the end of this book, you'll be able to improve your data science workflow with Optimus easily.
Data Quality in the Age of AI. Building a foundation for AI strategy and data culture
Andrew Jones
As organizations worldwide seek to revamp their data strategies to leverage AI advancements and benefit from newfound capabilities, data quality emerges as the cornerstone for success. Without high-quality data, even the most advanced AI models falter. Enter Data Quality in the Age of AI, a detailed report that illuminates the crucial role of data quality in shaping effective data strategies. Packed with actionable insights, this report highlights the critical role of data quality in your overall data strategy. It equips teams and organizations with the knowledge and tools to thrive in the evolving AI landscape, serving as a roadmap for harnessing the power of data quality, enabling them to unlock their data's full potential, leading to improved performance, reduced costs, increased revenue, and informed strategic decisions.
Rohan Chopra, Aaron England, Mohamed Noordeen...
Data Science with Python begins by introducing you to data science and teaches you to install the packages you need to create a data science coding environment. You will learn three major techniques in machine learning: unsupervised learning, supervised learning, and reinforcement learning. You will also explore basic classification and regression techniques, such as support vector machines, decision trees, and logistic regression. As you make your way through the book, you will understand the basic functions, data structures, and syntax of the Python language that are used to handle large datasets with ease. You will learn about NumPy and pandas libraries for matrix calculations and data manipulation, discover how to use Matplotlib to create highly customizable visualizations, and apply the boosting algorithm XGBoost to make predictions. In the concluding chapters, you will explore convolutional neural networks (CNNs), deep learning algorithms used to predict what is in an image. You will also understand how to feed human sentences to a neural network, make the model process contextual information, and create human language processing systems to predict the outcome. By the end of this book, you will be able to understand and implement any new data science algorithm and have the confidence to experiment with tools or libraries other than those covered in the book.
David Natingga
Machine learning applications are highly automated and self-modifying, and continue to improve over time with minimal human intervention, as they learn from the trained data. To address the complex nature of various real-world data problems, specialized machine learning algorithms have been developed. Through algorithmic and statistical analysis, these models can be leveraged to gain new knowledge from existing data as well. Data Science Algorithms in a Week addresses all problems related to accurate and efficient data classification and prediction. Over the course of seven days, you will be introduced to seven algorithms, along with exercises that will help you understand different aspects of machine learning. You will see how to pre-cluster your data to optimize and classify it for large datasets. This book also guides you in predicting data based on existing trends in your dataset. This book covers algorithms such as k-nearest neighbors, Naive Bayes, decision trees, random forest, k-means, regression, and time-series analysis. By the end of this book, you will understand how to choose machine learning algorithms for clustering, classification, and regression and know which is best suited for your problem.
Data Science for Decision Makers. Enhance your leadership skills with data science and AI expertise
Jon Howells
As data science and artificial intelligence (AI) become prevalent across industries, executives without formal education in statistics and machine learning, as well as data scientists moving into leadership roles, must learn how to make informed decisions about complex models and manage data teams. This book will elevate your leadership skills by guiding you through the core concepts of data science and AI. This comprehensive guide is designed to bridge the gap between business needs and technical solutions, empowering you to make informed decisions and drive measurable value within your organization. Through practical examples and clear explanations, you'll learn how to collect and analyze structured and unstructured data, build a strong foundation in statistics and machine learning, and evaluate models confidently. By recognizing common pitfalls and valuable use cases, you'll plan data science projects effectively, from the ground up to completion. Beyond technical aspects, this book provides tools to recruit top talent, manage high-performing teams, and stay up to date with industry advancements. By the end of this book, you’ll be able to characterize the data within your organization and frame business problems as data science problems.
Shane Molinari, Jim Packer
In today's world full of online threats, the complexity of harmful software presents a significant challenge for detection and analysis. This insightful guide will teach you how to apply the principles of data science to online security, acting as both an educational resource and a practical manual for everyday use. Data Science for Malware Analysis starts by explaining the nuances of malware, from its lifecycle to its technological aspects, before introducing you to the capabilities of data science in malware detection by leveraging machine learning, statistical analytics, and social network analysis. As you progress through the chapters, you’ll explore the analytical methods of reverse engineering, machine language, dynamic scrutiny, and behavioral assessments of malicious software. You’ll also develop an understanding of the evolving cybersecurity compliance landscape with regulations such as GDPR and CCPA, and gain insights into the global efforts in curbing cyber threats. By the end of this book, you’ll have a firm grasp on the modern malware lifecycle and how you can employ data science within cybersecurity to ward off new and evolving threats.
Mirza Rahim Baig, Gururajan Govindan,...
Unleash the power of data to reach your marketing goals with this practical guide to data science for business. This book will help you get started on your journey to becoming a master of marketing analytics with Python. You'll work with relevant datasets and build your practical skills by tackling engaging exercises and activities that simulate real-world market analysis projects. You'll learn to think like a data scientist, build your problem-solving skills, and discover how to look at data in new ways to deliver business insights and make intelligent data-driven decisions. As well as learning how to clean, explore, and visualize data, you'll implement machine learning algorithms and build models to make predictions. As you work through the book, you'll use Python tools to analyze sales, visualize advertising data, predict revenue, address customer churn, and implement customer segmentation to understand behavior. By the end of this book, you'll have the knowledge, skills, and confidence to implement data science and machine learning techniques to better understand your marketing data and improve your decision-making.
Tommy Blanchard, Debasish Behera, Pranshu Bhatnagar
Data Science for Marketing Analytics covers every stage of data analytics, from working with a raw dataset to segmenting a population and modeling different parts of the population based on the segments. The book starts by teaching you how to use Python libraries, such as pandas and Matplotlib, to read data from Python, manipulate it, and create plots, using both categorical and continuous variables. Then, you'll learn how to segment a population into groups and use different clustering techniques to evaluate customer segmentation. As you make your way through the chapters, you'll explore ways to evaluate and select the best segmentation approach, and go on to create a linear regression model on customer value data to predict lifetime value. In the concluding chapters, you'll gain an understanding of regression techniques and tools for evaluating regression models, and explore ways to predict customer choice using classification algorithms. Finally, you'll apply these techniques to create a churn model for modeling customer product choices. By the end of this book, you will be able to build your own marketing reporting and interactive dashboard solutions.
Gabriela Castillo Areco
Data is the new oil and Web3 is generating it at an unprecedented rate. Complete with practical examples, detailed explanations, and ideas for portfolio development, this comprehensive book serves as a step-by-step guide covering the industry best practices, tools, and resources needed to easily navigate the world of data in Web3. You’ll begin by acquiring a solid understanding of key blockchain concepts and the fundamental data science tools essential for Web3 projects. The subsequent chapters will help you explore the main data sources that can help address industry challenges, decode smart contracts, and build DeFi- and NFT-specific datasets. You’ll then tackle the complexities of feature engineering specific to blockchain data and familiarize yourself with diverse machine learning use cases that leverage Web3 data. The book includes interviews with industry leaders providing insights into their professional journeys to drive innovation in the Web3 environment. Equipped with experience in handling crypto data, you’ll be able to demonstrate your skills in job interviews, academic pursuits, or when engaging potential clients. By the end of this book, you’ll have the essential tools to undertake end-to-end data science projects utilizing blockchain data, empowering you to help shape the next-generation internet.
Stephen Klosterman
If data is the new oil, then machine learning is the drill. As companies gain access to ever-increasing quantities of raw data, the ability to deliver state-of-the-art predictive models that support business decision-making becomes more and more valuable. In this book, you’ll work on an end-to-end project based around a realistic data set and split up into bite-sized practical exercises. This creates a case-study approach that simulates the working conditions you’ll experience in real-world data science projects. You’ll learn how to use key Python packages, including pandas, Matplotlib, and scikit-learn, and master the process of data exploration and data processing, before moving on to fitting, evaluating, and tuning algorithms such as regularized logistic regression and random forest. Now in its second edition, this book will take you through the end-to-end process of exploring data and delivering machine learning models. Updated for 2021, this edition includes brand new content on XGBoost, SHAP values, algorithmic fairness, and the ethical concerns of deploying a model in the real world. By the end of this data science book, you’ll have the skills, understanding, and confidence to build your own machine learning models and gain insights from real data.
Stephen Klosterman
Data Science Projects with Python is designed to give you practical guidance on industry-standard data analysis and machine learning tools, by applying them to realistic data problems. You will learn how to use pandas and Matplotlib to critically examine datasets with summary statistics and graphs, and extract the insights you seek to derive. You will build your knowledge as you prepare data using the scikit-learn package and feed it to machine learning algorithms such as regularized logistic regression and random forest. You’ll discover how to tune algorithms to provide the most accurate predictions on new and unseen data. As you progress, you’ll gain insights into the working and output of these algorithms, building your understanding of both the predictive capabilities of the models and why they make these predictions. By the end of this book, you will have the necessary skills to confidently use machine learning algorithms to perform detailed data analysis and extract meaningful insights from unstructured data.
Matt Eland
As the fields of data science, machine learning, and artificial intelligence rapidly evolve, .NET developers are eager to leverage their expertise to dive into these exciting domains but are often unsure of how to do so. Data Science in .NET with Polyglot Notebooks is the practical guide you need to seamlessly bring your .NET skills into the world of analytics and AI. With Microsoft’s .NET platform now robustly supporting machine learning and AI tasks, the introduction of tools such as .NET Interactive kernels and Polyglot Notebooks has opened up a world of possibilities for .NET developers. This book empowers you to harness the full potential of these cutting-edge technologies, guiding you through hands-on experiments that illustrate key concepts and principles. Through a series of interactive notebooks, you’ll not only master technical processes but also discover how to integrate these new skills into your current role or pivot to exciting opportunities in the data science field. By the end of the book, you’ll have acquired the necessary knowledge and confidence to apply cutting-edge data science techniques and deliver impactful solutions within the .NET ecosystem.
Data Science with SQL Server Quick Start Guide. Integrate SQL Server with data science
Dejan Sarka
SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and interested in using SQL Server and Machine Learning (ML) Services for your projects, then this is the ideal book for you. This book is the ideal introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from business and data understanding, through data overview, data preparation, modeling and using algorithms, model evaluation, and deployment. You will learn to use the engines and languages that come with SQL Server, including ML Services with R and Python languages and Transact-SQL. You will also learn how to choose which algorithm to use for which task, and learn the working of each algorithm.
Data Stewardship in Action. A roadmap to data value realization and measurable business outcomes
Pui Shing Lee, Dr. Toa Charm
In the competitive data-centric world, mastering data stewardship is not just a requirement; it's the key to organizational success. Unlock strategic excellence with Data Stewardship in Action, your guide to exploring the intricacies of data stewardship and its implementation for maximum efficiency. From business strategy to data strategy, and then to data stewardship, this book shows you how to strategically deploy your workforce, processes, and technology for efficient data processing. You’ll gain mastery over the fundamentals of data stewardship, from understanding the different roles and responsibilities to implementing best practices for data governance. You’ll elevate your data management skills by exploring the technologies and tools for effective data handling. As you progress through the chapters, you’ll realize that this book not only helps you develop the foundational skills to become a successful data steward but also introduces innovative approaches, including leveraging AI and GPT, for enhanced data stewardship. By the end of this book, you’ll be able to build a robust data governance framework by developing policies and procedures, establishing a dedicated data governance team, and creating a data governance roadmap that ensures your organization thrives in the dynamic landscape of data management.
Sireesha Pulipati
Presenting data visually makes it easier for organizations and individuals to interpret and analyze information. Looker Studio is an easy-to-use, collaborative tool that enables you to transform your data into engaging visualizations. This allows you to build and share dashboards that help monitor key performance indicators, identify patterns, and generate insights to ultimately drive decisions and actions. Data Storytelling with Looker Studio begins by laying out the foundational design principles and guidelines that are essential to creating accurate, effective, and compelling data visualizations. Next, you’ll delve into features and capabilities of Looker Studio – from basic to advanced – and explore their application with examples. The subsequent chapters walk you through building dashboards with a structured three-stage process called the 3D approach using real-world examples that’ll help you understand the various design and implementation considerations. This approach involves determining the objectives and needs of the dashboard, designing its key components and layout, and developing each element of the dashboard. By the end of this book, you will have a solid understanding of the storytelling approach and be able to create data stories of your own using Looker Studio.
Data Structures and Algorithms with the C++ STL. A guide for modern C++ practitioners
John Farrier
While the Standard Template Library (STL) offers a rich set of tools for data structures and algorithms, navigating its intricacies can be daunting for intermediate C++ developers without expert guidance. This book offers a thorough exploration of the STL’s components, covering fundamental data structures, advanced algorithms, and concurrency features. Starting with an in-depth analysis of the std::vector, this book highlights its pivotal role in the STL, progressing toward building your proficiency in utilizing vectors, managing memory, and leveraging iterators. The book then advances to STL’s data structures, including sequence containers, associative containers, and unordered containers, simplifying the concepts of container adaptors and views to enhance your knowledge of modern STL programming. Shifting the focus to STL algorithms, you’ll get to grips with sorting, searching, and transformations and develop the skills to implement and modify algorithms with best practices. Advanced sections cover extending the STL with custom types and algorithms, as well as concurrency features, exception safety, and parallel algorithms. By the end of this book, you’ll have transformed into a proficient STL practitioner ready to tackle real-world challenges and build efficient and scalable C++ applications.
Data Visualization: a successful design process
Andy Kirk
Do you want to create more attractive charts? Or do you have huge data sets and need to unearth the key insights in a visual manner? Data visualization is the representation and presentation of data, using proven design techniques to bring alive the patterns, stories and key insights locked away. Data Visualization: a Successful Design Process explores the unique fusion of art and science that is data visualization; a discipline for which instinct alone is insufficient for you to succeed in enabling audiences to discover key trends, insights and discoveries from your data. This book will equip you with the key techniques required to overcome contemporary data visualization challenges. You'll discover a proven design methodology that helps you develop invaluable knowledge and practical capabilities. You'll never again settle for a default Excel chart or resort to fancy-looking graphs. You will be able to work from the starting point of acquiring, preparing and familiarizing yourself with your data, right through to concept design. Choose your killer visual representation to engage and inform your audience. Data Visualization: a Successful Design Process will inspire you to relish any visualization project with greater confidence and bullish know-how; turning challenges into exciting design opportunities.
Aendrew Rininsland, Andy Kirk, Swizec Teller,...
Do you want to create more attractive charts? Or do you have huge data sets and need to unearth the key insights in a visual manner? Data visualization is the representation and presentation of data, using proven design techniques to bring alive the patterns, stories, and key insights that are locked away. This learning path is divided into three modules. The first module will equip you with the key techniques required to overcome contemporary data visualization challenges. The second module, Social Data Visualization with HTML5 and JavaScript, teaches you how to leverage HTML5 techniques through JavaScript to build visualizations. The third module, Learning d3.js Data Visualization, will lead you to D3, which has emerged as one of the leading platforms to develop beautiful, interactive visualizations over the web. By the end of this course, you will have unlocked the mystery behind successful data visualizations. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products: Data Visualization: a successful design process by Andy Kirk; Social Data Visualization with HTML5 and JavaScript by Simon Timms; and Learning d3.js Data Visualization, Second Edition by Ændrew Rininsland and Swizec Teller.
Nick Zhu
Master D3.js and create amazing visualizations with the Data Visualization with D3 4.x Cookbook. Written by professional data engineer Nick Zhu, this D3.js cookbook features over 65 recipes. Solve real-world visualization problems using practical D3.js recipes. Understand D3 fundamentals. Includes illustrations, ready-to-go code samples, and pre-built chart recipes.
Nick Zhu
D3.js is a JavaScript library designed to display digital data in dynamic graphical form. It helps you bring data to life using HTML, SVG, and CSS. D3 allows great control over the final visual result, and it is the hottest and most powerful web-based data visualization technology on the market today. Data Visualization with D3.js Cookbook is packed with practical recipes to help you learn every aspect of data visualization with D3. Data Visualization with D3.js Cookbook is designed to provide you with all the guidance you need to get to grips with data visualization with D3. With this book, you will create breathtaking data visualization with professional efficiency and precision with the help of practical recipes, illustrations, and code samples. Data Visualization with D3.js Cookbook starts off by touching upon data visualization and D3 basics before gradually taking you through a number of practical recipes covering a wide range of topics you need to know about D3. You will learn the fundamental concepts of data visualization, functional JavaScript, and D3 fundamentals including element selection, data binding, animation, and SVG generation. You will also learn how to leverage more advanced techniques such as custom interpolators, custom tweening, timers, the layout manager, force manipulation, and so on. This book also provides a number of pre-built chart recipes with ready-to-go sample code to help you bootstrap quickly.
Data Wrangling on AWS. Clean and organize complex data for analysis
Navnit Shukla, Sankar M, Sam Palani
Data wrangling is the process of cleaning, transforming, and organizing raw, messy, or unstructured data into a structured format. It involves processes such as data cleaning, data integration, data transformation, and data enrichment to ensure that the data is accurate, consistent, and suitable for analysis. Data Wrangling on AWS equips you with the knowledge to reap the full potential of AWS data wrangling tools. First, you’ll be introduced to data wrangling on AWS and will be familiarized with data wrangling services available in AWS. You’ll understand how to work with AWS Glue DataBrew, AWS Data Wrangler, and AWS SageMaker. Next, you’ll discover other AWS services like Amazon S3, Redshift, Athena, and QuickSight. Additionally, you’ll explore advanced topics such as performing pandas data operations with AWS Data Wrangler, optimizing ML data with AWS SageMaker, building the data warehouse with Glue DataBrew, along with security and monitoring aspects. By the end of this book, you’ll be well-equipped to perform data wrangling using AWS services.