Wydawca: Packt Publishing

Founded in 2004 in Birmingham, UK, Packt's mission is to help the world put software to work in new ways, through the delivery of effective learning and information services to IT professionals. Working towards that vision, we have published over 6,500 books and videos so far, providing IT professionals with the actionable knowledge they need to get the job done - whether that's specific learning on an emerging technology or optimizing key skills in more established tools. As part of our mission, we have also awarded over $1,000,000 through our Open Source Project Royalty scheme, helping numerous projects become household names along the way.
1201
Ładowanie...
EBOOK

Data Ingestion with Python Cookbook. A practical guide to ingesting, monitoring, and identifying errors in the data ingestion process

Gláucia Esppenchutz

Data Ingestion with Python Cookbook offers a practical approach to designing and implementing data ingestion pipelines. It presents real-world examples with the most widely recognized open source tools on the market to answer commonly asked questions and overcome challenges.You’ll be introduced to designing and working with or without data schemas, as well as creating monitored pipelines with Airflow and data observability principles, all while following industry best practices. The book also addresses challenges associated with reading different data sources and data formats. As you progress through the book, you’ll gain a broader understanding of error logging best practices, troubleshooting techniques, data orchestration, monitoring, and storing logs for further consultation.By the end of the book, you’ll have a fully automated set that enables you to start ingesting and monitoring your data pipeline effortlessly, facilitating seamless integration with subsequent stages of the ETL process.

1202
Ładowanie...
EBOOK

Data Labeling in Machine Learning with Python. Explore modern ways to prepare labeled data for training and fine-tuning ML and generative AI models

Vijaya Kumar Suda

Data labeling is the invisible hand that guides the power of artificial intelligence and machine learning. In today’s data-driven world, mastering data labeling is not just an advantage, it’s a necessity. Data Labeling in Machine Learning with Python empowers you to unearth value from raw data, create intelligent systems, and influence the course of technological evolution.With this book, you'll discover the art of employing summary statistics, weak supervision, programmatic rules, and heuristics to assign labels to unlabeled training data programmatically. As you progress, you'll be able to enhance your datasets by mastering the intricacies of semi-supervised learning and data augmentation. Venturing further into the data landscape, you'll immerse yourself in the annotation of image, video, and audio data, harnessing the power of Python libraries such as seaborn, matplotlib, cv2, librosa, openai, and langchain. With hands-on guidance and practical examples, you'll gain proficiency in annotating diverse data types effectively.By the end of this book, you’ll have the practical expertise to programmatically label diverse data types and enhance datasets, unlocking the full potential of your data.

1203
Ładowanie...
EBOOK

Data Lake Development with Big Data. Explore architectural approaches to building Data Lakes that ingest, index, manage, and analyze massive amounts of data using Big Data technologies

Pradeep Pasupuleti, Beulah Salome Purra

A Data Lake is a highly scalable platform for storing huge volumes of multistructured data from disparate sources with centralized data management services. This book explores the potential of Data Lakes and explores architectural approaches to building data lakes that ingest, index, manage, and analyze massive amounts of data using batch and real-time processing frameworks. It guides you on how to go about building a Data Lake that is managed by Hadoop and accessed as required by other Big Data applications.This book will guide readers (using best practices) in developing Data Lake's capabilities. It will focus on architect data governance, security, data quality, data lineage tracking, metadata management, and semantic data tagging. By the end of this book, you will have a good understanding of building a Data Lake for Big Data.

1204
Ładowanie...
EBOOK

Data Lake for Enterprises. Lambda Architecture for building enterprise data systems

Vivek Mishra, Tomcy John, Pankaj Misra

The term Data Lake has recently emerged as a prominent term in the big data industry. Data scientists can make use of it in deriving meaningful insights that can be used by businesses to redefine or transform the way they operate. Lambda architecture is also emerging as one of the very eminent patterns in the big data landscape, as it not only helps to derive useful information from historical data but also correlates real-time data to enable business to take critical decisions. This book tries to bring these two important aspects — data lake and lambda architecture—together.This book is divided into three main sections. The first introduces you to the concept of data lakes, the importance of data lakes in enterprises, and getting you up-to-speed with the Lambda architecture. The second section delves into the principal components of building a data lake using the Lambda architecture. It introduces you to popular big data technologies such as Apache Hadoop, Spark, Sqoop, Flume, and ElasticSearch. The third section is a highly practical demonstration of putting it all together, and shows you how an enterprise data lake can be implemented, along with several real-world use-cases. It also shows you how other peripheral components can be added to the lake to make it more efficient.By the end of this book, you will be able to choose the right big data technologies using the lambda architectural patterns to build your enterprise data lake.

1205
Ładowanie...
EBOOK

Data Lakehouse in Action. Architecting a modern and scalable data analytics platform

Pradeep Menon

The Data Lakehouse architecture is a new paradigm that enables large-scale analytics. This book will guide you in developing data architecture in the right way to ensure your organization's success.The first part of the book discusses the different data architectural patterns used in the past and the need for a new architectural paradigm, as well as the drivers that have caused this change. It covers the principles that govern the target architecture, the components that form the Data Lakehouse architecture, and the rationale and need for those components. The second part deep dives into the different layers of Data Lakehouse. It covers various scenarios and components for data ingestion, storage, data processing, data serving, analytics, governance, and data security. The book's third part focuses on the practical implementation of the Data Lakehouse architecture in a cloud computing platform. It focuses on various ways to combine the Data Lakehouse pattern to realize macro-patterns, such as Data Mesh and Data Hub-Spoke, based on the organization's needs and maturity level. The frameworks introduced will be practical and organizations can readily benefit from their application.By the end of this book, you'll clearly understand how to implement the Data Lakehouse architecture pattern in a scalable, agile, and cost-effective manner.

1206
Ładowanie...
EBOOK

Data Literacy in Practice. A complete guide to data literacy and making smarter decisions with data through intelligent actions

Angelika Klidas, Kevin Hanegan

Data is more than a mere commodity in our digital world. It is the ebb and flow of our modern existence. Individuals, teams, and enterprises working with data can unlock a new realm of possibilities. And the resultant agility, growth, and inevitable success have one origin—data literacy.This comprehensive guide is written by two data literacy pioneers, each with a thorough footprint within the data and analytics commercial world and lectures at top universities in the US and the Netherlands. Complete with best practices, practical models, and real-world examples, Data Literacy in Practice will help you start making your data work for you by building your understanding of data literacy basics and accelerating your journey to independently uncovering insights.You’ll learn the four-pillar model that underpins all data and analytics and explore concepts such as measuring data quality, setting up a pragmatic data management environment, choosing the right graphs for your readers, and questioning your insights.By the end of the book, you'll be equipped with a combination of skills and mindset as well as with tools and frameworks that will allow you to find insights and meaning within your data for data-informed decision making.

1207
Ładowanie...
AUDIOBOOK

Data Literacy in Practice Audiobook. A complete guide to data literacy and making smarter decisions with data through intelligent actions

Angelika Klidas, Kevin Hanegan

Data is more than a mere commodity in our digital world. It is the ebb and flow of our modern existence. Individuals, teams, and enterprises working with data can unlock a new realm of possibilities. And the resultant agility, growth, and inevitable success have one origin—data literacy.This comprehensive guide is written by two data literacy pioneers, each with a thorough footprint within the data and analytics commercial world and lectures at top universities in the US and the Netherlands. Complete with best practices, practical models, and real-world examples, Data Literacy in Practice will help you start making your data work for you by building your understanding of data literacy basics and accelerating your journey to independently uncovering insights.You’ll learn the four-pillar model that underpins all data and analytics and explore concepts such as measuring data quality, setting up a pragmatic data management environment, choosing the right graphs for your readers, and questioning your insights.By the end of the book, you'll be equipped with a combination of skills and mindset as well as with tools and frameworks that will allow you to find insights and meaning within your data for data-informed decision making.

1208
Ładowanie...
EBOOK

Data Management Strategy at Microsoft. Best practices from a tech giant's decade-long data transformation journey

Aleksejs Plotnikovs

Microsoft pioneered data innovation and investment ahead of many in the industry, setting a remarkable standard for data maturity. Written by a data leader with over 15 years of experience following Microsoft’s data journey, this book delves into every crucial aspect of this journey, including change management, aligning with business needs, enhancing data value, and cultivating a data-driven culture.This book emphasizes that success in a data-driven enterprise goes beyond relying solely on modern technology and highlights the importance of prioritizing genuine business needs to propel necessary modernizations through change management practices. You’ll see how data-driven innovation does not solely reside within central IT engineering teams but also among the data's business owners who rely on data daily for their operational needs. This guide empower these professionals with clean, easily discoverable, and business-ready data, marking a significant breakthrough in how data is perceived and utilized throughout an enterprise. You’ll also discover advanced techniques to nurture the value of data as unique intellectual property, and differentiate your organization with the power of data.Its storytelling approach and summary of essential insights at the end of each chapter make this book invaluable for business and data leaders to advocate for crucial data investments.

1209
Ładowanie...
EBOOK

Data Modeling for Azure Data Services. Implement professional data design and structures in Azure

Peter ter Braake

Data is at the heart of all applications and forms the foundation of modern data-driven businesses. With the multitude of data-related use cases and the availability of different data services, choosing the right service and implementing the right design becomes paramount to successful implementation.Data Modeling for Azure Data Services starts with an introduction to databases, entity analysis, and normalizing data. The book then shows you how to design a NoSQL database for optimal performance and scalability and covers how to provision and implement Azure SQL DB, Azure Cosmos DB, and Azure Synapse SQL Pool. As you progress through the chapters, you'll learn about data analytics, Azure Data Lake, and Azure SQL Data Warehouse and explore dimensional modeling, data vault modeling, along with designing and implementing a Data Lake using Azure Storage. You'll also learn how to implement ETL with Azure Data Factory.By the end of this book, you'll have a solid understanding of which Azure data services are the best fit for your model and how to implement the best design for your solution.

1210
Ładowanie...
EBOOK

Data Modeling with Microsoft Excel. Model and analyze data using Power Pivot, DAX, and Cube functions

Bernard Obeng Boateng, Michael Olafusi

Microsoft Excel's BI solutions have evolved, offering users more flexibility and control over analyzing data directly in Excel. Features like PivotTables, Data Model, Power Query, and Power Pivot empower Excel users to efficiently get, transform, model, aggregate, and visualize data.Data Modeling with Microsoft Excel offers a practical way to demystify the use and application of these tools using real-world examples and simple illustrations.This book will introduce you to the world of data modeling in Excel, as well as definitions and best practices in data structuring for both normalized and denormalized data. The next set of chapters will take you through the useful features of Data Model and Power Pivot, helping you get to grips with the types of schemas (snowflake and star) and create relationships within multiple tables. You’ll also understand how to create powerful and flexible measures using DAX and Cube functions.By the end of this book, you’ll be able to apply the acquired knowledge in real-world scenarios and build an interactive dashboard that will help you make important decisions.Note: To access the supplemental material, subscribers should purchase a print copy of the book. The ebook can be accessed through the QR code or link provided inside the Print book. Proof of purchase is mandatory to access the ebook.

1211
Ładowanie...
EBOOK

Data Modeling with Snowflake. A practical guide to accelerating Snowflake development using universal data modeling techniques

Serge Gershkovich, Kent Graziano

The Snowflake Data Cloud is one of the fastest-growing platforms for data warehousing and application workloads. Snowflake's scalable, cloud-native architecture and expansive set of features and objects enables you to deliver data solutions quicker than ever before.Yet, we must ensure that these solutions are developed using recommended design patterns and accompanied by documentation that’s easily accessible to everyone in the organization.This book will help you get familiar with simple and practical data modeling frameworks that accelerate agile design and evolve with the project from concept to code. These universal principles have helped guide database design for decades, and this book pairs them with unique Snowflake-native objects and examples like never before – giving you a two-for-one crash course in theory as well as direct application.By the end of this Snowflake book, you’ll have learned how to leverage Snowflake’s innovative features, such as time travel, zero-copy cloning, and change-data-capture, to create cost-effective, efficient designs through time-tested modeling principles that are easily digestible when coupled with real-world examples.

1212
Ładowanie...
EBOOK

Data Modeling with Snowflake. A practical guide to accelerating Snowflake development using universal modeling techniques - Second Edition

Serge Gershkovich, Joe Reis

Struggling with rising Snowflake costs and constant tuning? Poorly aligned data models can lead to bloated expenses, inefficient queries, and time-consuming rework. Data Modeling with Snowflake helps you harness the Snowflake Data Cloud’s scalable, cloud-native architecture and expansive feature set to deliver data solutions faster than ever.This book introduces simple, practical data modeling frameworks that accelerate agile design and evolve alongside your projects from concept to code. Rooted in decades of proven database design principles, these frameworks are paired, for the first time, with Snowflake-native objects and real-world examples, offering a two-in-one crash course in theory and direct application.Through real-world examples designed to make learning easy, you’ll leverage Snowflake’s innovative features like Time Travel, Zero-Copy Cloning, and Change Data Capture (CDC) to create cost-efficient solutions. Whether you're just starting out or refining your architecture, this book will guide you in designing smarter, scaling faster, and cutting costs by aligning timeless modeling principles with the power of Snowflake.*Email sign-up and proof of purchase required

1213
Ładowanie...
EBOOK

Data Modeling with Tableau. A practical guide to building data models using Tableau Prep and Tableau Desktop

Kirk Munroe

Tableau is unlike most other BI platforms that have a single data modeling tool and enterprise data model (for example, LookML from Google’s Looker). That doesn’t mean Tableau doesn’t have enterprise data governance; it is both robust and highly flexible. This book will help you effectively use Tableau governance models to build a data-driven organization.Data Modeling with Tableau is an extensive guide, complete with step-by-step explanations of essential concepts, practical examples, and hands-on exercises. As you progress through the chapters, you’ll learn the role that Tableau Prep Builder and Tableau Desktop each play in data modeling. You’ll also explore the components of Tableau Server and Tableau Cloud that make data modeling more robust, secure, and performant. Moreover, by extending data models for Ask and Explain Data, you’ll gain the knowledge required to extend analytics to more people in their organizations, leading to better data-driven decisions. Finally, this book will guide you through the entire Tableau stack and the techniques required to build the right level of governance into Tableau data models for the correct use cases.By the end of this Tableau book, you’ll have a firm understanding of how to leverage data modeling in Tableau to benefit your organization.

1214
Ładowanie...
EBOOK

Data Observability for Data Engineering. Proactive strategies for ensuring data accuracy and addressing broken data pipelines

Michele Pinto, Sammy El Khammal

In the age of information, strategic management of data is critical to organizational success. The constant challenge lies in maintaining data accuracy and preventing data pipelines from breaking. Data Observability for Data Engineering is your definitive guide to implementing data observability successfully in your organization.This book unveils the power of data observability, a fusion of techniques and methods that allow you to monitor and validate the health of your data. You’ll see how it builds on data quality monitoring and understand its significance from the data engineering perspective. Once you're familiar with the techniques and elements of data observability, you'll get hands-on with a practical Python project to reinforce what you've learned. Toward the end of the book, you’ll apply your expertise to explore diverse use cases and experiment with projects to seamlessly implement data observability in your organization.Equipped with the mastery of data observability intricacies, you’ll be able to make your organization future-ready and resilient and never worry about the quality of your data pipelines again.

1215
Ładowanie...
EBOOK

Data Processing with Optimus. Supercharge big data preparation tasks for analytics and machine learning with Optimus using Dask and PySpark

Dr. Argenis Leon , Luis Aguirre Contreras

Optimus is a Python library that works as a unified API for data cleaning, processing, and merging data. It can be used for handling small and big data on your local laptop or on remote clusters using CPUs or GPUs.The book begins by covering the internals of Optimus and how it works in tandem with the existing technologies to serve your data processing needs. You'll then learn how to use Optimus for loading and saving data from text data formats such as CSV and JSON files, exploring binary files such as Excel, and for columnar data processing with Parquet, Avro, and OCR. Next, you'll get to grips with the profiler and its data types - a unique feature of Optimus Dataframe that assists with data quality. You'll see how to use the plots available in Optimus such as histogram, frequency charts, and scatter and box plots, and understand how Optimus lets you connect to libraries such as Plotly and Altair. You'll also delve into advanced applications such as feature engineering, machine learning, cross-validation, and natural language processing functions and explore the advancements in Optimus. Finally, you'll learn how to create data cleaning and transformation functions and add a hypothetical new data processing engine with Optimus.By the end of this book, you'll be able to improve your data science workflow with Optimus easily.

1216
Ładowanie...
EBOOK

Data Quality in the Age of AI. Building a foundation for AI strategy and data culture

Andrew Jones

As organizations worldwide seek to revamp their data strategies to leverage AI advancements and benefit from newfound capabilities, data quality emerges as the cornerstone for success. Without high-quality data, even the most advanced AI models falter. Enter Data Quality in the Age of AI, a detailed report that illuminates the crucial role of data quality in shaping effective data strategies.Packed with actionable insights, this report highlights the critical role of data quality in your overall data strategy. It equips teams and organizations with the knowledge and tools to thrive in the evolving AI landscape, serving as a roadmap for harnessing the power of data quality, enabling them to unlock their data's full potential, leading to improved performance, reduced costs, increased revenue, and informed strategic decisions.