Big data

169
EBOOK

Data Ingestion with Python Cookbook. A practical guide to ingesting, monitoring, and identifying errors in the data ingestion process

Gláucia Esppenchutz

Data Ingestion with Python Cookbook offers a practical approach to designing and implementing data ingestion pipelines. It presents real-world examples with the most widely recognized open source tools on the market to answer commonly asked questions and overcome challenges. You’ll be introduced to designing and working with or without data schemas, as well as creating monitored pipelines with Airflow and data observability principles, all while following industry best practices. The book also addresses challenges associated with reading different data sources and data formats. As you progress through the book, you’ll gain a broader understanding of error logging best practices, troubleshooting techniques, data orchestration, monitoring, and storing logs for further consultation. By the end of the book, you’ll have a fully automated setup that enables you to start ingesting and monitoring your data pipeline effortlessly, facilitating seamless integration with subsequent stages of the ETL process.

170
EBOOK

Data Labeling in Machine Learning with Python. Explore modern ways to prepare labeled data for training and fine-tuning ML and generative AI models

Vijaya Kumar Suda

Data labeling is the invisible hand that guides the power of artificial intelligence and machine learning. In today’s data-driven world, mastering data labeling is not just an advantage, it’s a necessity. Data Labeling in Machine Learning with Python empowers you to unearth value from raw data, create intelligent systems, and influence the course of technological evolution. With this book, you'll discover the art of employing summary statistics, weak supervision, programmatic rules, and heuristics to assign labels to unlabeled training data programmatically. As you progress, you'll be able to enhance your datasets by mastering the intricacies of semi-supervised learning and data augmentation. Venturing further into the data landscape, you'll immerse yourself in the annotation of image, video, and audio data, harnessing the power of Python libraries such as seaborn, matplotlib, cv2, librosa, openai, and langchain. With hands-on guidance and practical examples, you'll gain proficiency in annotating diverse data types effectively. By the end of this book, you’ll have the practical expertise to programmatically label diverse data types and enhance datasets, unlocking the full potential of your data.

171
EBOOK

Data Lake for Enterprises. Lambda Architecture for building enterprise data systems

Vivek Mishra, Tomcy John, Pankaj Misra

Data Lake has recently emerged as a prominent term in the big data industry. Data scientists can use a data lake to derive meaningful insights that businesses can use to redefine or transform the way they operate. Lambda architecture is also emerging as one of the most prominent patterns in the big data landscape, as it not only helps to derive useful information from historical data but also correlates real-time data to enable businesses to make critical decisions. This book tries to bring these two important aspects, data lake and lambda architecture, together. This book is divided into three main sections. The first introduces you to the concept of data lakes and the importance of data lakes in enterprises, and gets you up to speed with the Lambda architecture. The second section delves into the principal components of building a data lake using the Lambda architecture. It introduces you to popular big data technologies such as Apache Hadoop, Spark, Sqoop, Flume, and ElasticSearch. The third section is a highly practical demonstration of putting it all together, and shows you how an enterprise data lake can be implemented, along with several real-world use cases. It also shows you how other peripheral components can be added to the lake to make it more efficient. By the end of this book, you will be able to choose the right big data technologies using the lambda architectural patterns to build your enterprise data lake.

172
EBOOK

Data Literacy With Python. A Comprehensive Guide to Understanding and Analyzing Data with Python

Mercury Learning and Information, Oswald Campesato

This book ushers readers into the world of data, emphasizing its importance in modern industries and how its management leads to insightful decision-making. Using Python 3, the book introduces foundational data tasks and progresses to advanced model training concepts. Detailed, step-by-step Python examples help readers master training models, starting with the kNN algorithm and moving to other classifiers with minimal code adjustments. Tools like Sweetviz, Skimpy, Matplotlib, and Seaborn are introduced for hands-on chart and graph rendering. The book begins with working with data, detecting outliers and anomalies, and cleaning datasets. It then introduces statistics and progresses to using Matplotlib and Seaborn for data visualization. Each chapter builds on the previous one, ensuring a comprehensive understanding of data management and analysis. These concepts are crucial for making data-driven decisions. This book transitions readers from basic data handling to advanced model training, blending theoretical knowledge with practical skills. Companion files with source code and data sets enhance the learning experience, making this book an invaluable resource for mastering data science with Python.

173
EBOOK

Data Management Strategy at Microsoft. Best practices from a tech giant's decade-long data transformation journey

Aleksejs Plotnikovs

Microsoft pioneered data innovation and investment ahead of many in the industry, setting a remarkable standard for data maturity. Written by a data leader with over 15 years of experience following Microsoft’s data journey, this book delves into every crucial aspect of this journey, including change management, aligning with business needs, enhancing data value, and cultivating a data-driven culture. This book emphasizes that success in a data-driven enterprise goes beyond relying solely on modern technology and highlights the importance of prioritizing genuine business needs to propel necessary modernizations through change management practices. You’ll see how data-driven innovation does not solely reside within central IT engineering teams but also among the data's business owners who rely on data daily for their operational needs. This guide empowers these professionals with clean, easily discoverable, and business-ready data, marking a significant breakthrough in how data is perceived and utilized throughout an enterprise. You’ll also discover advanced techniques to nurture the value of data as unique intellectual property, and differentiate your organization with the power of data. Its storytelling approach and summary of essential insights at the end of each chapter make this book invaluable for business and data leaders to advocate for crucial data investments.

174
EBOOK

Data Modeling with Microsoft Excel. Model and analyze data using Power Pivot, DAX, and Cube functions

Bernard Obeng Boateng, Michael Olafusi

Microsoft Excel's BI solutions have evolved, offering users more flexibility and control over analyzing data directly in Excel. Features like PivotTables, Data Model, Power Query, and Power Pivot empower Excel users to efficiently get, transform, model, aggregate, and visualize data. Data Modeling with Microsoft Excel offers a practical way to demystify the use and application of these tools using real-world examples and simple illustrations. This book will introduce you to the world of data modeling in Excel, as well as definitions and best practices in data structuring for both normalized and denormalized data. The next set of chapters will take you through the useful features of Data Model and Power Pivot, helping you get to grips with the types of schemas (snowflake and star) and create relationships within multiple tables. You’ll also understand how to create powerful and flexible measures using DAX and Cube functions. By the end of this book, you’ll be able to apply the acquired knowledge in real-world scenarios and build an interactive dashboard that will help you make important decisions. Note: To access the supplemental material, subscribers should purchase a print copy of the book. The ebook can be accessed through the QR code or link provided inside the Print book. Proof of purchase is mandatory to access the ebook.

175
EBOOK

Data Modeling with Snowflake. A practical guide to accelerating Snowflake development using universal data modeling techniques

Serge Gershkovich, Kent Graziano

The Snowflake Data Cloud is one of the fastest-growing platforms for data warehousing and application workloads. Snowflake's scalable, cloud-native architecture and expansive set of features and objects enable you to deliver data solutions quicker than ever before. Yet, we must ensure that these solutions are developed using recommended design patterns and accompanied by documentation that’s easily accessible to everyone in the organization. This book will help you get familiar with simple and practical data modeling frameworks that accelerate agile design and evolve with the project from concept to code. These universal principles have helped guide database design for decades, and this book pairs them with unique Snowflake-native objects and examples like never before, giving you a two-for-one crash course in theory as well as direct application. By the end of this Snowflake book, you’ll have learned how to leverage Snowflake’s innovative features, such as time travel, zero-copy cloning, and change-data-capture, to create cost-effective, efficient designs through time-tested modeling principles that are easily digestible when coupled with real-world examples.

176
EBOOK

Data Modeling with Snowflake. A practical guide to accelerating Snowflake development using universal modeling techniques - Second Edition

Serge Gershkovich, Joe Reis

Struggling with rising Snowflake costs and constant tuning? Poorly aligned data models can lead to bloated expenses, inefficient queries, and time-consuming rework. Data Modeling with Snowflake helps you harness the Snowflake Data Cloud’s scalable, cloud-native architecture and expansive feature set to deliver data solutions faster than ever. This book introduces simple, practical data modeling frameworks that accelerate agile design and evolve alongside your projects from concept to code. Rooted in decades of proven database design principles, these frameworks are paired, for the first time, with Snowflake-native objects and real-world examples, offering a two-in-one crash course in theory and direct application. Through real-world examples designed to make learning easy, you’ll leverage Snowflake’s innovative features like Time Travel, Zero-Copy Cloning, and Change Data Capture (CDC) to create cost-efficient solutions. Whether you're just starting out or refining your architecture, this book will guide you in designing smarter, scaling faster, and cutting costs by aligning timeless modeling principles with the power of Snowflake. *Email sign-up and proof of purchase required