Inne

33
Ebook

Data Engineering with Alteryx. Helping data engineers apply DataOps practices with Alteryx

Paul Houghton

Alteryx is a GUI-based development platform for data analytic applications.Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects which increase development speed while still enabling you to make the most of the code-based skills you have.This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process.By the end of this Alteryx book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.

34
Ebook

Data Engineering with AWS. Acquire the skills to design and build AWS-based data transformation pipelines like a pro - Second Edition

Gareth Eagar

This book, authored by a seasoned Senior Data Architect with 25 years of experience, aims to help you achieve proficiency in using the AWS ecosystem for data engineering. This revised edition provides updates in every chapter to cover the latest AWS services and features, takes a refreshed look at data governance, and includes a brand-new section on building modern data platforms which covers; implementing a data mesh approach, open-table formats (such as Apache Iceberg), and using DataOps for automation and observability.You'll begin by reviewing the key concepts and essential AWS tools in a data engineer's toolkit and getting acquainted with modern data management approaches. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how that transformed data is used by various data consumers. You’ll learn how to ensure strong data governance, and about populating data marts and data warehouses along with how a data lakehouse fits into the picture. After that, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. Then, you'll explore how the power of machine learning and artificial intelligence can be used to draw new insights from data. In the final chapters, you'll discover transactional data lakes, data meshes, and how to build a cutting-edge data platform on AWS.By the end of this AWS book, you'll be able to execute data engineering tasks and implement a data pipeline on AWS like a pro!

35
Ebook

Data Engineering with Databricks Cookbook. Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake

Pulkit Chadha

Data Engineering with Databricks Cookbook will guide you through recipes to effectively use Apache Spark, Delta Lake, and Databricks for data engineering, beginning with an introduction to data ingestion and loading with Apache Spark.As you progress, you’ll be introduced to various data manipulation and data transformation solutions that can be applied to data. You'll find out how to manage and optimize Delta tables, as well as how to ingest and process streaming data. The book will also show you how to improve the performance problems of Apache Spark apps and Delta Lake. Later chapters will show you how to use Databricks to implement DataOps and DevOps practices and teach you how to orchestrate and schedule data pipelines using Databricks Workflows. Finally, you’ll understand how to set up and configure Unity Catalog for data governance.By the end of this book, you’ll be well-versed in building reliable and scalable data pipelines using modern data engineering technologies.

36
Ebook

Data Engineering with dbt. A practical guide to building a cloud-based, pragmatic, and dependable data platform with SQL

Roberto Zagni

dbt Cloud helps professional analytics engineers automate the application of powerful and proven patterns to transform data from ingestion to delivery, enabling real DataOps.This book begins by introducing you to dbt and its role in the data stack, along with how it uses simple SQL to build your data platform, helping you and your team work better together. You’ll find out how to leverage data modeling, data quality, master data management, and more to build a simple-to-understand and future-proof solution. As you advance, you’ll explore the modern data stack, understand how data-related careers are changing, and see how dbt enables this transition into the emerging role of an analytics engineer. The chapters help you build a sample project using the free version of dbt Cloud, Snowflake, and GitHub to create a professional DevOps setup with continuous integration, automated deployment, ELT run, scheduling, and monitoring, solving practical cases you encounter in your daily work.By the end of this dbt book, you’ll be able to build an end-to-end pragmatic data platform by ingesting data exported from your source systems, coding the needed transformations, including master data and the desired business rules, and building well-formed dimensional models or wide tables that’ll enable you to build reports with the BI tool of your choice.

37
Ebook

Data Engineering with Google Cloud Platform. A guide to leveling up as a data engineer by building a scalable data platform with Google Cloud - Second Edition

Adi Wijaya, António Vilares

The second edition of Data Engineering with Google Cloud builds upon the success of the first edition by offering enhanced clarity and depth to data professionals navigating the intricate landscape of data engineering.Beyond its foundational lessons, this new edition delves into the essential realm of data governance within Google Cloud, providing you invaluable insights into managing and optimizing data resources effectively. Furthermore, this book helps you stay ahead of the curve by guiding you through the latest technological advancements in the Google Cloud ecosystem. You’ll cover essential aspects, from exploring Cloud Composer 2 to the evolution of Airflow 2.5. Additionally, you’ll explore how to work with cutting-edge tools like Dataform, DLP, Dataplex, Dataproc Serverless, and Datastream to perform data governance on datasets.By the end of this book, you'll be equipped to navigate the ever-evolving world of data engineering on Google Cloud, from foundational principles to cutting-edge practices.

38
Ebook

Data Engineering with Scala and Spark. Build streaming and batch pipelines that process massive amounts of data using Scala

Eric Tome, Rupam Bhattacharjee, David Radford

Most data engineers know that performance issues in a distributed computing environment can easily lead to issues impacting the overall efficiency and effectiveness of data engineering tasks. While Python remains a popular choice for data engineering due to its ease of use, Scala shines in scenarios where the performance of distributed data processing is paramount.This book will teach you how to leverage the Scala programming language on the Spark framework and use the latest cloud technologies to build continuous and triggered data pipelines. You’ll do this by setting up a data engineering environment for local development and scalable distributed cloud deployments using data engineering best practices, test-driven development, and CI/CD. You’ll also get to grips with DataFrame API, Dataset API, and Spark SQL API and its use. Data profiling and quality in Scala will also be covered, alongside techniques for orchestrating and performance tuning your end-to-end pipelines to deliver data to your end users. By the end of this book, you will be able to build streaming and batch data pipelines using Scala while following software engineering best practices.

39
Ebook

Data Governance Handbook. A practical approach to building trust in data

Wendy S. Batchelder

2.5 quintillion bytes! This is the amount of data being generated every single day across the globe. As this number continues to grow, understanding and managing data becomes more complex. Data professionals know that it’s their responsibility to navigate this complexity and ensure effective governance, empowering businesses with the right data, at the right time, and with the right controls.If you are a data professional, this book will equip you with valuable guidance to conquer data governance complexities with ease. Written by a three-time chief data officer in global Fortune 500 companies, the Data Governance Handbook is an exhaustive guide to understanding data governance, its key components, and how to successfully position solutions in a way that translates into tangible business outcomes.By the end, you’ll be able to successfully pitch and gain support for your data governance program, demonstrating tangible outcomes that resonate with key stakeholders.

40
Ebook

Data Ingestion with Python Cookbook. A practical guide to ingesting, monitoring, and identifying errors in the data ingestion process

Gláucia Esppenchutz

Data Ingestion with Python Cookbook offers a practical approach to designing and implementing data ingestion pipelines. It presents real-world examples with the most widely recognized open source tools on the market to answer commonly asked questions and overcome challenges.You’ll be introduced to designing and working with or without data schemas, as well as creating monitored pipelines with Airflow and data observability principles, all while following industry best practices. The book also addresses challenges associated with reading different data sources and data formats. As you progress through the book, you’ll gain a broader understanding of error logging best practices, troubleshooting techniques, data orchestration, monitoring, and storing logs for further consultation.By the end of the book, you’ll have a fully automated set that enables you to start ingesting and monitoring your data pipeline effortlessly, facilitating seamless integration with subsequent stages of the ETL process.

41
Ebook

Data Modeling with Snowflake. A practical guide to accelerating Snowflake development using universal data modeling techniques

Serge Gershkovich, Kent Graziano

The Snowflake Data Cloud is one of the fastest-growing platforms for data warehousing and application workloads. Snowflake's scalable, cloud-native architecture and expansive set of features and objects enables you to deliver data solutions quicker than ever before.Yet, we must ensure that these solutions are developed using recommended design patterns and accompanied by documentation that’s easily accessible to everyone in the organization.This book will help you get familiar with simple and practical data modeling frameworks that accelerate agile design and evolve with the project from concept to code. These universal principles have helped guide database design for decades, and this book pairs them with unique Snowflake-native objects and examples like never before – giving you a two-for-one crash course in theory as well as direct application.By the end of this Snowflake book, you’ll have learned how to leverage Snowflake’s innovative features, such as time travel, zero-copy cloning, and change-data-capture, to create cost-effective, efficient designs through time-tested modeling principles that are easily digestible when coupled with real-world examples.

42
Ebook

Data Modeling with Tableau. A practical guide to building data models using Tableau Prep and Tableau Desktop

Kirk Munroe

Tableau is unlike most other BI platforms that have a single data modeling tool and enterprise data model (for example, LookML from Google’s Looker). That doesn’t mean Tableau doesn’t have enterprise data governance; it is both robust and highly flexible. This book will help you effectively use Tableau governance models to build a data-driven organization.Data Modeling with Tableau is an extensive guide, complete with step-by-step explanations of essential concepts, practical examples, and hands-on exercises. As you progress through the chapters, you’ll learn the role that Tableau Prep Builder and Tableau Desktop each play in data modeling. You’ll also explore the components of Tableau Server and Tableau Cloud that make data modeling more robust, secure, and performant. Moreover, by extending data models for Ask and Explain Data, you’ll gain the knowledge required to extend analytics to more people in their organizations, leading to better data-driven decisions. Finally, this book will guide you through the entire Tableau stack and the techniques required to build the right level of governance into Tableau data models for the correct use cases.By the end of this Tableau book, you’ll have a firm understanding of how to leverage data modeling in Tableau to benefit your organization.

43
Ebook

Data Quality in the Age of AI. Building a foundation for AI strategy and data culture

Andrew Jones

As organizations worldwide seek to revamp their data strategies to leverage AI advancements and benefit from newfound capabilities, data quality emerges as the cornerstone for success. Without high-quality data, even the most advanced AI models falter. Enter Data Quality in the Age of AI, a detailed report that illuminates the crucial role of data quality in shaping effective data strategies.Packed with actionable insights, this report highlights the critical role of data quality in your overall data strategy. It equips teams and organizations with the knowledge and tools to thrive in the evolving AI landscape, serving as a roadmap for harnessing the power of data quality, enabling them to unlock their data's full potential, leading to improved performance, reduced costs, increased revenue, and informed strategic decisions.

44
Ebook

Data science od podstaw. Analiza danych w Pythonie. Wydanie II

Joel Grus

Analityka danych jest uważana za wyjątkowo obiecującą dziedzinę wiedzy. Rozwija się błyskawicznie i znajduje coraz to nowsze zastosowania. Profesjonaliści biegli w eksploracji danych i wydobywaniu z nich pożytecznych informacji mogą liczyć na interesującą pracę i bardzo atrakcyjne warunki zatrudnienia. Jednak aby zostać analitykiem danych, trzeba znać matematykę i statystykę, a także nauczyć się programowania. Umiejętności w zakresie uczenia maszynowego i uczenia głębokiego również są ważne. W przypadku tak specyficznej dziedziny, jaką jest nauka o danych, szczególnie istotne jest zdobycie gruntownych podstaw i dogłębne ich zrozumienie. W tym przewodniku opisano zagadnienia związane z podstawami nauki o danych. Wyjaśniono niezbędne elementy matematyki i statystyki. Przedstawiono także techniki budowy potrzebnych narzędzi i sposoby działania najistotniejszych algorytmów. Książka została skonstruowana tak, aby poszczególne implementacje były jak najbardziej przejrzyste i zrozumiałe. Zamieszczone tu przykłady napisano w Pythonie: jest to język dość łatwy do nauki, a pracę na danych ułatwia szereg przydatnych bibliotek Pythona. W drugim wydaniu znalazły się nowe tematy, takie jak uczenie głębokie, statystyka i przetwarzanie języka naturalnego, a także działania na ogromnych zbiorach danych. Zagadnienia te często pojawiają się w pracy współczesnego analityka danych. W książce między innymi: elementy algebry liniowej, statystyki i rachunku prawdopodobieństwa zbieranie, oczyszczanie i eksploracja danych algorytmy modeli analizy danych podstawy uczenia maszynowego systemy rekomendacji i przetwarzanie języka naturalnego analiza sieci społecznościowych i algorytm MapReduce Nauka o danych: bazuj na solidnych podstawach!

45
Ebook

Data Storytelling with Google Looker Studio. A hands-on guide to using Looker Studio for building compelling and effective dashboards

Sireesha Pulipati, Nicholas Kelly

Presenting data visually makes it easier for organizations and individuals to interpret and analyze information. Looker Studio is an easy-to-use, collaborative tool that enables you to transform your data into engaging visualizations. This allows you to build and share dashboards that help monitor key performance indicators, identify patterns, and generate insights to ultimately drive decisions and actions.Data Storytelling with Looker Studio begins by laying out the foundational design principles and guidelines that are essential to creating accurate, effective, and compelling data visualizations. Next, you’ll delve into features and capabilities of Looker Studio – from basic to advanced – and explore their application with examples. The subsequent chapters walk you through building dashboards with a structured three-stage process called the 3D approach using real-world examples that’ll help you understand the various design and implementation considerations. This approach involves determining the objectives and needs of the dashboard, designing its key components and layout, and developing each element of the dashboard.By the end of this book, you will have a solid understanding of the storytelling approach and be able to create data stories of your own using Looker Studio.

46
Ebook

Data Visualization: a successful design process

Andy Kirk, Andy Kirk

Do you want to create more attractive charts? Or do you have huge data sets and need to unearth the key insights in a visual manner? Data visualization is the representation and presentation of data, using proven design techniques to bring alive the patterns, stories and key insights locked away.Data Visualization: a Successful Design Process explores the unique fusion of art and science that is data visualization; a discipline for which instinct alone is insufficient for you to succeed in enabling audiences to discover key trends, insights and discoveries from your data. This book will equip you with the key techniques required to overcome contemporary data visualization challenges. You'll discover a proven design methodology that helps you develop invaluable knowledge and practical capabilities.You'll never again settle for a default Excel chart or resort to fancy-looking graphs. You will be able to work from the starting point of acquiring, preparing and familiarizing with your data, right through to concept design. Choose your killer visual representation to engage and inform your audience.Data Visualization: a Successful Design Process will inspire you to relish any visualization project with greater confidence and bullish know-how; turning challenges into exciting design opportunities.

47
Ebook

Data Visualization with D3.js Cookbook. Turn your digital data into dynamic graphics with this exciting, leading-edge cookbook. Packed with recipes and practical guidance it will quickly make you a proficient user of the D3 JavaScript library

Nick Zhu

D3.js is a JavaScript library designed to display digital data in dynamic graphical form. It helps you bring data to life using HTML, SVG, and CSS. D3 allows great control over the final visual result, and it is the hottest and most powerful web-based data visualization technology on the market today.Data Visualization with D3.js Cookbook is packed with practical recipes to help you learn every aspect of data visualization with D3.Data Visualization with D3.js Cookbook is designed to provide you with all the guidance you need to get to grips with data visualization with D3. With this book, you will create breathtaking data visualization with professional efficiency and precision with the help of practical recipes, illustrations, and code samples.Data Visualization with D3.js Cookbook starts off by touching upon data visualization and D3 basics before gradually taking you through a number of practical recipes covering a wide range of topics you need to know about D3.You will learn the fundamental concepts of data visualization, functional JavaScript, and D3 fundamentals including element selection, data binding, animation, and SVG generation. You will also learn how to leverage more advanced techniques such as custom interpolators, custom tweening, timers, the layout manager, force manipulation, and so on. This book also provides a number of pre-built chart recipes with ready-to-go sample code to help you bootstrap quickly.

48
Ebook

Data Wrangling with R. Load, explore, transform and visualize data for modeling with tidyverse libraries

Gustavo R Santos

In this information era, where large volumes of data are being generated every day, companies want to get a better grip on it to perform more efficiently than before. This is where skillful data analysts and data scientists come into play, wrangling and exploring data to generate valuable business insights. In order to do that, you’ll need plenty of tools that enable you to extract the most useful knowledge from data.Data Wrangling with R will help you to gain a deep understanding of ways to wrangle and prepare datasets for exploration, analysis, and modeling. This data book enables you to get your data ready for more optimized analyses, develop your first data model, and perform effective data visualization.The book begins by teaching you how to load and explore datasets. Then, you’ll get to grips with the modern concepts and tools of data wrangling. As data wrangling and visualization are intrinsically connected, you’ll go over best practices to plot data and extract insights from it. The chapters are designed in a way to help you learn all about modeling, as you will go through the construction of a data science project from end to end, and become familiar with the built-in RStudio, including an application built with Shiny dashboards.By the end of this book, you’ll have learned how to create your first data model and build an application with Shiny in R.