Data Analysis

Data analysis is an exciting discipline that makes it possible to understand certain phenomena and to gain insight and knowledge from raw data. More precisely, the term means processing data with mathematical and statistical techniques in order to draw valuable conclusions, make important decisions, and develop useful products. The term derives from the English data science, which is often treated as a synonym for terms such as business analytics, operations research, business intelligence, competitive intelligence, data analysis and modeling, and knowledge discovery. With technologies such as the Python and R languages and the Hadoop and Spark platforms, you have a chance to draw the maximum number of conclusions, spot opportunities for your organization's growth, or anticipate and prevent threats.

97
EBOOK

Ctrl+Shift+Enter Mastering Excel Array Formulas. Do the Impossible with Excel Formulas with Array Formula Magic

MrExcel's Holy Macro! Books, Mike Girvin

Excel users often encounter limitations with standard formulas, but the Ctrl+Shift+Enter technique changes everything. This book is your gateway to mastering Excel array formulas, revealing their potential to solve complex problems effortlessly. You'll start with the basics, understand the fundamental concepts of array formulas, and gradually progress to advanced applications, including mathematical operations, comparative calculations, and dynamic ranges. Each chapter is crafted to build your confidence and expertise. From performing array operations that manipulate large datasets efficiently to utilizing advanced functions like SUMPRODUCT and AGGREGATE, you will learn how to apply these tools to real-world scenarios. The guide also covers the creation of dynamic ranges with INDEX and OFFSET, ensuring your formulas remain flexible and powerful even as your data changes. By the end of the book, you'll not only understand the theoretical aspects of array formulas but also possess the practical skills to implement them effectively. Whether you're creating complex financial models, conducting detailed data analysis, or automating routine tasks, this guide equips you with the knowledge to transform your Excel capabilities and achieve more with less effort.

98
EBOOK

D3.js 4.x Data Visualization. Learn to visualize your data with JavaScript - Third Edition

Aendrew Rininsland, Swizec Teller

Want to get started with impressive interactive visualizations and implement them in your daily tasks? This book offers the perfect solution: D3.js. It has emerged as the most popular tool for data visualization. This book will teach you how to implement the features of the latest version of D3 while writing JavaScript using the newest tools and techniques. You will start by setting up the D3 environment and making your first basic bar chart. You will then build stunning SVG and Canvas-based data visualizations while writing testable, extensible code, as accurate and informative as it is visually stimulating. Step-by-step examples walk you through creating, integrating, and debugging different types of visualization and will have you building basic visualizations (such as bar, line, and scatter graphs) in no time. By the end of this book, you will have mastered the techniques necessary to successfully visualize data and will be ready to use D3 to transform any data into an engaging and sophisticated visualization.

99
EBOOK

D3.js Quick Start Guide. Create amazing, interactive visualizations in the browser with JavaScript

Matthew Huntington

D3.js is a JavaScript library that allows you to create graphs and data visualizations in the browser with HTML, SVG, and CSS. This book will take you from the basics of D3.js, so that you can create your own interactive visualizations, to creating the most common graphs that you will encounter as a developer, scientist, statistician, or data scientist. The book begins with an overview of SVG, the basis for creating two-dimensional graphics in the browser. Once the reader has a firm understanding of SVG, we will tackle the basics of how to use D3.js to connect data to our SVG elements. We will start with a scatter plot that maps run data to circles on a graph, and expand our scatter plot to make it interactive. You will see how you can easily allow the users of your graph to create, edit, and delete run data by simply dragging and clicking the graph. Next, we will explore creating a bar graph, using external data from a mock API. After that, we will explore animations and motion with a bar graph, and use various physics-based forces to create a force-directed graph. Finally, we will look at how to use GeoJSON data to create a map.

100
EBOOK

Dane grafowe w praktyce. Jak technologie grafowe ułatwiają rozwiązywanie złożonych problemów

Denise Gosnell, Matthias Broecheler

A computer needs numbers and data to work. People more readily draw conclusions and extract context from relationships. These two ways of thinking are so different that, until recently, computers struggled with tasks that involve operating on relationships. Graphs may now change that. Graph technologies bridge the human perception of the world and the linear memory of computers. Their wider adoption will be a breakthrough and will make it possible to reach a level unknown today. But first you need to learn to apply graph thinking to solving technical problems. With this book you will master the fundamentals of graph thinking. You will become familiar with elementary graph concepts: graph theory, database schemas, distributed systems, and data analysis. You will also learn what typical patterns of graph data usage in production applications look like, and how those patterns can be applied in practice. The book shows how to use functional programming techniques and distributed systems to query and analyze graph data. It also describes basic approaches to procedurally traversing graph data and using it with graph tools. In this book: a new paradigm for problem solving: graph data; patterns of graph data usage; a sample application architecture in relational and graph technologies; graph technologies and predicting user preferences and trust; collaborative filtering and its applications. Graphs: a breakthrough concept in data analysis!

101
EBOOK

Data Analysis and Business Modeling with Excel 2013. Manage, analyze, and visualize data with Microsoft Excel 2013 to transform raw data into ready to use information

David Rojas

Excel 2013 is one of the easiest-to-use data analysis tools you will ever come across. Its simplicity and powerful features have made it the go-to tool for all your data needs. Complex operations with Excel, such as creating charts and graphs, visualization, and analyzing data, make it a great tool for managers, data scientists, financial data analysts, and those who work closely with data. Learning data analysis will help you bring your data skills to the next level. This book starts by walking you through creating your own data and bringing data into Excel from various sources. You’ll learn the basics of SQL syntax and how to connect it to a Microsoft SQL Server Database using Excel’s data connection tools. You will discover how to spot bad data and strategies to clean that data to make it useful to you. Next, you'll learn to create custom columns, identify key metrics, and make decisions based on business rules. You’ll create macros using VBA and use Excel 2013’s shiny new macros. Finally, at the end of the book, you'll be provided with useful shortcuts and tips, enabling you to do efficient data analysis and business modeling with Excel 2013.

102
EBOOK

Data Analysis for Business Decisions. A Laboratory Manual

Mercury Learning and Information, Andres Fortino

This manual is for business analysts to enhance their statistical analysis skills, with case studies focusing mainly on Excel. It covers basic descriptive techniques, linear regression, forecasting, t-Test, chi-square, A/B testing, text data analysis, and Big Data management. Companion files include solution spreadsheets, sample files, and data sets. The course starts with data shaping and cleaning, installing the Analysis ToolPak, and descriptive statistics. It progresses through histograms, scatter plots, Pareto analysis, correlation, linear and multivariate regression, and forecasting. Advanced topics include inferential statistics, contingency analysis, and A/B testing. The final chapters cover text analytics, big data sets, and data visualization. These techniques are crucial for informed business decisions. This book guides users from basic to advanced analysis, blending theory with practical skills. Companion files enhance learning, making this manual essential for mastering statistical analysis in business.

103
EBOOK

Data Analysis with IBM SPSS Statistics. Implementing data modeling, descriptive statistics and ANOVA

James C. Mott, Ken Stehlik-Barry, James Sugrue,...

SPSS Statistics is a software package used for logical batched and non-batched statistical analysis. Analytical tools such as SPSS can readily provide even a novice user with an overwhelming amount of information and a broad range of options for analyzing patterns in the data. The journey starts with installing and configuring SPSS Statistics for first use and exploring the data to understand its potential (as well as its limitations). Use the right statistical analysis technique, such as regression, classification, and more, and analyze your data in the best possible manner. Work with graphs and charts to visualize your findings. With this information in hand, the discovery of patterns within the data can be undertaken. Finally, the high-level objective of developing predictive models that can be applied to other situations will be addressed. By the end of this book, you will have a firm understanding of the various statistical analysis techniques offered by SPSS Statistics, and be able to master its use for data analysis with ease.

104
EBOOK

Data Analysis with STATA. Explore the big data field and learn how to perform data analytics and predictive modelling in STATA

Prasad Kothari

STATA is an integrated software package that provides you with everything you need for data analysis, data management, and graphics. STATA also provides you with a platform to efficiently perform simulation, regression analysis (linear and multiple), and custom programming. This book covers data management, graph visualization, and programming in STATA. Starting with an introduction to STATA and data analytics, you’ll move on to STATA programming and data management. Next, the book takes you through data visualization and all the important statistical tests in STATA. Linear and logistic regression in STATA are also covered. As you progress through the book, you will explore a few analyses, including survey analysis, time series analysis, and survival analysis in STATA. You’ll also discover different types of statistical modelling techniques and learn how to implement these techniques in STATA.

105
EBOOK

Data Analytics for Marketing. A practical guide to analyzing marketing data using Python

Guilherme Diaz-Bérrio

Most marketing professionals are familiar with various sources of customer data that promise insights for success. There are extensive sources of data, from customer surveys to digital marketing data. Moreover, there is an increasing variety of tools and techniques to shape data, from small to big data. However, having the right knowledge and understanding the context of how to use data and tools is crucial. In this book, you’ll learn how to give context to your data and turn it into useful information. You’ll understand how and where to use a tool or dataset for a specific question, exploring the what and why questions to provide real value to your stakeholders. Using Python, this book will delve into the basics of analytics and causal inference. Then, you’ll focus on visualization and presentation, followed by understanding guidelines on how to present and condense large amounts of information into KPIs. After learning how to plan ahead and forecast, you’ll delve into customer analytics and insights. Finally, you’ll measure the effectiveness of your marketing efforts and derive insights for data-driven decision-making. By the end of this book, you’ll understand the tools you need to use on specific datasets to provide context and shape your data, as well as to gain information to boost your marketing efforts.

106
EBOOK

Data Analytics Made Easy. Analyze and present data to make informed decisions without writing any code

De Mauro

Data Analytics Made Easy is an accessible beginner’s guide for anyone working with data. The book interweaves four key elements:
Data visualizations and storytelling – Tired of people not listening to you and ignoring your results? Don’t worry; chapters 7 and 8 show you how to enhance your presentations and engage with your managers and co-workers. Learn to create focused content with a well-structured story behind it to captivate your audience.
Automating your data workflows – Improve your productivity by automating your data analysis. This book introduces you to the open-source platform, KNIME Analytics Platform. You’ll see how to use this no-code and free-to-use software to create a KNIME workflow of your data processes just by clicking and dragging components.
Machine learning – Data Analytics Made Easy describes popular machine learning approaches in a simplified and visual way before implementing these machine learning models using KNIME. You’ll not only be able to understand data scientists’ machine learning models; you’ll be able to challenge them and build your own.
Creating interactive dashboards – Follow the book’s simple methodology to create professional-looking dashboards using Microsoft Power BI, giving users the capability to slice and dice data and drill down into the results.

107
EBOOK

Data Analytics. Master the Art of Data Analytics with Essential Tools and Techniques

Mercury Learning and Information, Christopher Greco

Data analytics is becoming increasingly important in our daily lives. This book offers a comprehensive view of data analytics skills, starting with a primer on statistics and progressing to the application of these methods. The text includes various formulas and algorithms used in data analytics, which can be applied in any software to achieve desired results. Through numerous demonstrations, it provides clear instruction on how to incorporate data analytics into critical thinking. The book covers a range of methods and techniques, supplemented with case studies specific to project managers, systems engineers, and cybersecurity professionals. Each profession can practice data analytics relevant to their fields. The main objective is to refresh statistical knowledge necessary for building data analytics models and to foster analytical thinking essential across these professions. From introducing statistics and data to reviewing central tendency measures and probability, the book moves to more complex topics like effect size, analysis methods, and data presentation. By the end of the course, readers will be well-versed in data analytics, ready to apply these skills effectively in their respective fields, enhancing decision-making and analytical thinking.

108
EBOOK

Data Analytics Using Splunk 9.x. A practical guide to implementing Splunk's features for performing data analysis at scale

Dr. Nadine Shillingford

Splunk 9 improves on the existing Splunk tool to include important features such as federated search, observability, performance improvements, and dashboarding. This book helps you to make the best use of the impressive and new features to prepare a Splunk installation that can be employed in the data analysis process. Starting with an introduction to the different Splunk components, such as indexers, search heads, and forwarders, this Splunk book takes you through the step-by-step installation and configuration instructions for basic Splunk components using Amazon Web Services (AWS) instances. You’ll import the BOTS v1 dataset into a search head and begin exploring data using the Splunk Search Processing Language (SPL), covering various types of Splunk commands, lookups, and macros. After that, you’ll create tables, charts, and dashboards using Splunk’s new Dashboard Studio, and then advance to work with clustering, container management, data models, federated search, bucket merging, and more. By the end of the book, you’ll not only have learned everything about the latest features of Splunk 9 but also have a solid understanding of the performance tuning techniques in the latest version.

109
EBOOK

Data Cleaning and Exploration with Machine Learning. Get to grips with machine learning techniques to achieve sparkling-clean data quickly

Michael Walker

Many individuals who know how to run machine learning algorithms do not have a good sense of the statistical assumptions they make and how to match the properties of the data to the algorithm for the best results. As you start with this book, models are carefully chosen to help you grasp the underlying data, including in-feature importance and correlation, and the distribution of features and targets. The first two parts of the book introduce you to techniques for preparing data for ML algorithms, without being bashful about using some ML techniques for data cleaning, including anomaly detection and feature selection. The book then helps you apply that knowledge to a wide variety of ML tasks. You’ll gain an understanding of popular supervised and unsupervised algorithms, how to prepare data for them, and how to evaluate them. Next, you’ll build models and understand the relationships in your data, as well as perform cleaning and exploration tasks with that data. You’ll make quick progress in studying the distribution of variables, identifying anomalies, and examining bivariate relationships, as you focus more on the accuracy of predictions in this book. By the end of this book, you’ll be able to deal with complex data problems using unsupervised ML algorithms like principal component analysis and k-means clustering.

110
EBOOK

Data Cleaning with Power BI. The definitive guide to transforming dirty data into actionable insights

Gus Frazer

Microsoft Power BI offers a range of powerful data cleaning and preparation options through tools such as DAX, Power Query, and the M language. However, despite its user-friendly interface, mastering it can be challenging. Whether you're a seasoned analyst or a novice exploring the potential of Power BI, this comprehensive guide equips you with techniques to transform raw data into a reliable foundation for insightful analysis and visualization. This book serves as a comprehensive guide to data cleaning, starting with data quality, common data challenges, and best practices for handling data. You’ll learn how to import and clean data with Query Editor and transform data using the M query language. As you advance, you’ll explore Power BI’s data modeling capabilities for efficient cleaning and establishing relationships. Later chapters cover best practices for using Power Automate for data cleaning and task automation. Finally, you’ll discover how OpenAI and ChatGPT can make data cleaning in Power BI easier. By the end of the book, you will have a comprehensive understanding of data cleaning concepts, techniques, and how to use Power BI and its tools for effective data preparation.

111
EBOOK

Data Democratization with Domo. Bring together every component of your business to make better data-driven decisions using Domo

Jeff Burtenshaw

Domo is a power-packed business intelligence (BI) platform that empowers organizations to track, analyze, and activate data in record time at cloud scale and performance. Data Democratization with Domo begins with an overview of the Domo ecosystem. You’ll learn how to get data into the cloud with Domo data connectors and Workbench; profile datasets; use Magic ETL to transform data; work with in-memory data sculpting tools (Data Views and Beast Modes); create, edit, and link card visualizations; and create card drill paths using Domo Analyzer. Next, you’ll discover options to distribute content with real-time updates using Domo Embed and digital wallboards. As you advance, you’ll understand how to use alerts and webhooks to drive automated actions. You’ll also build and deploy a custom app to the Domo Appstore and find out how to code Python apps, use Jupyter Notebooks, and insert R custom models. Furthermore, you’ll learn how to use Auto ML to automatically evaluate dozens of models for the best fit using SageMaker and produce a predictive model as well as use Python and the Domo Command Line Interface tool to extend Domo. Finally, you’ll learn how to govern and secure the entire Domo platform. By the end of this book, you’ll have gained the skills you need to become a successful Domo master.

112
EBOOK

Data Engineering Best Practices. Architect robust and cost-effective data solutions in the cloud era

Richard J. Schiller, David Larochelle

Revolutionize your approach to data processing in the fast-paced business landscape with this essential guide to data engineering. Discover the power of scalable, efficient, and secure data solutions through expert guidance on data engineering principles and techniques. Written by two industry experts with over 60 years of combined experience, it offers deep insights into best practices, architecture, agile processes, and cloud-based pipelines. You’ll start by defining the challenges data engineers face and understand how this agile and future-proof comprehensive data solution architecture addresses them. As you explore the extensive toolkit, mastering the capabilities of various instruments, you’ll gain the knowledge needed for independent research. Covering everything you need, right from data engineering fundamentals, the guide uses real-world examples to illustrate potential solutions. It elevates your skills to architect scalable data systems, implement agile development processes, and design cloud-based data pipelines. The book further equips you with the knowledge to harness serverless computing and microservices to build resilient data applications. By the end, you'll be armed with the expertise to design and deliver high-performance data engineering solutions that are not only robust, efficient, and secure but also future-ready.

113
EBOOK

Data Engineering with Apache Spark, Delta Lake, and Lakehouse. Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way

Manoj Kukreja

In the world of ever-changing data and schemas, it is important to build data pipelines that can auto-adjust to changes. This book will help you build scalable data platforms that managers, data scientists, and data analysts can rely on. Starting with an introduction to data engineering, along with its key concepts and architectures, this book will show you how to use Microsoft Azure Cloud services effectively for data engineering. You'll cover data lake design patterns and the different stages through which the data needs to flow in a typical data lake. Once you've explored the main features of Delta Lake to build data lakes with fast performance and governance in mind, you'll advance to implementing the lambda architecture using Delta Lake. Packed with practical examples and code snippets, this book takes you through real-world examples based on production scenarios faced by the author in his 10 years of experience working with big data. Finally, you'll cover data lake deployment strategies that play an important role in provisioning the cloud resources and deploying the data pipelines in a repeatable and continuous way. By the end of this data engineering book, you'll know how to effectively deal with ever-changing data and create scalable data pipelines to streamline data science, ML, and artificial intelligence (AI) tasks.

114
EBOOK

Data Engineering with AWS. Learn how to design and build cloud-based data transformation pipelines using AWS

Gareth Eagar

Written by a Senior Data Architect with over twenty-five years of experience in the business, Data Engineering with AWS is a book whose sole aim is to make you proficient in using the AWS ecosystem. Using a thorough and hands-on approach to data, this book will give aspiring and new data engineers a solid theoretical and practical foundation to succeed with AWS. As you progress, you’ll be taken through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that form a part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. You’ll also learn about populating data marts and data warehouses along with how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. In the final chapters, you'll understand how the power of machine learning and artificial intelligence can be used to draw new insights from data. By the end of this AWS book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.

115
EBOOK

Data Engineering with Azure Databricks. Design, build, and optimize scalable data pipelines and analytics solutions with Azure Databricks

Dmitry Foshin, Dmitry Anoshin, Tonya Chernyshova, Sergii...

Data Engineering with Azure Databricks is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you’ll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake’s ACID features for data reliability and schema evolution. You’ll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need.

116
EBOOK

Data Engineering with Databricks Cookbook. Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake

Pulkit Chadha

Written by a Senior Solutions Architect at Databricks, Data Engineering with Databricks Cookbook will show you how to effectively use Apache Spark, Delta Lake, and Databricks for data engineering, starting with a comprehensive introduction to data ingestion and loading with Apache Spark. What makes this book unique is its recipe-based approach, which will help you put your knowledge to use straight away and tackle common problems. You’ll be introduced to various data manipulation and data transformation solutions that can be applied to data, find out how to manage and optimize Delta tables, and get to grips with ingesting and processing streaming data. The book will also show you how to improve the performance of Apache Spark apps and Delta Lake. Advanced recipes later in the book will teach you how to use Databricks to implement DataOps and DevOps practices, as well as how to orchestrate and schedule data pipelines using Databricks Workflows. You’ll also go through the full process of setup and configuration of the Unity Catalog for data governance. By the end of this book, you’ll be well-versed in building reliable and scalable data pipelines using modern data engineering technologies.

117
EBOOK

Data Engineering with dbt. A practical guide to building a cloud-based, pragmatic, and dependable data platform with SQL

Roberto Zagni

dbt Cloud helps professional analytics engineers automate the application of powerful and proven patterns to transform data from ingestion to delivery, enabling real DataOps. This book begins by introducing you to dbt and its role in the data stack, along with how it uses simple SQL to build your data platform, helping you and your team work better together. You’ll find out how to leverage data modeling, data quality, master data management, and more to build a simple-to-understand and future-proof solution. As you advance, you’ll explore the modern data stack, understand how data-related careers are changing, and see how dbt enables this transition into the emerging role of an analytics engineer. The chapters help you build a sample project using the free version of dbt Cloud, Snowflake, and GitHub to create a professional DevOps setup with continuous integration, automated deployment, ELT run, scheduling, and monitoring, solving practical cases you encounter in your daily work. By the end of this dbt book, you’ll be able to build an end-to-end pragmatic data platform by ingesting data exported from your source systems, coding the needed transformations, including master data and the desired business rules, and building well-formed dimensional models or wide tables that’ll enable you to build reports with the BI tool of your choice.

118
EBOOK

Data Engineering with Google Cloud Platform. A guide to leveling up as a data engineer by building a scalable data platform with Google Cloud - Second Edition

Adi Wijaya, António Vilares

The second edition of Data Engineering with Google Cloud builds upon the success of the first edition by offering enhanced clarity and depth to data professionals navigating the intricate landscape of data engineering. Beyond its foundational lessons, this new edition delves into the essential realm of data governance within Google Cloud, providing you with invaluable insights into managing and optimizing data resources effectively. Written by a Data Strategic Cloud Engineer at Google, this book helps you stay ahead of the curve by guiding you through the latest technological advancements in the Google Cloud ecosystem. You’ll cover essential aspects, from exploring Cloud Composer 2 to the evolution of Airflow 2.5. Additionally, you’ll explore how to work with cutting-edge tools like Dataform, DLP, Dataplex, Dataproc Serverless, and Datastream to perform data governance on datasets. By the end of this book, you'll be equipped to navigate the ever-evolving world of data engineering on Google Cloud, from foundational principles to cutting-edge practices.

119
EBOOK

Data Engineering with Google Cloud Platform. A practical guide to operationalizing scalable data analytics systems on GCP

Adi Wijaya

With this book, you'll understand how the highly scalable Google Cloud Platform (GCP) enables data engineers to create end-to-end data pipelines, from storing and processing data and workflow orchestration to presenting data through visualization dashboards. Starting with a quick overview of the fundamental concepts of data engineering, you'll learn the various responsibilities of a data engineer and how GCP plays a vital role in fulfilling those responsibilities. As you progress through the chapters, you'll be able to leverage GCP products to build a sample data warehouse using Cloud Storage and BigQuery and a data lake using Dataproc. The book gradually takes you through operations such as data ingestion, data cleansing, transformation, and integrating data with other sources. You'll learn how to design IAM for data governance, deploy ML pipelines with Vertex AI, leverage pre-built GCP models as a service, and visualize data with Google Data Studio to build compelling reports. Finally, you'll find tips on how to boost your career as a data engineer, take the Professional Data Engineer certification exam, and get ready to become an expert in data engineering with GCP. By the end of this data engineering book, you'll have developed the skills to perform core data engineering tasks and build efficient ETL data pipelines with GCP.

120
EBOOK

Data Engineering with Scala and Spark. Build streaming and batch pipelines that process massive amounts of data using Scala

Eric Tome, Rupam Bhattacharjee, David Radford

Most data engineers know that performance issues in a distributed computing environment can easily lead to issues impacting the overall efficiency and effectiveness of data engineering tasks. While Python remains a popular choice for data engineering due to its ease of use, Scala shines in scenarios where the performance of distributed data processing is paramount. This book will teach you how to leverage the Scala programming language on the Spark framework and use the latest cloud technologies to build continuous and triggered data pipelines. You’ll do this by setting up a data engineering environment for local development and scalable distributed cloud deployments using data engineering best practices, test-driven development, and CI/CD. You’ll also get to grips with the DataFrame API, Dataset API, and Spark SQL API and how to use them. Data profiling and quality in Scala will also be covered, alongside techniques for orchestrating and performance tuning your end-to-end pipelines to deliver data to your end users. By the end of this book, you will be able to build streaming and batch data pipelines using Scala while following software engineering best practices.