
Data analysis

Data analysis is an exciting discipline that makes it possible to understand phenomena and to gain insight and knowledge from raw data. More precisely, the term refers to processing data with mathematical and statistical techniques in order to draw valuable conclusions, make important decisions, and develop useful products. The term derives from the English "data science", which is often treated as a synonym for terms such as business analytics, operations research, business intelligence, competitive intelligence, data analysis and modeling, and knowledge discovery. With technologies such as the Python and R languages and the Hadoop and Spark platforms, you have a chance to draw as many conclusions as possible from your data, spot opportunities for your organization to grow, or anticipate and prevent threats.


169

EBOOK

Developing Kaggle Notebooks. Pave your way to becoming a Kaggle Notebooks Grandmaster

Gabriel Preda, D. Sculley, Anthony Goldbloom

Developing Kaggle Notebooks introduces you to data analysis, with a focus on using Kaggle Notebooks to simultaneously achieve mastery in this field and rise to the top of the Kaggle Notebooks tier. The book is structured as a seven-step data analysis journey, exploring the features available in Kaggle Notebooks alongside various data analysis techniques. For each topic, we provide one or more notebooks, developing reusable analysis components through Kaggle's Utility Scripts feature, introduced progressively, initially as part of a notebook, and later extracted for use across future notebooks to enhance code reusability on Kaggle. This approach aims to make the notebooks' code more structured, easy to maintain, and readable. Although the focus of this book is on data analytics, some examples will guide you in preparing a complete machine learning pipeline using Kaggle Notebooks. Starting from initial data ingestion and data quality assessment, you'll move on to preliminary data analysis, advanced data exploration, feature qualification to build a model baseline, and feature engineering. You'll also delve into hyperparameter tuning to iteratively refine your model and prepare for submission in Kaggle competitions. Additionally, the book touches on developing notebooks that leverage the power of generative AI using Kaggle Models.

170

EBOOK

Digital Transformation and Modernization with IBM API Connect. A practical guide to developing, deploying, and managing high-performance and secure hybrid-cloud APIs

Bryon Kataoka, James Brennan, Ashish Aggarwal

IBM API Connect enables organizations to drive digital innovation using its scalable and robust API management capabilities across multi-cloud and hybrid environments. With API Connect's security, flexibility, and high performance, you'll be able to meet the needs of your enterprise and clients by extending your API footprint. This book provides a complete roadmap to create, manage, govern, and publish your APIs. You'll start by learning about API Connect components, such as API managers, developer portals, gateways, and analytics subsystems, as well as the management capabilities provided by CLI commands. You’ll then develop APIs using OpenAPI and discover how you can enhance them with logic policies. The book shows you how to modernize SOAP and FHIR REST services as secure APIs with authentication, OAuth2/OpenID, and JWT, and demonstrates how API Connect provides safeguards for GraphQL APIs as well as published APIs that are easy to discover and well documented. As you advance, the book guides you in generating unit tests that supplement DevOps pipelines using Git and Jenkins for improved agility, and concludes with best practices for implementing API governance and customizing API Connect components. By the end of this book, you'll have learned how to transform your business by speeding up the time-to-market of your products and increasing the ROI for your enterprise.

171

EBOOK

Digital Transformation with Dataverse for Teams. Become a citizen developer and lead the digital transformation wave with Microsoft Teams and Power Platform

Srikumar Nair

Microsoft Dataverse for Teams is a built-in, low-code data platform for Teams that enables everyone to easily build and deploy apps, flows, and intelligent chatbots using Power Apps, Power Automate, and Power Virtual Agents (PVA) embedded in Microsoft Teams. Without learning any coding language, you will be able to build apps with step-by-step explanations for setting up Teams, creating tables to store data, and leveraging the data for your digital solutions. With the techniques covered in the book, you’ll be able to develop your first app with Dataverse for Teams within an hour! You’ll then learn how to automate repetitive tasks or build alerts using Power Automate and Power Virtual Agents. As you get to grips with building these digital solutions, you’ll also be able to understand when to consider upgrading from Dataverse for Teams to Dataverse, along with its advanced features. Finally, you’ll explore features for administration and governance and understand the licensing requirements of Microsoft Dataverse for Teams and Power Apps. Having acquired the skills to build and deploy an enterprise-grade digital solution, by the end of the book, you will have become a qualified citizen developer and be ready to lead a digital revolution in your organization.

172

EBOOK

Distributed Data Systems with Azure Databricks. Create, deploy, and manage enterprise data pipelines

Alan Bernardo Palacio

Microsoft Azure Databricks helps you to harness the power of distributed computing and apply it to create robust data pipelines, along with training and deploying machine learning and deep learning models. Databricks' advanced features enable developers to process, transform, and explore data. Distributed Data Systems with Azure Databricks will help you to put your knowledge of Databricks to work to create big data pipelines. The book provides a hands-on approach to implementing Azure Databricks and its associated methodologies that will make you productive in no time. Complete with detailed explanations of essential concepts, practical examples, and self-assessment questions, the book begins with a quick introduction to Databricks' core functionality, before moving on to distributed model training and inference using TensorFlow and Spark MLlib. As you advance, you’ll explore MLflow Model Serving on Azure Databricks and implement distributed training pipelines using HorovodRunner in Databricks. Finally, you’ll discover how to transform, use, and obtain insights from massive amounts of data to train predictive models and create complete, fully working data pipelines. By the end of this MS Azure book, you’ll have gained a solid understanding of how to work with Databricks to create and manage an entire big data pipeline.

173

EBOOK

Don't Fear the Spreadsheet. A Beginner's Guide to Overcoming Excel's Frustrations

MrExcel's Holy Macro! Books, Tyler Nash, Bill...

This book is written in an easy-to-follow question-and-answer format, specifically designed for complete Excel beginners. Focusing on the extreme basics of using spreadsheets, it avoids overwhelming readers with advanced topics and instead builds a foundational understanding. Readers will quickly gain a passable knowledge of the program, addressing common fears and frustrations through clear explanations and practical examples. The guide answers hundreds of everyday questions, such as "Can I delete data without changing formatting?" and "How do I use text-wrapping?", as well as slightly more advanced queries like "What is a Macro, and how do I create one?" It empowers users by breaking down intimidating concepts into manageable steps, making Excel approachable and useful for even the most inexperienced users. The focus is on helping readers become comfortable with essential tasks, from merging cells and formatting text to understanding formulas and navigating the interface. Aimed at the 40 percent of Excel users who have never entered a formula, this book demystifies the program's tools and functions, transforming confusion into confidence. By the end, readers will feel equipped to use Excel effectively for personal and professional tasks, overcoming barriers to productivity.

174

EBOOK

Driving Data Quality with Data Contracts. A comprehensive guide to building reliable, trusted, and effective data platforms

Andrew Jones, Kevin Hu

Despite the passage of time and the evolution of technology and architecture, the challenges we face in building data platforms persist. Our data often remains unreliable, lacks trust, and fails to deliver the promised value. With Driving Data Quality with Data Contracts, you’ll discover the potential of data contracts to transform how you build your data platforms, finally overcoming these enduring problems. You’ll learn how establishing contracts as the interface allows you to explicitly assign responsibility and accountability of the data to those who know it best (the data generators) and give them the autonomy to generate and manage data as required. The book will show you how data contracts ensure that consumers get quality data with clearly defined expectations, enabling them to build on that data with confidence to deliver valuable analytics, performant ML models, and trusted data-driven products. By the end of this book, you’ll have gained a comprehensive understanding of how data contracts can revolutionize your organization’s data culture and provide a competitive advantage by unlocking the real value within your data.

175

EBOOK

Dziennikarstwo danych i data storytelling

Łukasz Żyła

Without data, you are just another person with an opinion... Data journalism is flourishing today. That is because much of our life has moved to the internet, and the internet is... data. Megabytes, gigabytes, terabytes of data. The mission of the modern journalist is to present it to society reliably and, at the same time, beautifully, that is, in a way that is understandable and easy to absorb. Before the data can be beautifully arranged, however, it has to be found. Where to look? How to obtain it? How to tell the story of the data? These are the questions the author answers in this book. You will not read about "pretty charts" here, because contrary to appearances they are not the essence of data journalism and data storytelling. Instead, you will learn where to find the sources of the information you need and how to process and analyze it. You will also find tips on how to create good visualizations with simple applications available for free online and how to craft data stories that engage the audience. Finally, you will move up a level and learn to present data using program code.

Who? What? How? Where? When? Every journalist who wants to do the job properly must find answers to these basic questions. At the same time, with the flood of information and data from sources whose verification is just as time-consuming, everyone practicing this beautiful profession increasingly resembles the mythical Sisyphus. Getting through gigabytes of information, processing it, and creating material that explains reality to the audience is today an undertaking that involves enormous effort and even greater risk. The cascading decline in trust in public and private institutions that we have seen for years also affects the media, which are exposed to a range of business, political, and social pressures on the one hand and struggle with constant financial problems on the other. What is worth knowing is that good journalism, quality journalism, requires authors to move freely in the space of the internet and data, and to learn the basics of operating in that space. So if we want at least a shadow of hope for a job well done, it is worth reaching for Łukasz Żyła's book. In this profession I was always told that you learn it only in practice, and certainly not at university. That is still true, although times in which media outlets are literally sprouting at every turn and drawing in ever younger novice journalists call for an information pill, a kind of mine detector, that lets a beginner take those first steps in relative safety. Dziennikarstwo danych i data storytelling is also a book for people experienced in the profession. The reason is obvious: technology has changed journalism, and in the rush of the element that journalism is, it is easy to fall into a safe and therefore deceptive routine, at which point we are one step from a serious mistake. Thanks to Łukasz Żyła's book, the digital reefs that litter the web will be easier to avoid. Bartosz Kurek, former Polsat journalist, currently public affairs manager at Philip Morris

"What do you actually do there?" is a common question when I say that I work in the data department of "Wyborcza". Some answer knowingly: "Ah, so you analyze the newspaper's sales figures?" Others change the subject, expecting me to bombard them with boring stories about filling tables with numbers. Interestingly, journalists also ask what exactly our work looks like. Now, instead of going into detail, I will be able to start my answer with the words: "There is this book, it is worth reading...", because Łukasz explains what it is all about in a very accessible way. And I think that whatever area of journalism you work in, you will find something in it for yourself. The parts about working with officials, access to information, and storytelling should be absorbed by everyone who will work in the profession. Those about processing data will be picked up by the more ambitious, or perhaps simply the more far-sighted, because many can write, but the ability to write combined with the skill of analyzing, programming, or visualizing turns a journalist into a person for special assignments. While reading this book, I often regretted that nothing like it existed when I was starting my adventure with data. Thanks to it, I can see how much I still have to learn in this field. Dominik Uhlig, head of BIQdata.pl, the data department of "Gazeta Wyborcza"

176

EBOOK

Effective Amazon Machine Learning. Expert web services for machine learning on cloud

Alexis Perrier

Predictive analytics is a complex domain requiring coding skills, an understanding of the mathematical concepts underpinning machine learning algorithms, and the ability to create compelling data visualizations. Building on AWS's simplification of machine learning, this book will help you bring predictive analytics projects to fruition in three easy steps: data preparation, model tuning, and model selection. This book will introduce you to the Amazon Machine Learning platform and will implement core data science concepts such as classification, regression, regularization, overfitting, model selection, and evaluation. Furthermore, you will learn to leverage the Amazon Web Services (AWS) ecosystem for extended access to data sources, implement real-time predictions, and run Amazon Machine Learning projects via the command line and the Python SDK. Towards the end of the book, you will also learn how to apply these services to other problems, such as text mining, and to more complex datasets.