Big data
Developing Kaggle Notebooks. Pave your way to becoming a Kaggle Notebooks Grandmaster
Gabriel Preda, D. Sculley, Anthony Goldbloom
Developing Kaggle Notebooks introduces you to data analysis, with a focus on using Kaggle Notebooks to simultaneously achieve mastery in this field and rise to the top of the Kaggle Notebooks tier. The book is structured as a seven-step data analysis journey, exploring the features available in Kaggle Notebooks alongside various data analysis techniques. For each topic, we provide one or more notebooks and develop reusable analysis components through Kaggle's Utility Scripts feature. These components are introduced progressively, initially as part of a notebook, and later extracted for reuse across future notebooks. This approach aims to make the notebooks' code more structured, readable, and easy to maintain. Although the focus of this book is on data analytics, some examples will guide you in preparing a complete machine learning pipeline using Kaggle Notebooks. Starting from initial data ingestion and data quality assessment, you'll move on to preliminary data analysis, advanced data exploration, feature qualification to build a model baseline, and feature engineering. You'll also delve into hyperparameter tuning to iteratively refine your model and prepare for submission in Kaggle competitions. Additionally, the book touches on developing notebooks that leverage the power of generative AI using Kaggle Models.
Bryon Kataoka, James Brennan, Ashish Aggarwal
IBM API Connect enables organizations to drive digital innovation using its scalable and robust API management capabilities across multi-cloud and hybrid environments. With API Connect's security, flexibility, and high performance, you'll be able to meet the needs of your enterprise and clients by extending your API footprint. This book provides a complete roadmap to create, manage, govern, and publish your APIs. You'll start by learning about API Connect components, such as API managers, developer portals, gateways, and analytics subsystems, as well as the management capabilities provided by CLI commands. You’ll then develop APIs using OpenAPI and discover how you can enhance them with logic policies. The book shows you how to modernize SOAP and FHIR REST services as secure APIs with authentication, OAuth2/OpenID, and JWT, and demonstrates how API Connect provides safeguards for GraphQL APIs as well as published APIs that are easy to discover and well documented. As you advance, the book guides you in generating unit tests that supplement DevOps pipelines using Git and Jenkins for improved agility, and concludes with best practices for implementing API governance and customizing API Connect components. By the end of this book, you'll have learned how to transform your business by speeding up the time-to-market of your products and increasing the ROI for your enterprise.
Srikumar Nair
Microsoft Dataverse for Teams is a built-in, low-code data platform for Teams that enables everyone to easily build and deploy apps, flows, and intelligent chatbots using Power Apps, Power Automate, and Power Virtual Agents (PVA) embedded in Microsoft Teams. Without learning any coding language, you will be able to build apps with step-by-step explanations for setting up Teams, creating tables to store data, and leveraging the data for your digital solutions. With the techniques covered in the book, you’ll be able to develop your first app with Dataverse for Teams within an hour! You’ll then learn how to automate repetitive tasks or build alerts using Power Automate and Power Virtual Agents. As you get to grips with building these digital solutions, you’ll also be able to understand when to consider upgrading from Dataverse for Teams to Dataverse, along with its advanced features. Finally, you’ll explore features for administration and governance and understand the licensing requirements of Microsoft Dataverse for Teams and Power Apps. Having acquired the skills to build and deploy an enterprise-grade digital solution, by the end of the book, you will have become a qualified citizen developer and be ready to lead a digital revolution in your organization.
Distributed Data Systems with Azure Databricks. Create, deploy, and manage enterprise data pipelines
Alan Bernardo Palacio
Microsoft Azure Databricks helps you to harness the power of distributed computing and apply it to create robust data pipelines, along with training and deploying machine learning and deep learning models. Databricks' advanced features enable developers to process, transform, and explore data. Distributed Data Systems with Azure Databricks will help you to put your knowledge of Databricks to work to create big data pipelines. The book provides a hands-on approach to implementing Azure Databricks and its associated methodologies that will make you productive in no time. Complete with detailed explanations of essential concepts, practical examples, and self-assessment questions, you’ll begin with a quick introduction to Databricks core functionalities, before performing distributed model training and inference using TensorFlow and Spark MLlib. As you advance, you’ll explore MLflow Model Serving on Azure Databricks and implement distributed training pipelines using HorovodRunner in Databricks. Finally, you’ll discover how to transform, use, and obtain insights from massive amounts of data to train predictive models and create entire fully working data pipelines. By the end of this MS Azure book, you’ll have gained a solid understanding of how to work with Databricks to create and manage an entire big data pipeline.
Guanhua Wang
Reducing time cost in machine learning leads to a shorter waiting time for model training and a faster model updating cycle. Distributed machine learning enables machine learning practitioners to shorten model training and inference time by orders of magnitude. With the help of this practical guide, you'll be able to put your Python development knowledge to work to get up and running with the implementation of distributed machine learning, including multi-node machine learning systems, in no time. You'll begin by exploring how distributed systems work in the machine learning area and how distributed machine learning is applied to state-of-the-art deep learning models. As you advance, you'll see how to use distributed systems to enhance machine learning model training and serving speed. You'll also get to grips with applying data parallel and model parallel approaches before optimizing the in-parallel model training and serving pipeline in local clusters or cloud environments. By the end of this book, you'll have gained the knowledge and skills needed to build and deploy an efficient data processing pipeline for machine learning model training and inference in a distributed manner.
Supercharge Power BI! How to ingest, transform, and visualize data using Python and R code
Luca Zavarella, Francesca Lazzeri
An important task for data engineers is creating machine learning models. Business analytics tools such as Power BI are used for this. Power BI's capabilities are impressive, and they can be extended further. One of the more interesting ways to enrich a Power BI data model and its visualizations is to apply complex algorithms implemented in Python and R. In this way, you can not only create compelling data visualizations but also use them to extract business-critical information. This book shows you how. You'll start by preparing the Power BI environment to run Python and R scripts. Next, you'll import data from unsupported objects and transform it using regular expressions and complex algorithms. You'll learn to call external APIs and use advanced techniques to perform in-depth analyses and extract valuable information with statistical and machine learning tools, as well as by applying linear optimization and other algorithms. You'll also become familiar with the main statistical features of datasets and with methods for creating various charts that make it easier to understand the relationships between variables. Highlights: complex data transformation in Power BI using Python and R scripts; data anonymization and pseudonymization; working with large datasets; outliers and missing values in multivariate data and time series; creating complex data visualizations. Unleash the power of Power BI!
Don't Fear the Spreadsheet. A Beginner's Guide to Overcoming Excel's Frustrations
MrExcel's Holy Macro! Books, Tyler Nash, Bill...
This book is written in an easy-to-follow question-and-answer format, specifically designed for complete Excel beginners. Focusing on the extreme basics of using spreadsheets, it avoids overwhelming readers with advanced topics and instead builds a foundational understanding. Readers will quickly gain a passable knowledge of the program, addressing common fears and frustrations through clear explanations and practical examples. The guide answers hundreds of everyday questions, such as "Can I delete data without changing formatting?" and "How do I use text-wrapping?", as well as slightly more advanced queries like "What is a Macro, and how do I create one?" It empowers users by breaking down intimidating concepts into manageable steps, making Excel approachable and useful for even the most inexperienced users. The focus is on helping readers become comfortable with essential tasks, from merging cells and formatting text to understanding formulas and navigating the interface. Aimed at the 40 percent of Excel users who have never entered a formula, this book demystifies the program's tools and functions, transforming confusion into confidence. By the end, readers will feel equipped to use Excel effectively for personal and professional tasks, overcoming barriers to productivity.
Andrew Jones, Kevin Hu
Despite the passage of time and the evolution of technology and architecture, the challenges we face in building data platforms persist. Our data often remains unreliable, is not trusted, and fails to deliver the promised value. With Driving Data Quality with Data Contracts, you’ll discover the potential of data contracts to transform how you build your data platforms, finally overcoming these enduring problems. You’ll learn how establishing contracts as the interface allows you to explicitly assign responsibility and accountability for the data to those who know it best, the data generators, and give them the autonomy to generate and manage data as required. The book will show you how data contracts ensure that consumers get quality data with clearly defined expectations, enabling them to build on that data with confidence to deliver valuable analytics, performant ML models, and trusted data-driven products. By the end of this book, you’ll have gained a comprehensive understanding of how data contracts can revolutionize your organization’s data culture and provide a competitive advantage by unlocking the real value within your data.