Big data
Trenton Potgieter
AWS provides a wide range of solutions to help automate a machine learning workflow with just a few lines of code. With this practical book, you'll learn how to automate a machine learning pipeline using the various AWS services. Automated Machine Learning on AWS begins with a quick overview of what the machine learning pipeline/process looks like and highlights the typical challenges that you may face when building a pipeline. Throughout the book, you'll become well versed with various AWS solutions such as Amazon SageMaker Autopilot, AutoGluon, and AWS Step Functions to automate an end-to-end ML process with the help of hands-on examples. The book will show you how to build, monitor, and execute a CI/CD pipeline for the ML process and how the various CI/CD services within AWS can be applied to a use case with the Cloud Development Kit (CDK). You'll understand what a data-centric ML process is by working with the Amazon Managed Services for Apache Airflow and then build a managed Airflow environment. You'll also cover the key success criteria for an MLSDLC implementation and the process of creating a self-mutating CI/CD pipeline using AWS CDK from the perspective of the platform engineering team. By the end of this AWS book, you'll be able to effectively automate a complete machine learning pipeline and deploy it to production.
Luis Sobrecueva
AutoKeras is an AutoML open-source software library that provides easy access to deep learning models. If you are looking to build deep learning model architectures and perform parameter tuning automatically using AutoKeras, then this book is for you. This book teaches you how to develop and use state-of-the-art AI algorithms in your projects. It begins with a high-level introduction to automated machine learning, explaining all the concepts required to get started with this machine learning approach. You will then learn how to use AutoKeras for image and text classification and regression. As you make progress, you'll discover how to use AutoKeras to perform sentiment analysis on documents. This book will also show you how to implement a custom model for topic classification with AutoKeras. Toward the end, you will explore advanced concepts of AutoKeras such as working with multi-modal data and multi-task, customizing the model with AutoModel, and visualizing experiment results using AutoKeras Extensions. By the end of this machine learning book, you will be able to confidently use AutoKeras to design your own custom machine learning models in your company.
Dennis Sawyers
Automated Machine Learning with Microsoft Azure will teach you how to build high-performing, accurate machine learning models in record time. It will equip you with the knowledge and skills to easily harness the power of artificial intelligence and increase the productivity and profitability of your business. Guided user interfaces (GUIs) enable both novices and seasoned data scientists to easily train and deploy machine learning solutions to production. Using a careful, step-by-step approach, this book will teach you how to use Azure AutoML with a GUI as well as the AzureML Python software development kit (SDK). First, you'll learn how to prepare data, train models, and register them to your Azure Machine Learning workspace. You'll then discover how to take those models and use them to create both automated batch solutions using machine learning pipelines and real-time scoring solutions using Azure Kubernetes Service (AKS). Finally, you will be able to use AutoML on your own data to not only train regression, classification, and forecasting models but also use them to solve a wide variety of business problems. By the end of this Azure book, you'll be able to show your business partners exactly how your ML models are making predictions through automatically generated charts and graphs, earning their trust and respect.
Somanath Nanda, Weslley Moura
The AWS Certified Machine Learning Specialty (MLS-C01) exam evaluates your ability to execute machine learning tasks on AWS infrastructure. This comprehensive book aligns with the latest exam syllabus, offering practical examples to support your real-world machine learning projects on AWS. Additionally, you'll get lifetime access to supplementary online resources, including mock exams with exam-like timers, detailed solutions, interactive flashcards, and invaluable exam tips, all accessible across various devices: PCs, tablets, and smartphones. Throughout the book, you’ll learn data preparation techniques for machine learning, covering diverse methods for data manipulation and transformation across different variable types. Addressing challenges such as missing data and outliers, the book guides you through an array of machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, text mining, and image processing, accompanied by requisite machine learning algorithms essential for exam success. The book helps you master the deployment of models in production environments and their subsequent monitoring. Equipped with insights from this book and the accompanying mock exams, you'll be fully prepared to achieve the AWS MLS-C01 certification.
Somanath Nanda, Weslley Moura
The AWS Certified Machine Learning Specialty exam tests your competency to perform machine learning (ML) on AWS infrastructure. This book covers the entire exam syllabus using practical examples to help you with your real-world machine learning projects on AWS. Starting with an introduction to machine learning on AWS, you'll learn the fundamentals of machine learning and explore important AWS services for artificial intelligence (AI). You'll then see how to prepare data for machine learning and discover a wide variety of techniques for data manipulation and transformation for different types of variables. The book also shows you how to handle missing data and outliers and takes you through various machine learning tasks such as classification, regression, clustering, forecasting, anomaly detection, text mining, and image processing, along with the specific ML algorithms you need to know to pass the exam. Finally, you'll explore model evaluation, optimization, and deployment and get to grips with deploying models in a production environment and monitoring them. By the end of this book, you'll have gained knowledge of the key challenges in machine learning and the solutions that AWS has released for each of them, along with the tools, methods, and techniques commonly used in each domain of AWS ML.
Dmitry Foshin, Tonya Chernyshova, Dmitry Anoshin, Xenia...
This new edition of the Azure Data Factory book, fully updated to reflect ADS V2, will help you get up and running by showing you how to create and execute your first job in ADF. There are updated and new recipes throughout the book based on developments happening in Azure Synapse, Deployment with Azure DevOps, and Azure Purview. The current edition also runs you through Fabric Data Factory, Data Explorer, and some industry-grade best practices with specific chapters on each. You’ll learn how to branch and chain activities, create custom activities, and schedule pipelines, as well as discover the benefits of cloud data warehousing, Azure Synapse Analytics, and Azure Data Lake Gen2 Storage. With practical recipes, you’ll learn how to actively engage with analytical tools from Azure Data Services and leverage your on-premises infrastructure with cloud-native tools to get relevant business insights. You'll familiarize yourself with the common errors that you may encounter while working with ADF and find out the solutions to them. You’ll also understand error messages and resolve problems in connectors and data flows with the debugging capabilities of ADF. By the end of this book, you’ll be able to use ADF with its latest advancements as the main ETL and orchestration tool for your data warehouse projects.
Andreas Botsikas, Michael Hlobil
The Azure Data Scientist Associate Certification Guide helps you acquire practical knowledge for machine learning experimentation on Azure. It covers everything you need to pass the DP-100 exam and become a certified Azure Data Scientist Associate. Starting with an introduction to data science, you'll learn the terminology that will be used throughout the book and then move on to the Azure Machine Learning (Azure ML) workspace. You'll discover the studio interface and manage various components, such as data stores and compute clusters. Next, the book focuses on no-code and low-code experimentation, and shows you how to use the Automated ML wizard to locate and deploy optimal models for your dataset. You'll also learn how to run end-to-end data science experiments using the designer provided in Azure ML Studio. You'll then explore the Azure ML Software Development Kit (SDK) for Python and advance to creating experiments and publishing models using code. The book also guides you in optimizing your model's hyperparameters using Hyperdrive before demonstrating how to use responsible AI tools to interpret and debug your models. Once you have a trained model, you'll learn to operationalize it for batch or real-time inferences and monitor it in production. By the end of this Azure certification study guide, you'll have gained the knowledge and the practical skills required to pass the DP-100 exam.
Badanie danych. Raport z pierwszej linii działań
Rachel Schutt, Cathy O'Neil
A unique introduction to data science! Today, information is the most valuable asset. Huge amounts of data are stored in vast databases, and the key to success is analyzing that data skillfully and drawing conclusions from it. This is a rapidly developing field that until now has lacked solid textbooks offering an in-depth understanding of the area. Fortunately, that has changed! In this unique book, researchers from the largest IT companies share effective data analysis techniques. In the successive chapters you will learn what data science, a data model, and an A/B test are. You will also gain knowledge of statistical inference, algorithms, the R language, and data visualization. Pick up this book if you want to learn how to detect fraud, use MapReduce, and investigate causality. It is a must-have for readers interested in data exploration. Topics covered in the book include: statistical inference, exploratory data analysis, and the data science process (methodology); algorithms; spam filters, the naive Bayes algorithm, and data preprocessing; logistic regression; financial modeling; recommendation engines and causality; data visualization; social networks and data journalism; data engineering, MapReduce, Pregel, and Hadoop. Draw valuable conclusions from the information you hold!
Bayesian Analysis with Python. A practical guide to probabilistic modeling - Third Edition
Osvaldo Martin, Christopher Fonnesbeck, Thomas Wiecki
The third edition of Bayesian Analysis with Python serves as an introduction to the main concepts of applied Bayesian modeling using PyMC, a state-of-the-art probabilistic programming library, and other libraries that support and facilitate modeling like ArviZ, for exploratory analysis of Bayesian models; Bambi, for flexible and easy hierarchical linear modeling; PreliZ, for prior elicitation; PyMC-BART, for flexible non-parametric regression; and Kulprit, for variable selection. In this updated edition, a brief and conceptual introduction to probability theory enhances your learning journey by introducing new topics like Bayesian additive regression trees (BART), featuring updated examples. Refined explanations, informed by feedback and experience from previous editions, underscore the book's emphasis on Bayesian statistics. You will explore various models, including hierarchical models, generalized linear models for regression and classification, mixture models, Gaussian processes, and BART, using synthetic and real datasets. By the end of this book, you’ll understand probabilistic modeling and be able to design and implement Bayesian models for data science, with a strong foundation for more advanced study.
Osvaldo Martin, Eric Ma, Austin Rochford
The second edition of Bayesian Analysis with Python is an introduction to the main concepts of applied Bayesian inference and its practical implementation in Python using PyMC3, a state-of-the-art probabilistic programming library, and ArviZ, a new library for exploratory analysis of Bayesian models. The main concepts of Bayesian statistics are covered using a practical and computational approach. Synthetic and real data sets are used to introduce several types of models, such as generalized linear models for regression and classification, mixture models, hierarchical models, and Gaussian processes, among others. By the end of the book, you will have a working knowledge of probabilistic modeling and you will be able to design and implement Bayesian models for your own data science problems. After reading the book you will be better prepared to delve into more advanced material or specialized statistical modeling if you need to.
Bayesian Analysis with Python. Unleash the power and flexibility of the Bayesian framework
Osvaldo Martin
The purpose of this book is to teach the main concepts of Bayesian data analysis. We will learn how to effectively use PyMC3, a Python library for probabilistic programming, to perform Bayesian parameter estimation and to check and validate models. This book begins by presenting the key concepts of the Bayesian framework and the main advantages of this approach from a practical point of view. Moving on, we will explore the power and flexibility of generalized linear models and how to adapt them to a wide array of problems, including regression and classification. We will also look into mixture models and clustering data, and we will finish with advanced topics like non-parametric models and Gaussian processes. With the help of Python and PyMC3 you will learn to implement, check, and expand Bayesian models to solve data analysis problems.
Alvaro Fuentes
Python is one of the most common and popular languages preferred by leading data analysts and statisticians for working with massive datasets and complex data visualizations. Become a Python Data Analyst introduces Python’s most essential tools and libraries necessary to work with the data analysis process, right from preparing data to performing simple statistical analyses and creating meaningful data visualizations. In this book, we will cover Python libraries such as NumPy, pandas, matplotlib, seaborn, SciPy, and scikit-learn, and apply them in practical data analysis and statistics examples. As you make your way through the chapters, you will learn to efficiently use the Jupyter Notebook to operate and manipulate data using NumPy and the pandas library. In the concluding chapters, you will gain experience in building simple predictive models and carrying out statistical computation and analysis using rich Python tools and proven data analysis techniques. By the end of this book, you will have hands-on experience performing data analysis with Python.
Becoming a Data Analyst. A Beginner's Guide to Kickstarting Your Data Analysis Career
Remsey Mailjard, Maaike van Putten
This guide is designed to take you from novice to confident data analyst. Starting with the fundamentals of data analytics, you will explore what data analysis entails and why it's crucial in today's data-driven industries. You'll develop a data analyst mindset, honing your problem-solving and critical-thinking skills through practical exercises. You'll be introduced to different types of data, data sources and key concepts like KPIs and data warehouses. Hands-on chapters will guide you through Excel for basic data analysis, teaching you vital functions, pivot tables, and visualization techniques. You'll dive into SQL to query and manipulate data as well as data cleaning and exploration to prepare datasets for meaningful analysis. More advanced chapters will introduce you to Power BI, so you can build interactive dashboards and use DAX for advanced calculations. You'll work on major projects that will form a professional portfolio showcasing your skills in sales analysis, HR analytics, and customer insights. Finally, the book will teach you the art of communicating your findings through data storytelling to different audiences. You'll also find guidance on continuing education and career growth, ensuring you're well-prepared to launch a successful career in data analytics.
Jorge Brasil
In this book, you'll embark on a comprehensive journey through the fundamentals of linear algebra, a critical component for any aspiring machine learning expert. Starting with an introductory overview, the course explains why linear algebra is indispensable for machine learning, setting the stage for deeper exploration. You'll then dive into the concepts of vectors and matrices, understanding their definitions, properties, and practical applications in the field. As you progress, the course takes a closer look at matrix decomposition, breaking down complex matrices into simpler, more manageable forms. This section emphasizes the importance of decomposition techniques in simplifying computations and enhancing data analysis. The final chapter focuses on principal component analysis, a powerful technique for dimensionality reduction that is widely used in machine learning and data science. By the end of the course, you will have a solid grasp of how PCA can be applied to streamline data and improve model performance. This course is designed to provide technical professionals with a thorough understanding of linear algebra's role in machine learning. By the end, you'll be well-equipped with the knowledge and skills needed to apply linear algebra in practical machine learning scenarios.
Jorge Brasil
This book takes readers on a structured journey through calculus fundamentals essential for AI. Starting with “Why Calculus?” it introduces key concepts like functions, limits, and derivatives, providing a solid foundation for understanding machine learning. As readers progress, they will encounter practical applications such as Taylor Series for curve fitting, gradient descent for optimization, and L'Hôpital’s Rule for managing undefined expressions. Each chapter builds up from core calculus to multidimensional topics, making complex ideas accessible and applicable to AI. The final chapters guide readers through multivariable calculus, including advanced concepts like the gradient, Hessian, and backpropagation, crucial for neural networks. From optimizing models to understanding cost functions, this book equips readers with the calculus skills needed to confidently tackle AI challenges, offering insights that make complex calculus both manageable and deeply relevant to machine learning.
Jorge Brasil
Delve into the importance of probability and statistics in AI, beginning with fundamental measures like mean, median, and variance. This book takes you on a journey through the basics of probability theory, introducing key concepts such as central tendency, variance, and probability distributions. It emphasizes the role of statistical measures in understanding and analyzing data. Building on these foundations, the book explores hypothesis testing, Bayesian inference, and statistical distributions in-depth. Readers will gain practical insights into essential techniques for model evaluation, maximum likelihood estimation, and the interpretation of data in the context of AI applications. Each concept is illustrated with practical examples and case studies to ensure clarity and application. Finally, advanced topics like Markov processes, hierarchical Bayesian models, and multivariate distributions are introduced. The book addresses critical areas like variance, correlation, and hypothesis testing, equipping readers with the skills to tackle real-world challenges in AI and machine learning. Whether you're a student, professional, or AI enthusiast, this book offers the essential statistical tools and knowledge to excel in the field.
Alex Galea
Get to grips with the skills you need for entry-level data science in this hands-on Python and Jupyter course. You'll learn about some of the most commonly used libraries that are part of the Anaconda distribution, and then explore machine learning models with real datasets to give you the skills and exposure you need for the real world. We'll finish up by showing you how easy it can be to scrape and gather your own data from the open web, so that you can apply your new skills in an actionable context.
Beginning Swift. Master the fundamentals of programming in Swift 4
Rob Kerr, Kare Morstol
Take your first foray into programming for Apple devices with Swift. Swift is fundamentally different from Objective-C, as it is a protocol-oriented language. While you can still write normal object-oriented code in Swift, it requires a new way of thinking to take advantage of its powerful features and a solid understanding of the basics to become productive.
Sridhar Alla
Apache Hadoop is the most popular platform for big data processing, and can be combined with a host of other big data tools to build powerful analytics solutions. Big Data Analytics with Hadoop 3 shows you how to do just that, by providing insights into the software as well as its benefits with the help of practical examples. Once you have taken a tour of Hadoop 3’s latest features, you will get an overview of HDFS, MapReduce, and YARN, and how they enable faster, more efficient big data processing. You will then move on to learning how to integrate Hadoop with open source tools such as Python and R to analyze and visualize data and perform statistical computing on big data. As you get acquainted with all this, you will explore how to use Hadoop 3 with Apache Spark and Apache Flink for real-time data analytics and stream processing. In addition to this, you will understand how to use Hadoop to build analytics solutions on the cloud and an end-to-end pipeline to perform big data analysis using practical use cases. By the end of this book, you will be well-versed with the analytical capabilities of the Hadoop ecosystem. You will be able to build powerful solutions to perform big data analytics and get insight effortlessly.
Big Data Analytics with Java. Data analysis, visualization & machine learning techniques
Rajat Mehta
This book covers case studies such as sentiment analysis on a tweet dataset, recommendations on a MovieLens dataset, customer segmentation on an ecommerce dataset, and graph analysis on an actual flights dataset. This book is an end-to-end guide to implementing analytics on big data with Java. Java is the de facto language for major big data environments, including Hadoop. This book will teach you how to perform analytics on big data with production-friendly Java. The book is divided into two sections. The first part is an introduction that will help readers get acquainted with big data environments, whereas the second part contains an in-depth discussion of all the concepts in analytics on big data. It will take you from data analysis and data visualization to the core concepts and advantages of machine learning, real-life usage of regression and classification using Naïve Bayes, a deep discussion on the concepts of clustering, and a review of simple neural networks on big data using Deeplearning4j or plain Java Spark code. This is a must-have book for Java developers who want to start learning big data analytics and use it in the real world.
Big Data Analytics with R. Leverage R Programming to uncover hidden patterns in your Big Data
Simon Walkowiak
Big Data analytics is the process of examining large and complex data sets that often exceed available computational capabilities. R is a leading programming language of data science, consisting of powerful functions to tackle all problems related to Big Data processing. The book begins with a brief introduction to the Big Data world and its current industry standards. It then introduces the R language and presents its development, structure, real-world applications, and shortcomings. The book goes on to review major R functions for data management and transformation. Readers will be introduced to cloud-based Big Data solutions (e.g., Amazon EC2 instances and Amazon RDS, Microsoft Azure and its HDInsight clusters), and the book also provides guidance on R connectivity with relational and non-relational databases such as MongoDB and HBase. It further covers Big Data tools such as the Apache Hadoop ecosystem, HDFS, and MapReduce frameworks, as well as other R-compatible tools such as Apache Spark, its machine learning library Spark MLlib, and H2O.
Big Data Analytics with SAS. Get actionable insights from your Big Data using the power of SAS
David Pope, Subhashini S Tripathi
SAS has been recognized by Money Magazine and Payscale as one of the top business skills to learn in order to advance one’s career. Through innovative data management, analytics, and business intelligence software and services, SAS helps customers solve their business problems by allowing them to make better decisions faster. This book introduces the reader to SAS and shows how to use it to perform efficient analysis on data of any size, including Big Data. The reader will learn how to prepare data for analysis, perform predictive, forecasting, and optimization analysis, and then deploy or report on the results of these analyses. While performing the coding examples within this book, the reader will learn how to use the web browser based SAS Studio and iPython Jupyter Notebook interfaces for working with SAS. Finally, the reader will learn how SAS’s architecture is engineered and designed to scale up and/or out and be combined with open source offerings such as Hadoop, Python, and R. By the end of this book, you will be able to clearly understand how you can efficiently analyze Big Data using SAS.
Syed Muhammad Fahad Akhtar
Big data architects are the “masters” of data, and hold high value in today’s market. Handling big data, be it of good or bad quality, is not an easy task. The prime job for any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to find useful, hidden insights. Big Data Architect’s Handbook takes you through developing a complete, end-to-end big data pipeline, which will lay the foundation for you and provide the necessary knowledge required to be an architect in big data. Right from understanding the design considerations to implementing a solid, efficient, and scalable data pipeline, this book walks you through all the essential aspects of big data. It also gives you an overview of how you can leverage the power of various big data tools such as Apache Hadoop and ElasticSearch in order to bring them together and build an efficient big data solution. By the end of this book, you will be able to build your own design system which integrates, maintains, visualizes, and monitors your data. In addition, you will have a smooth design flow in each process, putting insights into action.
Big Data. Krótkie Wprowadzenie 30
Dawn E. Holmes
KRÓTKIE WPROWADZENIE - books that change the way you think! Big Data shows how technological progress driven by the growth of the Internet and the digital universe has radically transformed data science. What is big data and how is it changing the world? What impact does it have on our daily lives, and on the world of business? Readers will find answers to these questions in this book. * The interdisciplinary KRÓTKIE WPROWADZENIE series, written by recognized experts associated with the University of Oxford, presents current knowledge about the contemporary world and helps readers understand it. It presents the most important issues of the 21st century in an engaging way - from culture, religion, and history through the natural sciences to technology. These are popular science publications that present selected topics in an accessible format, far removed from an academic lecture. The books are ideal both as an introduction to new topics and as a way to deepen knowledge of what fascinates us. The latest facts, expert analyses, brilliant interpretations. Academic oversight of the Polish edition of the series is provided by scholars from the University of Łódź: prof. Krystyna Kujawińska Courtney, prof. Ewa Gajewska, prof. Aneta Pawłowska, prof. Jerzy Gajdka, prof. Piotr Stalmaszczyk.