Big data

937
Завантаження...
EЛЕКТРОННА КНИГА

scikit-learn Cookbook. Over 80 recipes for machine learning in Python with scikit-learn - Second Edition

Julian Avila, Trent Hauck

Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility, and within the Python data space, scikit-learn is the unequivocal choice for machine learning. This book includes walk throughs and solutions to the common as well as the not-so-common problems in machine learning, and how scikit-learn can be leveraged to perform various machine learning tasks effectively.The second edition begins with taking you through recipes on evaluating the statistical properties of data and generates synthetic data for machine learning modelling. As you progress through the chapters, you will comes across recipes that will teach you to implement techniques like data pre-processing, linear regression, logistic regression, K-NN, Naïve Bayes, classification, decision trees, Ensembles and much more. Furthermore, you’ll learn to optimize your models with multi-class classification, cross validation, model evaluation and dive deeper in to implementing deep learning with scikit-learn. Along with covering the enhanced features on model section, API and new features like classifiers, regressors and estimators the book also contains recipes on evaluating and fine-tuning the performance of your model. By the end of this book, you will have explored plethora of features offered by scikit-learn for Python to solve any machine learning problem you come across.

938
Завантаження...
EЛЕКТРОННА КНИГА

scikit-learn Cookbook. Over 80 recipes for machine learning in Python with scikit-learn - Third Edition

John Sukup

Trusted by data scientists, ML engineers, and software developers alike, scikit-learn offers a versatile, user-friendly framework for implementing a wide range of ML algorithms, enabling the efficient development and deployment of predictive models in real-world applications. This third edition of scikit-learn Cookbook will help you master ML with real-world examples and scikit-learn 1.5 features.This updated edition takes you on a journey from understanding the fundamentals of ML and data preprocessing, through implementing advanced algorithms and techniques, to deploying and optimizing ML models in production. Along the way, you’ll explore practical, step-by-step recipes that cover everything from feature engineering and model selection to hyperparameter tuning and model evaluation, all using scikit-learn.By the end of this book, you’ll have gained the knowledge and skills needed to confidently build, evaluate, and deploy sophisticated ML models using scikit-learn, ready to tackle a wide range of data-driven challenges.*Email sign-up and proof of purchase required

939
Завантаження...
EЛЕКТРОННА КНИГА

SciPy Recipes. A cookbook with over 110 proven recipes for performing mathematical and scientific computations

Luiz Felipe Martins, V Kishore Ayyadevara, Ruben...

With the SciPy Stack, you get the power to effectively process, manipulate, and visualize your data using the popular Python language. Utilizing SciPy correctly can sometimes be a very tricky proposition. This book provides the right techniques so you can use SciPy to perform different data science tasks with ease.This book includes hands-on recipes for using the different components of the SciPy Stack such as NumPy, SciPy, matplotlib, and pandas, among others. You will use these libraries to solve real-world problems in linear algebra, numerical analysis, data visualization, and much more. The recipes included in the book will ensure you get a practical understanding not only of how a particular feature in SciPy Stack works, but also of its application to real-world problems. The independent nature of the recipes also ensure that you can pick up any one and learn about a particular feature of SciPy without reading through the other recipes, thus making the book a very handy and useful guide.

940
Завантаження...
EЛЕКТРОННА КНИГА
941
Завантаження...
EЛЕКТРОННА КНИГА

Serverless ETL and Analytics with AWS Glue. Your comprehensive reference guide to learning about AWS Glue and its features

Vishal Pathak, Subramanya Vajiraya, Noritaka Sekiyama, Tomohiro...

Organizations these days have gravitated toward services such as AWS Glue that undertake undifferentiated heavy lifting and provide serverless Spark, enabling you to create and manage data lakes in a serverless fashion. This guide shows you how AWS Glue can be used to solve real-world problems along with helping you learn about data processing, data integration, and building data lakes.Beginning with AWS Glue basics, this book teaches you how to perform various aspects of data analysis such as ad hoc queries, data visualization, and real-time analysis using this service. It also provides a walk-through of CI/CD for AWS Glue and how to shift left on quality using automated regression tests. You’ll find out how data security aspects such as access control, encryption, auditing, and networking are implemented, as well as getting to grips with useful techniques such as picking the right file format, compression, partitioning, and bucketing. As you advance, you’ll discover AWS Glue features such as crawlers, Lake Formation, governed tables, lineage, DataBrew, Glue Studio, and custom connectors. The concluding chapters help you to understand various performance tuning, troubleshooting, and monitoring options.By the end of this AWS book, you’ll be able to create, manage, troubleshoot, and deploy ETL pipelines using AWS Glue.

942
Завантаження...
EЛЕКТРОННА КНИГА

Seven NoSQL Databases in a Week. Get up and running with the fundamentals and functionalities of seven of the most popular NoSQL databases

Aaron Ploetz, Devram Kandhare, Sudarshan Kadambi, Xun...

This is the golden age of open source NoSQL databases. With enterprises having to work with large amounts of unstructured data and moving away from expensive monolithic architecture, the adoption of NoSQL databases is rapidly increasing. Being familiar with the popular NoSQL databases and knowing how to use them is a must for budding DBAs and developers.This book introduces you to the different types of NoSQL databases and gets you started with seven of the most popular NoSQL databases used by enterprises today. We start off with a brief overview of what NoSQL databases are, followed by an explanation of why and when to use them. The book then covers the seven most popular databases in each of these categories: MongoDB, Amazon DynamoDB, Redis, HBase, Cassandra, In?uxDB, and Neo4j. The book doesn't go into too much detail about each database but teachesyou enough to get started with them.By the end of this book, you will have a thorough understanding of the different NoSQL databases and their functionalities, empowering you to select and use the rightdatabase according to your needs.

943
Завантаження...
EЛЕКТРОННА КНИГА

Siatka danych. Nowoczesna koncepcja samoobsługowej infrastruktury danych

Zhamak Dehghani

Dostęp do danych jest warunkiem rozwoju niejednej organizacji. Aby w pełni skorzystać z ich potencjału i uzyskać dzięki nim konkretną wartość, konieczne jest odpowiednie zarządzanie danymi. Obecnie stosowane rozwiązania w tym zakresie nie nadążają już za złożonością dzisiejszych organizacji, rozprzestrzenianiem się źródeł danych i rosnącymi aspiracjami inżynierów, którzy rozwijają techniki sztucznej inteligencji i analizy danych. Odpowiedzią na te potrzeby może być siatka danych (Data Mesh), jednak praktyczna implementacja tej koncepcji wymaga istotnej zmiany myślenia. Ta książka szczegółowo wyjaśnia paradygmat siatki danych, a przy tym koncentruje się na jego praktycznym zastosowaniu. Zgodnie z tym nowatorskim podejściem dane należy traktować jako produkt, a dziedziny - jako główne zagadnienie. Poza wyjaśnieniem paradygmatu opisano tu zasady projektowania wysokopoziomowej architektury komponentów siatki danych, a także przedstawiono wskazówki i porady dotyczące ewolucyjnej realizacji siatki danych w organizacji. Tematyka ta została potraktowana wszechstronnie: omówiono kwestie technologiczne, organizacyjne, jak również socjologiczne i kulturowe. Dzięki temu jest to cenna lektura zarówno dla architektów i inżynierów, jak i dla badaczy, analityków danych, wreszcie dla liderów i kierowników zespołów. W książce: wyczerpujące wprowadzenie do paradygmatu siatki danych siatka danych i jej komponenty projektowanie architektury siatki danych opracowywanie i realizacja strategii siatki danych zdecentralizowany model własności danych przejście z hurtowni i jezior danych do rozproszonej siatki danych Siatka danych: kolejny etap rozwoju technologii big data!

944
Завантаження...
EЛЕКТРОННА КНИГА

Simplify Big Data Analytics with Amazon EMR. A beginner’s guide to learning and implementing Amazon EMR for building data analytics solutions

Sakti Mishra

Amazon EMR, formerly Amazon Elastic MapReduce, provides a managed Hadoop cluster in Amazon Web Services (AWS) that you can use to implement batch or streaming data pipelines. By gaining expertise in Amazon EMR, you can design and implement data analytics pipelines with persistent or transient EMR clusters in AWS.This book is a practical guide to Amazon EMR for building data pipelines. You'll start by understanding the Amazon EMR architecture, cluster nodes, features, and deployment options, along with their pricing. Next, the book covers the various big data applications that EMR supports. You'll then focus on the advanced configuration of EMR applications, hardware, networking, security, troubleshooting, logging, and the different SDKs and APIs it provides. Later chapters will show you how to implement common Amazon EMR use cases, including batch ETL with Spark, real-time streaming with Spark Streaming, and handling UPSERT in S3 Data Lake with Apache Hudi. Finally, you'll orchestrate your EMR jobs and strategize on-premises Hadoop cluster migration to EMR. In addition to this, you'll explore best practices and cost optimization techniques while implementing your data analytics pipeline in EMR.By the end of this book, you'll be able to build and deploy Hadoop- or Spark-based apps on Amazon EMR and also migrate your existing on-premises Hadoop workloads to AWS.