Big data - Ebooki - Biblioteka BIBLIO | BIBLIO ebookpoint

801

EBOOK

Practical Data Analysis. Pandas, MongoDB, Apache Spark, and more - Second Edition

Hector Cuesta, Dr. Sampath Kumar

Beyond buzzwords like Big Data or Data Science, there are a great opportunities to innovate in many businesses using data analysis to get data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service.This book explains the basic data algorithms without the theoretical jargon, and you’ll get hands-on turning data into insights using machine learning techniques. We will perform data-driven innovation processing for several types of data such as text, Images, social network graphs, documents, and time series, showing you how to implement large data processing with MongoDB and Apache Spark.

802

EBOOK

Practical Data Analysis Using Jupyter Notebook. Learn how to speak the language of data by extracting useful and actionable insights using Python

Marc Wintjen

Data literacy is the ability to read, analyze, work with, and argue using data. Data analysis is the process of cleaning and modeling your data to discover useful information. This book combines these two concepts by sharing proven techniques and hands-on examples so that you can learn how to communicate effectively using data.After introducing you to the basics of data analysis using Jupyter Notebook and Python, the book will take you through the fundamentals of data. Packed with practical examples, this guide will teach you how to clean, wrangle, analyze, and visualize data to gain useful insights, and you'll discover how to answer questions using data with easy-to-follow steps.Later chapters teach you about storytelling with data using charts, such as histograms and scatter plots. As you advance, you'll understand how to work with unstructured data using natural language processing (NLP) techniques to perform sentiment analysis. All the knowledge you gain will help you discover key patterns and trends in data using real-world examples. In addition to this, you will learn how to handle data of varying complexity to perform efficient data analysis using modern Python libraries.By the end of this book, you'll have gained the practical skills you need to analyze data with confidence.

803

EBOOK

Practical Data Quality. Learn practical, real-world strategies to transform the quality of data in your organization

Robert Hawker

Poor data quality can lead to increased costs, hinder revenue growth, compromise decision-making, and introduce risk into organizations. This leads to employees, customers, and suppliers finding every interaction with the organization frustrating.Practical Data Quality provides a comprehensive view of managing data quality within your organization, covering everything from business cases through to embedding improvements that you make to the organization permanently. Each chapter explains a key element of data quality management, from linking strategy and data together to profiling and designing business rules which reveal bad data. The book outlines a suite of tried-and-tested reports that highlight bad data and allow you to develop a plan to make corrections. Throughout the book, you’ll work with real-world examples and utilize re-usable templates to accelerate your initiatives.By the end of this book, you’ll have gained a clear understanding of every stage of a data quality initiative and be able to drive tangible results for your organization at pace.

804

EBOOK

Practical Data Science Cookbook, Second Edition. Data pre-processing, analysis and visualization using R and Python - Second Edition

RATNADIP ADHIKARI, Rajib Bhattacharya, Prabhanjan Narayanachar Tattar,...

As increasing amounts of data are generated each year, the need to analyze and create value out of it is more important than ever. Companies that know what to do with their data and how to do it well will have a competitive advantage over companies that don’t. Because of this, there will be an increasing demand for people that possess both the analytical and technical abilities to extract valuable insights from data and create valuable solutions that put those insights to use. Starting with the basics, this book covers how to set up your numerical programming environment, introduces you to the data science pipeline, and guides you through several data projects in a step-by-step format. By sequentially working through the steps in each chapter, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations with examples using the two most popular programming languages for data analysis—R and Python.

805

EBOOK

Practical Data Wrangling. Expert techniques for transforming your raw data into a valuable source for analytics

Allan Visochek

Around 80% of time in data analysis is spent on cleaning and preparing data for analysis. This is, however, an important task, and is a prerequisite to the rest of the data analysis workflow, including visualization, analysis and reporting. Python and R are considered a popular choice of tool for data analysis, and have packages that can be best used to manipulate different kinds of data, as per your requirements. This book will show you the different data wrangling techniques, and how you can leverage the power of Python and R packages to implement them.You’ll start by understanding the data wrangling process and get a solid foundation to work with different types of data. You’ll work with different data structures and acquire and parse data from various locations. You’ll also see how to reshape the layout of data and manipulate, summarize, and join data sets. Finally, we conclude with a quick primer on accessing and processing data from databases, conducting data exploration, and storing and retrieving data quickly using databases.The book includes practical examples on each of these points using simple and real-world data sets to give you an easier understanding. By the end of the book, you’ll have a thorough understanding of all the data wrangling concepts and how to implement them in the best possible way.

806

EBOOK

Practical Deep Learning at Scale with MLflow. Bridge the gap between offline experimentation and online production

Yong Liu

The book starts with an overview of the deep learning (DL) life cycle and the emerging Machine Learning Ops (MLOps) field, providing a clear picture of the four pillars of deep learning: data, model, code, and explainability and the role of MLflow in these areas.From there onward, it guides you step by step in understanding the concept of MLflow experiments and usage patterns, using MLflow as a unified framework to track DL data, code and pipelines, models, parameters, and metrics at scale. You’ll also tackle running DL pipelines in a distributed execution environment with reproducibility and provenance tracking, and tuning DL models through hyperparameter optimization (HPO) with Ray Tune, Optuna, and HyperBand. As you progress, you’ll learn how to build a multi-step DL inference pipeline with preprocessing and postprocessing steps, deploy a DL inference pipeline for production using Ray Serve and AWS SageMaker, and finally create a DL explanation as a service (EaaS) using the popular Shapley Additive Explanations (SHAP) toolbox.By the end of this book, you’ll have built the foundation and gained the hands-on experience you need to develop a DL pipeline solution from initial offline experimentation to final deployment and production, all within a reproducible and open source framework.

807

EBOOK

Practical GIS. Learn novice to advanced topics such as QGIS, Spatial data analysis, and more

Gábor Farkas

The most commonly used GIS tools automate tasks that were historically done manually—compiling new maps by overlaying one on top of the other or physically cutting maps into pieces representing specific study areas, changing their projection, and getting meaningful results from the various layers by applying mathematical functions and operations. This book is an easy-to-follow guide to use the most matured open source GIS tools for these tasks.We’ll start by setting up the environment for the tools we use in the book. Then you will learn how to work with QGIS in order to generate useful spatial data. You will get to know the basics of queries, data management, and geoprocessing.After that, you will start to practice your knowledge on real-world examples. We will solve various types of geospatial analyses with various methods. We will start with basic GIS problems by imitating the work of an enthusiastic real estate agent, and continue with more advanced, but typical tasks by solving a decision problem. Finally, you will find out how to publish your data (and results) on the web. We will publish our data with QGIS Server and GeoServer, and create a basic web map with the API of the lightweight Leaflet web mapping library.

808

EBOOK

Practical Guide to Applied Conformal Prediction in Python. Learn and apply the best uncertainty frameworks to your industry applications

Valery Manokhin, Agus Sudjianto

In the rapidly evolving landscape of machine learning, the ability to accurately quantify uncertainty is pivotal. The book addresses this need by offering an in-depth exploration of Conformal Prediction, a cutting-edge framework to manage uncertainty in various ML applications.Learn how Conformal Prediction excels in calibrating classification models, produces well-calibrated prediction intervals for regression, and resolves challenges in time series forecasting and imbalanced data. Discover specialised applications of conformal prediction in cutting-edge domains like computer vision and NLP. Each chapter delves into specific aspects, offering hands-on insights and best practices for enhancing prediction reliability. The book concludes with a focus on multi-class classification nuances, providing expert-level proficiency to seamlessly integrate Conformal Prediction into diverse industries. With practical examples in Python using real-world datasets, expert insights, and open-source library applications, you will gain a solid understanding of this modern framework for uncertainty quantification.By the end of this book, you will be able to master Conformal Prediction in Python with a blend of theory and practical application, enabling you to confidently apply this powerful framework to quantify uncertainty in diverse fields.

809

EBOOK

Practical Machine Learning Cookbook. Supervised and unsupervised machine learning simplified

Vikram Chandra Jha, Atul Tripathi

Machine learning has become the new black. The challenge in today’s world is the explosion of data from existing legacy data and incoming new structured and unstructured data. The complexity of discovering, understanding, performing analysis, and predicting outcomes on the data using machine learning algorithms is a challenge. This cookbook will help solve everyday challenges you face as a data scientist. The application of various data science techniques and on multiple data sets based on real-world challenges you face will help you appreciate a variety of techniques used in various situations.The first half of the book provides recipes on fairly complex machine-learning systems, where you’ll learn to explore new areas of applications of machine learning and improve its efficiency. That includes recipes on classifications, neural networks, unsupervised and supervised learning, deep learning, reinforcement learning, and more.The second half of the book focuses on three different machine learning case studies, all based on real-world data, and offers solutions and solves specific machine-learning issues in each one.

810

EBOOK

Practical Machine Learning with R. Define, build, and evaluate machine learning models for real-world applications

Brindha Priyadarshini Jeyaraman, Ludvig Renbo Olsen, Monicah...

With huge amounts of data being generated every moment, businesses need applications that apply complex mathematical calculations to data repeatedly and at speed. With machine learning techniques and R, you can easily develop these kinds of applications in an efficient way.Practical Machine Learning with R begins by helping you grasp the basics of machine learning methods, while also highlighting how and why they work. You will understand how to get these algorithms to work in practice, rather than focusing on mathematical derivations. As you progress from one chapter to another, you will gain hands-on experience of building a machine learning solution in R. Next, using R packages such as rpart, random forest, and multiple imputation by chained equations (MICE), you will learn to implement algorithms including neural net classifier, decision trees, and linear and non-linear regression. As you progress through the book, you’ll delve into various machine learning techniques for both supervised and unsupervised learning approaches. In addition to this, you’ll gain insights into partitioning the datasets and mechanisms to evaluate the results from each model and be able to compare them. By the end of this book, you will have gained expertise in solving your business problems, starting by forming a good problem statement, selecting the most appropriate model to solve your problem, and then ensuring that you do not overtrain it.

811

EBOOK

Practical MongoDB Aggregations. The official guide to developing optimal aggregation pipelines with MongoDB 7.0

Paul Done

Officially endorsed by MongoDB, Inc., Practical MongoDB Aggregations helps you unlock the full potential of the MongoDB aggregation framework, including the latest features of MongoDB 7.0. This book provides practical, easy-to-digest principles and approaches for increasing your effectiveness in developing aggregation pipelines, supported by examples for building pipelines to solve complex data manipulation and analytical tasks.This book is customized for developers, architects, data analysts, data engineers, and data scientists with some familiarity with the aggregation framework. It begins by explaining the framework's architecture and then shows you how to build pipelines optimized for productivity and scale.Given the critical role arrays play in MongoDB's document model, the book delves into best practices for optimally manipulating arrays. The latter part of the book equips you with examples to solve common data processing challenges so you can apply the lessons you've learned to practical situations. By the end of this MongoDB book, you’ll have learned how to utilize the MongoDB aggregation framework to streamline your data analysis and manipulation processes effectively.

812

EBOOK

Practical Predictive Analytics. Analyse current and historical data to predict future trends using R, Spark, and more

Ralph Winters

This is the go-to book for anyone interested in the steps needed to develop predictive analytics solutions with examples from the world of marketing, healthcare, and retail. We'll get startedwith a brief history of predictive analytics and learn about different roles and functions people play within a predictive analytics project. Then, we will learn about various ways of installing R along with their pros and cons, combined with a step-by-step installation of RStudio,and a description of the best practices for organizing your projects.On completing the installation, we will begin to acquire the skills necessary to input, clean, and prepare your data for modeling. We will learn the six specific steps needed to implement andsuccessfully deploy a predictive model starting from asking the right questions through model development and ending with deploying your predictive model into production. We will learn whycollaboration is important and how agile iterative modeling cycles can increase your chances of developing and deploying the best successful model.We will continue your journey in the cloud by extending your skill set by learning about Databricks and SparkR, which allow you to develop predictive models on vast gigabytes of data.

813

EBOOK

Practical Real-time Data Processing and Analytics. Distributed Computing and Event Processing using Apache Spark, Flink, Storm, and Kafka

Shilpi Saxena, Selva raj Ramasamy, Prateek Bhati,...

With the rise of Big Data, there is an increasing need to process large amounts of data continuously, with a shorter turnaround time. Real-time data processing involves continuous input, processing and output of data, with the condition that the time required for processing is as short as possible.This book covers the majority of the existing and evolving open source technology stack for real-time processing and analytics. You will get to know about all the real-time solution aspects, from the source to the presentation to persistence. Through this practical book, you’ll be equipped with a clear understanding of how to solve challenges on your own.We’ll cover topics such as how to set up components, basic executions, integrations, advanced use cases, alerts, and monitoring. You’ll be exposed to the popular tools used in real-time processing today such as Apache Spark, Apache Flink, and Storm. Finally, you will put your knowledge to practical use by implementing all of the techniques in the form of a practical, real-world use case.By the end of this book, you will have a solid understanding of all the aspects of real-time data processing and analytics, and will know how to deploy the solutions in production environments in the best possible manner.

814

EBOOK

Practical Site Reliability Engineering. Automate the process of designing, developing, and delivering highly reliable apps and services with SRE

Pethuru Raj Chelliah, Shreyash Naithani, Shailender Singh

Site reliability engineering (SRE) is being touted as the most competent paradigm in establishing and ensuring next-generation high-quality software solutions.This book starts by introducing you to the SRE paradigm and covers the need for highly reliable IT platforms and infrastructures. As you make your way through the next set of chapters, you will learn to develop microservices using Spring Boot and make use of RESTful frameworks. You will also learn about GitHub for deployment, containerization, and Docker containers. Practical Site Reliability Engineering teaches you to set up and sustain containerized cloud environments, and also covers architectural and design patterns and reliability implementation techniques such as reactive programming, and languages such as Ballerina and Rust. In the concluding chapters, you will get well-versed with service mesh solutions such as Istio and Linkerd, and understand service resilience test practices, API gateways, and edge/fog computing.By the end of this book, you will have gained experience on working with SRE concepts and be able to deliver highly reliable apps and services.

815

EBOOK

Practical Threat Intelligence and Data-Driven Threat Hunting. A hands-on guide to threat hunting with the ATT&CK™ Framework and open source tools

Valentina Costa-Gazcón

Threat hunting (TH) provides cybersecurity analysts and enterprises with the opportunity to proactively defend themselves by getting ahead of threats before they can cause major damage to their business.This book is not only an introduction for those who don’t know much about the cyber threat intelligence (CTI) and TH world, but also a guide for those with more advanced knowledge of other cybersecurity fields who are looking to implement a TH program from scratch.You will start by exploring what threat intelligence is and how it can be used to detect and prevent cyber threats. As you progress, you’ll learn how to collect data, along with understanding it by developing data models. The book will also show you how to set up an environment for TH using open source tools. Later, you will focus on how to plan a hunt with practical examples, before going on to explore the MITRE ATT&CK framework.By the end of this book, you’ll have the skills you need to be able to carry out effective hunts in your own environment.

816

EBOOK

Practical Time Series Analysis. Master Time Series Data Processing, Visualization, and Modeling using Python

PKS Prakash, Avishek Pal

Time Series Analysis allows us to analyze data which is generated over a period of time and has sequential interdependencies between the observations. This book describes special mathematical tricks and techniques which are geared towards exploring the internal structures of time series data and generating powerful descriptive and predictive insights. Also, the book is full of real-life examples of time series and their analyses using cutting-edge solutions developed in Python. The book starts with descriptive analysis to create insightful visualizations of internal structures such as trend, seasonality, and autocorrelation. Next, the statistical methods of dealing with autocorrelation and non-stationary time series are described. This is followed by exponential smoothing to produce meaningful insights from noisy time series data. At this point, we shift focus towards predictive analysis and introduce autoregressive models such as ARMA and ARIMA for time series forecasting. Later, powerful deep learning methods are presented, to develop accurate forecasting models for complex time series, and under the availability of little domain knowledge. All the topics are illustrated with real-life problem scenarios and their solutions by best-practice implementations in Python.The book concludes with the Appendix, with a brief discussion of programming and solving data science problems using Python.