Big data

25
E-book

Bash for Data Scientists. A Comprehensive Guide to Shell Scripting for Data Science Tasks

Mercury Learning and Information, Oswald Campesato

This book introduces powerful command line utilities for creating efficient shell scripts to process datasets. The examples and scripts use the bash shell and focus on small datasets to help readers understand the features of grep, sed, and awk. Companion files with code are available for download from the publisher. The book starts with an introduction to the basics, covering files and directories and useful commands. It then moves on to conditional logic and loops, providing a solid foundation for processing datasets. Detailed chapters on grep, sed, and awk illustrate how effectively they handle and clean various types of datasets. Advanced topics include processing datasets with pandas and working with NoSQL, SQLite, and Python. The book equips data scientists, analysts, and anyone seeking shell-based solutions with practical skills. By the end, readers will be adept at creating robust scripts for dataset processing, combining command line utilities for optimal results.

26
E-book

Bayesian Analysis with Python. Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ - Second Edition

Osvaldo Martin

The second edition of Bayesian Analysis with Python is an introduction to the main concepts of applied Bayesian inference and their practical implementation in Python using PyMC3, a state-of-the-art probabilistic programming library, and ArviZ, a new library for exploratory analysis of Bayesian models. The main concepts of Bayesian statistics are covered using a practical and computational approach. Synthetic and real datasets are used to introduce several types of models, such as generalized linear models for regression and classification, mixture models, hierarchical models, and Gaussian processes, among others. By the end of the book, you will have a working knowledge of probabilistic modeling and will be able to design and implement Bayesian models for your own data science problems. After reading the book, you will be better prepared to delve into more advanced material or specialized statistical modeling if you need to.

27
E-book

Become a Python Data Analyst. Perform exploratory data analysis and gain insight into scientific computing using Python

Alvaro Fuentes

Python is one of the most popular languages among leading data analysts and statisticians for working with massive datasets and complex data visualizations. Become a Python Data Analyst introduces Python’s most essential tools and libraries for every stage of the data analysis process, from preparing data to performing simple statistical analyses and creating meaningful data visualizations. In this book, we will cover Python libraries such as NumPy, pandas, matplotlib, seaborn, SciPy, and scikit-learn, and apply them in practical data analysis and statistics examples. As you make your way through the chapters, you will learn to use the Jupyter Notebook efficiently to manipulate data with NumPy and pandas. In the concluding chapters, you will gain experience building simple predictive models and carrying out statistical computation and analysis using rich Python tools and proven data analysis techniques. By the end of this book, you will have hands-on experience performing data analysis with Python.

28
E-book

Big Data Analytics with Hadoop 3. Build highly effective analytics solutions to gain valuable insight into your big data

Sridhar Alla

Apache Hadoop is the most popular platform for big data processing, and it can be combined with a host of other big data tools to build powerful analytics solutions. Big Data Analytics with Hadoop 3 shows you how to do just that, explaining the software and its benefits through practical examples. After a tour of Hadoop 3’s latest features, you will get an overview of HDFS, MapReduce, and YARN, and how they enable faster, more efficient big data processing. You will then learn how to integrate Hadoop with open source tools such as Python and R to analyze and visualize data and perform statistical computing on big data. Along the way, you will explore how to use Hadoop 3 with Apache Spark and Apache Flink for real-time data analytics and stream processing. You will also learn how to use Hadoop to build analytics solutions in the cloud and how to build an end-to-end pipeline for big data analysis using practical use cases. By the end of this book, you will be well versed in the analytical capabilities of the Hadoop ecosystem and able to build powerful big data analytics solutions that deliver insights from your data.

29
E-book

Big Data Architect's Handbook. A guide to building proficiency in tools and systems used by leading big data experts

Syed Muhammad Fahad Akhtar

Big data architects are the “masters” of data and are highly valued in today’s market. Handling big data, whether of good or bad quality, is not an easy task. The prime job of any big data architect is to build an end-to-end big data solution that integrates data from different sources and analyzes it to uncover useful, hidden insights. Big Data Architect’s Handbook takes you through developing a complete, end-to-end big data pipeline, laying the foundation and providing the knowledge required to become a big data architect. From understanding the design considerations to implementing a solid, efficient, and scalable data pipeline, this book walks you through all the essential aspects of big data. It also gives you an overview of how to leverage various big data tools, such as Apache Hadoop and Elasticsearch, and bring them together to build an efficient big data solution. By the end of this book, you will be able to design your own system that integrates, maintains, visualizes, and monitors your data. You will also have a smooth design flow for each process, putting insights into action.

31
E-book

Big Data. Najlepsze praktyki budowy skalowalnych systemów obsługi danych w czasie rzeczywistym (Big Data: Best practices for building scalable real-time data processing systems)

Nathan Marz, James Warren

Serving applications that operate on huge datasets, such as social networking sites, exceeds the capabilities of ordinary relational databases. Working with complex datasets requires an architecture built on multi-machine clusters, which make it possible to store and transfer information of practically any size. Such an architecture should also be simple to use, reliable, and scalable. This book teaches you how to build this kind of architecture. You will become familiar with the technology of using clusters of machines. You will learn how tools designed specifically for capturing and analyzing data at large scale work. The book presents an easy-to-understand approach to big data systems that can be built and run by a small team. It also includes a thorough description of a practical implementation of a Big Data system based on a real-world example. In this book you will find: the theoretical foundations of Big Data system concepts; guidance on making optimal use of resources for data processing; a selection of techniques for processing and handling large volumes of data in real time; topics covering NoSQL databases, stream processing, and managing the complexity of incremental computation; information on the practical use of tools such as Hadoop, Cassandra, and Storm; and tips for expanding your knowledge of conventional databases. Big Data: scalability and simplicity in handling large volumes of data!

32
E-book

Big Data Using Hadoop and Hive. Master Big Data Solutions with Hadoop and Hive

Mercury Learning and Information, Nitin Kumar

This book is a guide for developers and engineers who want to use Hadoop and Hive to build scalable big data applications. It covers reading, writing, and managing large datasets with Hive and provides a concise introduction to Apache Hadoop and Hive, detailing how they work together to simplify development. Through clear examples, the book explains the logic, code, and configuration needed to build successful distributed applications. The book starts with an introduction to big data and Apache Hadoop fundamentals. It then covers the Hadoop Distributed File System (HDFS) and how to get started with Hadoop. It continues with interfaces for accessing HDFS files, resource management with YARN (Yet Another Resource Negotiator), and MapReduce for data processing. The book also explores Hive architecture, storage types, and the Hive query language. Mastering these concepts is vital for creating scalable big data solutions. This book ensures a smooth transition from novice to proficient Hadoop and Hive user, providing practical skills and comprehensive knowledge. By the end, readers will be able to set up, configure, and optimize Hadoop, use Hive for data management, and effectively solve big data challenges.