Big data
Chanchal Singh, Manisha Sethi, Manish Kumar, Anshul...
Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to a stream of records, and process them in a fault-tolerant way as they occur. This book is a comprehensive guide to designing and architecting enterprise-grade streaming applications using Apache Kafka and other big data tools. It includes best practices for building such applications, and tackles some common challenges such as how to use Kafka efficiently and handle high data volumes with ease. This book first takes you through the types of messaging systems and then provides a thorough introduction to Apache Kafka and its internal details. The second part of the book takes you through designing streaming applications using various frameworks and tools such as Apache Spark, Apache Storm, and more. Once you grasp the basics, we will take you through more advanced concepts in Apache Kafka such as capacity planning and security. By the end of this book, you will have all the information you need to be comfortable with using Apache Kafka, and to design efficient streaming data applications with it.
Rising Odegua, Stephen Oni
Most data analysts use Python and pandas for data processing for the convenience and performance these libraries provide. However, JavaScript developers have always wanted to use machine learning in the browser as well. This book focuses on how Danfo.js brings data processing, analysis, and ML tools to JavaScript developers and how to make the most of this library to build data-driven applications. Starting with an overview of modern JavaScript, you’ll cover data analysis and transformation with Danfo.js and Dnotebook. The book then shows you how to load different datasets, then combine and analyze them by performing operations such as handling missing values and string manipulations. You’ll also get to grips with data plotting, visualization, aggregation, and group operations by combining Danfo.js with Plotly. As you advance, you’ll create a no-code data analysis and handling system using create-react-app, react-table, react-chart, Draggable.js, and tailwindcss, and understand how to use TensorFlow.js and Danfo.js to build a recommendation system. Finally, you’ll build a Twitter analytics dashboard powered by Danfo.js, Next.js, node-nlp, and Twit.js. By the end of this app development book, you’ll be able to build and embed data analytics, visualization, and ML capabilities into any JavaScript app in server-side Node.js or the browser.
Brij Kishore Pandey, Emily Ro Schoof
Modern extract, transform, and load (ETL) pipelines for data engineering have favored the Python language for its broad range of uses and a large assortment of tools, applications, and open source components. With its simplicity and extensive library support, Python has emerged as the undisputed choice for data processing. In this book, you’ll walk through the end-to-end process of ETL data pipeline development, starting with an introduction to the fundamentals of data pipelines and establishing a Python development environment to create pipelines. Once you've explored the ETL pipeline design principles and ETL development process, you'll be equipped to design custom ETL pipelines. Next, you'll get to grips with the steps in the ETL process, which involves extracting valuable data; performing transformations through cleaning, manipulation, and ensuring data integrity; and ultimately loading the processed data into storage systems. You’ll also review several ETL modules in Python, comparing their pros and cons when building data pipelines and leveraging cloud tools, such as AWS, to create scalable data pipelines. Lastly, you’ll learn about the concept of test-driven development for ETL pipelines to ensure safe deployments. By the end of this book, you’ll have worked on several hands-on examples to create high-performance ETL pipelines to develop robust, scalable, and resilient environments using Python.
Michael Olafusi, Olanrewaju Oyinbooke
M365 Excel is a modern Excel version that is constantly updated with features that make creating and automating analyses, reports, and dashboards very easy compared with older Excel versions. This book will help you leverage its full capabilities, beginning with a quick overview of what dashboards are and how they are different from other types of reports. Then, you’ll familiarize yourself with the different standard dashboards currently available and what they are meant to accomplish for organizations. As you progress, you’ll get to grips with the use of new powerful tools such as Power Query and dynamic array formulae in the automation of analysis, gaining insights into the right approach to take in building effective dashboards. You’ll equip yourself with not only all the essential formulae, charts, and non-chart visuals but also learn how to set up your dashboard perfectly. Along the way, you’ll build a couple of awesome dashboards from scratch to utilize your newfound knowledge. By the end of this book, you will be able to carry out an impressive and robust level of analysis on business data that may come from multiple sources or files, using better processes, formulae, and best practices in M365 to create insightful dashboards faster.
Willi Richert, Luis Pedro Coelho
Machine learning, the field of building systems that learn from data, is exploding on the Web and elsewhere. Python is a wonderful language in which to develop machine learning applications. As a dynamic language, it allows for fast exploration and experimentation, and an increasing number of machine learning libraries are developed for Python. Building Machine Learning Systems with Python shows you exactly how to find patterns in raw data. The book starts by brushing up on your Python ML knowledge and introducing libraries, and then moves on to more serious projects on datasets, modelling, recommendations, and improving recommendations through examples, before covering sound and image processing in detail. Using open-source tools and libraries, readers will learn how to apply methods to text, images, and sounds. You will also learn how to evaluate, compare, and choose machine learning techniques. Written for Python programmers, Building Machine Learning Systems with Python teaches you how to use open-source libraries to solve real problems with machine learning. The book is based on real-world examples that the user can build on. Readers will learn how to write programs that classify the quality of StackOverflow answers or whether a music file is Jazz or Metal. They will learn regression, which is demonstrated by recommending movies to users. Advanced topics such as topic modeling (finding a text's most important topics), basket analysis, and cloud computing are covered, as well as many other interesting aspects. Building Machine Learning Systems with Python will give you the tools and understanding required to build your own systems, which are tailored to solve your problems.
Luis Pedro Coelho, Willi Richert, Matthieu...
Machine learning enables systems to make predictions based on historical data. Python is one of the most popular languages used to develop machine learning applications, thanks to its extensive library support. This updated third edition of Building Machine Learning Systems with Python helps you get up to speed with the latest trends in artificial intelligence (AI). With this guide’s hands-on approach, you’ll learn to build state-of-the-art machine learning models from scratch. Complete with ready-to-implement code and real-world examples, the book starts by introducing the Python ecosystem for machine learning. You’ll then learn best practices for preparing data for analysis and later gain insights into implementing supervised and unsupervised machine learning techniques such as classification, regression, and clustering. As you progress, you’ll understand how to use Python’s scikit-learn and TensorFlow libraries to build production-ready, end-to-end machine learning models, and then fine-tune them for high performance. By the end of this book, you’ll have the skills you need to confidently train and deploy enterprise-grade machine learning models in Python.
Laura Funderburk
Modern LLM applications often break in production due to brittle pipelines, loose tool definitions, and noisy context. This book shows you how to build production-ready, context-aware systems using Haystack and LangGraph. You’ll learn to design deterministic pipelines with strict tool contracts and deploy them as microservices. Through structured context engineering, you’ll orchestrate reliable agent workflows and move beyond simple prompt-based interactions. You'll start by understanding LLM behavior (tokens, embeddings, and transformer models) and see how prompt engineering has evolved into a full context engineering discipline. Then, you'll build retrieval-augmented generation (RAG) pipelines with retrievers, rankers, and custom components using Haystack’s graph-based architecture. You’ll also create knowledge graphs, synthesize unstructured data, and evaluate system behavior using Ragas and Weights & Biases. In LangGraph, you’ll orchestrate agents with supervisor-worker patterns, typed state machines, retries, fallbacks, and safety guardrails. By the end of the book, you’ll have the skills to design scalable, testable LLM pipelines and multi-agent systems that remain robust as the AI ecosystem evolves.
Syed Omar Faruk Towaha
With the use of drones, DIY projects have taken off. Programmers are rapidly moving from traditional application programming to developing exciting multi-utility projects. This book will teach you to build industry-level drones with Arduino and ESP8266 and their modified versions of hardware. With this book, you will explore techniques for leveraging the tiny WiFi chip to enhance your drone and control it over a mobile phone. This book will start by teaching you how to solve problems while building your own WiFi-controlled, Arduino-based drone. You will also learn how to build a quadcopter and a mission-critical drone. Moving on, you will learn how to build a prototype drone that is given a mission and completes it on its own. You will also learn to build various exciting projects such as gliding and racing drones. Finally, you will learn how to maintain and troubleshoot your drone. By the end of this book, you will have learned to build drones using ESP8266 and Arduino and leverage their functionalities to the fullest.
Mark Lewin, Eric Pimpler
The ArcGIS API for JavaScript enables you to quickly build web and mobile mapping applications that include sophisticated GIS capabilities, yet are easy and intuitive for the user. Aimed at both new and experienced web developers, this practical guide gives you everything you need to get started with the API. After a brief introduction to HTML/CSS/JavaScript, you'll embed maps in a web page, add the tiled, dynamic, and streaming data layers that your users will interact with, and mark up the map with graphics. You will learn how to quickly incorporate a broad range of useful user interface elements and GIS functionality into your application with minimal effort using prebuilt widgets. As the book progresses, you will discover and use the task framework to query layers with spatial and attribute criteria, search for and identify features on the map, geocode addresses, perform network analysis and routing, and add custom geoprocessing operations. Along the way, we cover exciting new features such as the client-side geometry engine, learn how to integrate content from ArcGIS.com, and use your new skills to build mobile web mapping applications. We conclude with a look at version 4 of the ArcGIS API for JavaScript (which is being developed in parallel with version 3.x) and what it means for you as a developer.
Davide Moraschi
Business intelligence is becoming more important by the day, with cloud offerings and mobile devices gaining wider acceptance and achieving better market penetration. MicroStrategy Reporting Suite is a complete business intelligence platform that covers all the data analysis needs of an enterprise. Scorecards, dashboards, and reports can be explored and delivered on desktop, the Web, mobile devices, and the Cloud. With the latest Visual Insight tool, MicroStrategy brings the power of BI to the business users, allowing them to discover information without the help of IT personnel. Business Intelligence with MicroStrategy Cookbook covers the full cycle of a BI project with the MicroStrategy platform, from setting up the software to using dashboards in the cloud and on mobile devices. This book uses step-by-step instructions to teach you everything from the very basics to the more advanced topics. We will start by downloading and installing the software and a well-known sample SQL Server database. Then, one brick at a time, we will construct a fully-featured BI solution with a web interface, mobile reporting, and agile analytics. The chapters are ordered by increasing difficulty, and each one builds on top of the preceding chapter so that the learning process is progressive. The examples given in this book are practical, and you will be able to see the immediate result of your efforts. We will first cover setting up the platform, including the creation of the metadata and the different objects that are part of a BI project: tables, attributes, and metrics. Then, we take a look at how to create and analyze reports, charts, documents, and dashboards, as well as how to manipulate data with the desktop application, the web interface, and an iPad device. The last part of the book is dedicated to advanced topics like the new agile analytics technology from MicroStrategy, where we cover both Visual Insight and MicroStrategy Cloud Express.
Whether you are a database developer, data analyst, or a business user, Business Intelligence with MicroStrategy Cookbook will get you up to speed with one of the most powerful BI platforms on the market with the smallest possible investment of time and money.
Cacti Beginner's Guide. Leverage Cacti to design a robust network operations center - Second Edition
Thomas Urban
Cacti is a performance measurement tool that provides easy methods and functions for gathering and graphing system data. You can use Cacti to develop a robust event management system that can alert on just about anything you would like it to. But to do that, you need to gain a solid understanding of the basics of Cacti, its plugin architecture, and automation concepts. Cacti Beginner's Guide will introduce you to the wide variety of features of Cacti and will guide you on how to use them for maximum effectiveness. Advanced topics such as the plugin architecture and Cacti automation using the command-line interface will help you build a professional performance measurement system. Designed as a beginner's guide, the book starts off with the basics of installing and using Cacti, and also covers the advanced topics that will show you how to customize and extend the core Cacti functionalities. The book offers essential tutorials for creating advanced graphs and using plugins to create enterprise-class reports to show your customers and colleagues. From data templates to input methods and plugin installation to creating your own customized plugins, this book provides you with a rich selection of step-by-step instructions to reach your goals. It covers all you need to know to implement professional performance measurement techniques with Cacti and ways to fully customize Cacti to fit your needs. You will also learn how to migrate Cacti to new servers. Lastly, you will be introduced to the latest feature: building a scalable remote poller environment. By the end of the book, you will be able to implement and extend Cacti to monitor, display, and report the performance of your network exactly the way you want.
David Mertz
Data cleaning is the all-important first step to successful data science, data analysis, and machine learning. If you work with any kind of data, this book is your go-to resource, arming you with the insights and heuristics experienced data scientists had to learn the hard way. In a light-hearted and engaging exploration of different tools, techniques, and datasets real and fictitious, Python veteran David Mertz teaches you the ins and outs of data preparation and the essential questions you should be asking of every piece of data you work with. Using a mixture of Python, R, and common command-line tools, Cleaning Data for Effective Data Science follows the data cleaning pipeline from start to end, focusing on helping you understand the principles underlying each step of the process. You'll look at data ingestion of a vast range of tabular, hierarchical, and other data formats, impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features. The long-form exercises at the end of each chapter let you get hands-on with the skills you've acquired along the way, also providing a valuable resource for academic courses.
MrExcel's Holy Macro! Books, Oz du Soleil
This book provides a step-by-step guide to using Power Query in Excel for efficient data cleaning and transformation. Starting with an introduction to its capabilities, it explains how to import data, handle missing values, and parse text fields with ease. Advanced techniques such as merging datasets, appending data, and performing joins are explored in detail. The book also covers grouping data, creating conditional and custom columns, and reshaping data through unpivoting for analysis. Each concept is illustrated with practical examples for clarity. By the end of the book, readers will be equipped with the skills to automate repetitive tasks and streamline workflows. Whether dealing with messy data or preparing datasets for analysis, this guide ensures you can confidently tackle any Excel data transformation challenge.
Eric Richard Rochester
This book is for those with a basic knowledge of Clojure, who are looking to push the language to excel with data analysis.
Clojure for Data Science. Statistics, big data, and machine learning for Clojure programmers
Henry Garner
The term “data science” has been widely used to define this new profession that is expected to interpret vast datasets and translate them to improved decision-making and performance. Clojure is a powerful language that combines the interactivity of a scripting language with the speed of a compiled language. Together with its rich ecosystem of native libraries and an extremely simple and consistent functional approach to data manipulation, which maps closely to mathematical formulae, it is an ideal, practical, and flexible language to meet a data scientist’s diverse needs. Taking you on a journey from simple summary statistics to sophisticated machine learning algorithms, this book shows how the Clojure programming language can be used to derive insights from data. Data scientists often forge a novel path, and you’ll see how to make use of Clojure’s Java interoperability capabilities to access libraries such as Mahout and MLlib for which Clojure wrappers don’t yet exist. Even seasoned Clojure developers will develop a deeper appreciation for their language’s flexibility! You’ll learn how to apply statistical thinking to your own data and use Clojure to explore, analyze, and visualize it in a technically and statistically robust way. You can also use Incanter for local data processing and ClojureScript to present interactive visualisations, and understand how distributed platforms such as Hadoop and Spark’s MapReduce and GraphX’s BSP solve the challenges of data analysis at scale, and how to explain algorithms using those programming models. Above all, by following the explanations in this book, you’ll learn not just how to be effective using the current state-of-the-art methods in data science, but why such methods work so that you can continue to be productive as the field evolves into the future.
Sanket Thodge
With the ongoing data explosion, more and more organizations all over the world are slowly migrating their infrastructure to the cloud. These cloud platforms also provide their distinct analytics services to help you get faster insights from your data. This book will give you an introduction to the concept of analytics on the cloud, and the different cloud services popularly used for processing and analyzing data. If you’re planning to adopt the cloud analytics model for your business, this book will help you understand the design and business considerations to be kept in mind, and choose the best tools and alternatives for analytics, based on your requirements. The chapters in this book will take you through the 70+ services available in Google Cloud Platform and their implementation for practical purposes. From ingestion to processing your data, this book contains best practices on building an end-to-end analytics pipeline on the cloud by leveraging popular concepts such as machine learning and deep learning. By the end of this book, you will have a better understanding of cloud analytics as a concept as well as practical know-how of its implementation.