Електронні книги
1849
Eлектронна книга

Apache Airflow Best Practices. A practical guide to orchestrating data workflow with Apache Airflow

Dylan Intorf, Dylan Storey, Kendrick van Doorn

Data professionals face the monumental task of managing complex data pipelines, orchestrating workflows across diverse systems, and ensuring scalable, reliable data processing. This definitive guide to mastering Apache Airflow, written by experts in engineering, data strategy, and problem-solving across tech, financial, and life sciences industries, is your key to overcoming these challenges. It covers everything from the basics of Airflow and its core components to advanced topics such as custom plugin development, multi-tenancy, and cloud deployment.Starting with an introduction to data orchestration and the significant updates in Apache Airflow 2.0, this book takes you through the essentials of DAG authoring, managing Airflow components, and connecting to external data sources. Through real-world use cases, you’ll gain practical insights into implementing ETL pipelines and machine learning workflows in your environment. You’ll also learn how to deploy Airflow in cloud environments, tackle operational considerations for scaling, and apply best practices for CI/CD and monitoring. By the end of this book, you’ll be proficient in operating and using Apache Airflow, authoring high-quality workflows in Python for your specific use cases, and making informed decisions crucial for production-ready implementation.

1850
Eлектронна книга

Apache Camel Developer's Cookbook. For Apache Camel developers, this is the book you'll always want to have handy. It's stuffed full of great recipes that are designed for quick practical application. Expands your Apache Camel abilities immediately

Scott Cranton, Jakub Korab

Apache Camel is a de-facto standard for developing integrations in Java, and is based on well-understood Enterprise Integration Patterns. It is used within many commercial and open source integration products. Camel makes common integration tasks easy while still providing the developer with the means to customize the framework when the situation demands it. Tasks such as protocol mediation, message routing and transformation, and auditing are common usages of Camel. Apache Camel Developer's Cookbook provides hundreds of best practice tips for using Apache Camel in a format that helps you build your Camel projects. Each tip or recipe provides you with the most important steps to perform along with a summary of how it works, with references to further reading if you need more information. This book is intended to be a reliable information source that is quicker to use than an Internet search. Apache Camel Developer's Cookbook is a quick lookup guide that can also be read from cover to cover if you want to get a sense of the full power of Apache Camel. This book provides coverage of the full lifecycle of creating Apache Camel-based integration projects, including the structure of your Camel code and using the most common Enterprise Integration patterns. Patterns like Split/Join and Aggregation are covered in depth in this book. Throughout this book, you will be learning steps to transform your data. You will also learn how to perform unit and integration testing of your code using Camel's extensive testing framework, and also strategies for debugging and monitoring your code. Advanced topics like Error Handling, Parallel Processing, Transactions, and Security will also be covered in this book. This book provides you with practical tips on using Apache Camel based on years of hands-on experience from hundreds of integration projects.

1851
Eлектронна книга

Apache Cassandra Essentials. Create your own massively scalable Cassandra database with highly responsive database queries

Nitin Padalia

Apache Cassandra Essentials takes you step-by-step from from the basics of installation to advanced installation options and database design techniques. It gives you all the information you need to effectively design a well distributed and high performance database. You’ll get to know about the steps that are performed by a Cassandra node when you execute a read/write query, which is essential to properly maintain of a Cassandra cluster and to debug any issues. Next, you’ll discover how to integrate a Cassandra driver in your applications and perform read/write operations. Finally, you’ll learn about the various tools provided by Cassandra for serviceability aspects such as logging, metrics, backup, and recovery.

1852
Eлектронна книга
1853
Eлектронна книга

Apache Flume: Distributed Log Collection for Hadoop. If your role includes moving datasets into Hadoop, this book will help you do it more efficiently using Apache Flume. From installation to customization, it's a complete step-by-step guide on making the service work for you

Steve Hoffman, Steven Hoffman, Kevin A. McGrail

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with many failover and recovery mechanisms.Apache Flume: Distributed Log Collection for Hadoop covers problems with HDFS and streaming data/logs, and how Flume can resolve these problems. This book explains the generalized architecture of Flume, which includes moving data to/from databases, NO-SQL-ish data stores, as well as optimizing performance. This book includes real-world scenarios on Flume implementation.Apache Flume: Distributed Log Collection for Hadoop starts with an architectural overview of Flume and then discusses each component in detail. It guides you through the complete installation process and compilation of Flume.It will give you a heads-up on how to use channels and channel selectors. For each architectural component (Sources, Channels, Sinks, Channel Processors, Sink Groups, and so on) the various implementations will be covered in detail along with configuration options. You can use it to customize Flume to your specific needs. There are pointers given on writing custom implementations as well that would help you learn and implement them.By the end, you should be able to construct a series of Flume agents to transport your streaming data and logs from your systems into Hadoop in near real time.

1854
Eлектронна книга

Apache Gold. A Story of the Strange Southwest

Joseph A. Altsheler

Apache Gold, A Story of the Strange Southwest is a rip-roaring tale of adventure set on the Arizona frontiers of the American Old West written by Joseph Alexander Altsheler (April 29, 1862 June 5, 1919). He was an American newspaper reporter, editor and author of popular juvenile historical fiction. An abiding classic of western literature, our hero in this tale is Charles Wayne, a young but strong and sharp lad who seeks adventure in the southwesterly desert frontier of Arizona. Charles has terrifying encounters with wild beasts and Indians when he searches for the lost treasure of the Spaniards. Joseph A. Altsheler describes the vast open frontier evocatively, placing the reader in a time when equal measures of freedom and danger were abundant. Throughout Mr. Waynes traversals, were reminded of how difficult it was to survive let alone thrive in the Old West. The beauty of the unforgiving land forms a vibrant backdrop to the scrapes and challenges our heroes must face.

1855
Eлектронна книга

Apache Hadoop 3 Quick Start Guide. Learn about big data processing and analytics

Hrishikesh Vijay Karambelkar

Apache Hadoop is a widely used distributed data platform. It enables large datasets to be efficiently processed instead of using one large computer to store and process the data. This book will get you started with the Hadoop ecosystem, and introduce you to the main technical topics, including MapReduce, YARN, and HDFS.The book begins with an overview of big data and Apache Hadoop. Then, you will set up a pseudo Hadoop development environment and a multi-node enterprise Hadoop cluster. You will see how the parallel programming paradigm, such as MapReduce, can solve many complex data processing problems.The book also covers the important aspects of the big data software development lifecycle, including quality assurance and control, performance, administration, and monitoring. You will then learn about the Hadoop ecosystem, and tools such as Kafka, Sqoop, Flume, Pig, Hive, and HBase. Finally, you will look at advanced topics, including real time streaming using Apache Storm, and data analytics using Apache Spark. By the end of the book, you will be well versed with different configurations of the Hadoop 3 cluster.

1856
Eлектронна книга

Apache Hive Cookbook

Hanish Bansal, Shrey Mehrotra, Saurabh Chauhan

Hive was developed by Facebook and later open sourced in Apache community. Hive provides SQL like interface to run queries on Big Data frameworks. Hive provides SQL like syntax also called as HiveQL that includes all SQL capabilities like analytical functions which are the need of the hour in today’s Big Data world.This book provides you easy installation steps with different types of metastores supported by Hive. This book has simple and easy to learn recipes for configuring Hive clients and services. You would also learn different Hive optimizations including Partitions and Bucketing. The book also covers the source code explanation of latest Hive version.Hive Query Language is being used by other frameworks including spark. Towards the end you will cover integration of Hive with these frameworks.