Ebooks
1849
Ebook

Apache Kafka Quick Start Guide. Leverage Apache Kafka 2.0 to simplify real-time data processing for distributed applications

Raúl Estrada

Apache Kafka is a great open source platform for handling your real-time data pipeline to ensure high-speed filtering and pattern matching on the fly. In this book, you will learn how to use Apache Kafka for efficient processing of distributed applications and will get familiar with solving everyday problems in fast data and processing pipelines. This book focuses on programming rather than the configuration management of Kafka clusters or DevOps. It starts off with the installation and setting up the development environment, before quickly moving on to performing fundamental messaging operations such as validation and enrichment. Here you will learn about message composition with pure Kafka API and Kafka Streams. You will look into the transformation of messages in different formats, such as text, binary, XML, JSON, and AVRO. Next, you will learn how to expose the schemas contained in Kafka with the Schema Registry. You will then learn how to work with all relevant connectors with Kafka Connect. While working with Kafka Streams, you will perform various interesting operations on streams, such as windowing, joins, and aggregations. Finally, through KSQL, you will learn how to retrieve, insert, modify, and delete data streams, and how to manipulate watermarks and windows.

1854
Ebook

Apache Mesos Cookbook. Efficiently handle and manage tasks in a distributed environment

David Blomquist, Tomasz Janiszewski

Apache Mesos is open source cluster sharing and management software. Deploying and managing scalable applications in large-scale clustered environments can be difficult, but Apache Mesos makes it easier with efficient resource isolation and sharing across application frameworks. The goal of this book is to guide you through the practical implementation of the Mesos core along with a number of Mesos supported frameworks. You will begin by installing Mesos and then learn how to configure clusters and maintain them. You will also see how to deploy a cluster in a production environment with high availability using Zookeeper. Next, you will get to grips with using Mesos, Marathon, and Docker to build and deploy a PaaS. You will see how to schedule jobs with Chronos. We’ll demonstrate how to integrate Mesos with big data frameworks such as Spark, Hadoop, and Storm. Practical solutions backed with clear examples will also show you how to deploy elastic big data jobs. You will find out how to deploy a scalable continuous integration and delivery system on Mesos with Jenkins. Finally, you will configure and deploy a highly scalable distributed search engine with ElasticSearch. Throughout the course of this book, you will get to know tips and tricks along with best practices to follow when working with Mesos.

1855
Ebook

Apache OFBiz Cookbook. Over 60 simple but incredibly effective recipes for taking control of OFBiz

Ruth Hoffman, Brian Fitzpatrick

Apache Open For Business (OFBiz) is an enterprise resource planning (ERP) system that provides a common data model and an extensive set of business processes. But without proper guidance on developing performance-critical applications, it is easy to make the wrong design and technology decisions. The power and promise of Apache OFBiz is comprehensively revealed in a collection of self-contained, quick, practical recipes in this Cookbook. This book covers a range of topics from initial system setup to web application and HTML page creation, Java development, and data maintenance tasks. Focusing on a series of the most commonly performed OFBiz tasks, it provides clear, cogent, and easy-to-follow instructions designed to make the most of your OFBiz experience. Let this book be your guide to enhancing your OFBiz productivity by saving you valuable time. Written specifically to give clear and straightforward answers to the most commonly asked OFBiz questions, this compendium of OFBiz recipes will show you everything you need to know to get things done in OFBiz. Whether you are new to OFBiz or an old pro, you are sure to find many useful hints and handy tips here. Topics range from getting started to configuration and system setup, security and database management through the final stages of developing and testing new OFBiz applications.

1856
Ebook

Apache Oozie Essentials. Unleash the power of Apache Oozie to create and manage your big data and machine learning pipelines in one go

Jagat Jasjit Singh

As more and more organizations discover the use of big data analytics, interest in platforms that provide storage, computation, and analytic capabilities is booming. This calls for data management, and Hadoop caters to this need. Oozie fulfils the need for a scheduler for Hadoop jobs by acting as a cron to better analyze data. Apache Oozie Essentials starts off with the basics, from installing and configuring Oozie from source code on your Hadoop cluster to managing your complex clusters. You will learn how to create data ingestion and machine learning workflows. This book is sprinkled with examples and exercises to help you take your big data learning to the next level. You will discover how to write workflows to run your MapReduce, Pig, Hive, and Sqoop scripts and schedule them to run at a specific time or for a specific business requirement using a coordinator. This book has engaging real-life exercises and examples to get you in the thick of things. Lastly, you’ll get to grips with how to embed Spark jobs, which can be used to run your machine learning models on Hadoop. By the end of the book, you will have a good knowledge of Apache Oozie. You will be capable of using Oozie to handle large Hadoop workflows and even improve the availability of your Hadoop environment.

1857
Ebook

Apache. Receptury. Wydanie II

Rich Bowen, Ken Coar

Do you know which HTTP server is the most popular on the web? That's right, it is Apache! In July 2008 its market share was close to 50% (according to Netcraft). The history of this server goes back to 1995, when its first official version, numbered 0.6.2, was released. The features that determined its success are security, scalability, multithreading, and support for a variety of scripting languages. With the book "Apache. Receptury" you will get to know ready-made recipes for solving interesting, specific, and intriguing problems. You will learn to install the server from various sources and on various platforms. You will find out how to increase its security, how to run virtual servers, and how to improve Apache's performance. The authors will show you how to enable support for scripting languages so that the pages you serve become dynamic. All of this is presented in the proven format of this series: problem, solution, analysis. Ways of installing the Apache server; adding functionality with modules; event logging capabilities; configuring virtual servers; using aliases, redirects, and rewrites (mod_rewrite); managing access to served resources; Apache server security; using encrypted transmission with the SSL protocol; ensuring performance; using scripting languages. This is the book with the best recipes for Apache!

1858
Ebook

Apache Roller 4.0 - Beginner's Guide. A comprehensive, step-by-step guide on how to set up, customize, and market your blog using Apache Roller

Alfonso V. Romero, Brian Fitzpatrick, Alfonso Vidal Romero

Apache Roller enables you to build a fully-featured, multi-user blog server apt for all kinds of blogging sites. It is an ideal tool to create your own blogging network with unlimited users and blogs, forums, photo galleries, and more! While it is exciting to have a list of interesting features it can offer you, it might be a little difficult to get started with it by yourself. This book will teach you how to get started with Apache Roller and make the most of all its features using step-by-step, detailed instructions. You will learn how to establish your internet presence with an Apache Roller blog and use the latest web tools to enhance your posts and attract visitors. You will also learn how to promote your blog on popular social bookmarking services and customize it to suit your needs. This hands-on and practical book introduces you to Apache Roller. Starting off with the configuration and installation of your own blog, you'll then quickly learn how to add interesting content to your blog with the help of plenty of examples. You'll also learn how to change your blog's visual appearance with the help of Roller themes and templates and how to create a community of blogs for you and your colleagues or friends in your Apache Roller blog server. The book also looks at ways you can manage your community, and keep your site safe and secure, ensuring that it is a spam-free, enjoyable community for your users.

1859
Ebook

Apache Solr for Indexing Data. Enhance your Solr indexing experience with advanced techniques and the built-in functionalities available in Apache Solr

Anshul Johri, Sachin Handiekar

Apache Solr is a widely used, open source enterprise search server that delivers powerful indexing and searching features. These features help fetch relevant information from various sources and documentation. Solr also combines with other open source tools such as Apache Tika and Apache Nutch to provide more powerful features. This fast-paced guide starts by helping you set up Solr and get acquainted with its basic building blocks, to give you a better understanding of Solr indexing. You’ll quickly move on to indexing text and boosting the indexing time. Next, you’ll focus on basic indexing techniques, various index handlers designed to modify documents, and indexing a structured data source through Data Import Handler. Moving on, you will learn techniques to perform real-time indexing and atomic updates, as well as more advanced indexing techniques such as de-duplication. Later on, we’ll help you set up a cluster of Solr servers that combine fault tolerance and high availability. You will also gain insights into working scenarios of different aspects of Solr and how to use Solr with e-commerce data. By the end of the book, you will be competent and confident working with indexing and will have a good knowledge base to efficiently program elements.

1861
Ebook

Apache Solr PHP Integration. Build a fully-featured and scalable search application using PHP to unlock the search functions provided by Solr with this book and

Jayant Kumar

A search tool is a very powerful feature for any website. No matter what type of website, the search tool helps visitors find what they are looking for using keywords and narrow down the results using facets. Solr is the popular, blazing fast, open source enterprise search platform from the Apache Lucene project. It is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest websites. This book is a practical, hands-on, end-to-end guide that provides you with all the tools required to build a fully-featured search application using Apache Solr and PHP. The book contains practical examples and step-by-step instructions. Starting off with the basics of installing Apache Solr and integrating it with PHP, the book then proceeds to explore the features provided by Solr to improve searches using PHP. You will learn how to build and maintain a Solr index using PHP, discover the query modes available with Solr, and learn how to use them to tune Solr queries to retrieve relevant results. You will look at how to build and use facets in your search, how to tune and use fast result highlighting, and how to build a spell check and autocomplete feature using Solr. You will finish by learning some of the advanced concepts required to run a large-scale, enterprise-level search infrastructure.

1863
Ebook

Apache Spark 2: Data Processing and Real-Time Analytics. Master complex big data processing, stream analytics, and machine learning with Apache Spark

Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, ...

Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your own data flow and machine learning programs on this platform. You will work with the different modules in Apache Spark, such as interactive querying with Spark SQL, using DataFrames and datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools. By the end of this elaborately designed Learning Path, you will have all the knowledge you need to master Apache Spark, and build your own big data processing and analytics pipeline quickly and without any hassle. This Learning Path includes content from the following Packt products:
• Mastering Apache Spark 2.x by Romeo Kienzler
• Scala and Spark for Big Data Analytics by Md. Rezaul Karim, Sridhar Alla
• Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei

1864
Ebook

Apache Spark 2.x Cookbook. Over 70 cloud-ready recipes for distributed Big Data processing and analytics

Rishi Yadav

While Apache Spark 1.x gained a lot of traction and adoption in the early years, Spark 2.x delivers notable improvements in the areas of API, schema awareness, performance, Structured Streaming, and simplified building blocks for building better, faster, smarter, and more accessible big data applications. This book uncovers all these features in the form of structured recipes to analyze and mature large and complex sets of data. Starting with installing and configuring Apache Spark with various cluster managers, you will learn to set up development environments. Further on, you will be introduced to working with RDDs, DataFrames, and Datasets to operate on schema-aware data, and real-time streaming with various sources such as Twitter Stream and Apache Kafka. You will also work through recipes on machine learning, including supervised learning, unsupervised learning, and recommendation engines in Spark. Last but not least, the final few chapters delve deeper into the concepts of graph processing using GraphX, securing your implementations, cluster optimization, and troubleshooting.

1865
Ebook

Apache Spark 2.x for Java Developers. Explore big data at scale using Apache Spark 2.x Java APIs

Sourav Gulati, Sumit Kumar

Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version for Java developers. This book will show you how you can implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone. The book starts with an introduction to the Apache Spark 2.x ecosystem, followed by explaining how to install and configure Spark, and refreshes the Java concepts that will be useful to you when consuming Apache Spark's APIs. You will explore RDD and its associated common Action and Transformation Java APIs, set up a production-like clustered environment, and work with Spark SQL. Moving on, you will perform near-real-time processing with Spark streaming, Machine Learning analytics with Spark MLlib, and graph processing with GraphX, all using various Java packages. By the end of the book, you will have a solid foundation in implementing components in the Spark framework in Java to build fast, real-time applications.

1866
Ebook

Apache Spark 2.x Machine Learning Cookbook. Over 100 recipes to simplify machine learning model implementations with Spark

Siamak Amirghodsi, Shuen Mei, Meenakshi Rajendran, Broderick Hall

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning about algorithms enables a wide range of applications, from everyday tasks such as product recommendations and spam filtering to cutting edge applications such as self-driving cars and personalized medicine. You will gain hands-on experience of applying these principles using Apache Spark, a resilient cluster computing system well suited for large-scale machine learning tasks. This book begins with a quick overview of setting up the necessary IDEs to facilitate the execution of code examples that will be covered in various chapters. It also highlights some key issues developers face while working with machine learning algorithms on the Spark platform. We progress by uncovering the various Spark APIs and the implementation of ML algorithms with developing classification systems, recommendation engines, text analytics, clustering, and learning systems. Toward the final chapters, we’ll focus on building high-end applications and explain various unsupervised methodologies and challenges to tackle when implementing with big data ML systems.

1867
Ebook

Apache Spark Deep Learning Cookbook. Over 80 best practice recipes for the distributed training and deployment of neural networks using Keras and TensorFlow

Ahmed Sherif, Amrith Ravindra

Organizations these days need to integrate popular big data tools such as Apache Spark with highly efficient deep learning libraries if they’re looking to gain faster and more powerful insights from their data. With this book, you’ll discover over 80 recipes to help you train fast, enterprise-grade, deep learning models on Apache Spark. Each recipe addresses a specific problem, and offers a proven, best-practice solution to difficulties encountered while implementing various deep learning algorithms in a distributed environment. The book follows a systematic approach, featuring a balance of theory and tips with best practice solutions to assist you with training different types of neural networks such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). You’ll also have access to code written in TensorFlow and Keras that you can run on Spark to solve a variety of deep learning problems in computer vision and natural language processing (NLP), or tweak to tackle other problems encountered in deep learning. By the end of this book, you'll have the skills you need to train and deploy state-of-the-art deep learning models on Apache Spark.

1868
Ebook

Apache Spark for Data Science Cookbook. Solve real-world analytical problems

Padma Priya Chitturi

Spark has emerged as the most promising big data analytics engine for data science professionals. The true power and value of Apache Spark lies in its ability to execute data science tasks with speed and accuracy. Spark’s selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle the complexities that come with raw unstructured data sets with ease. This guide will get you comfortable and confident performing data science tasks with Spark. You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to problematic concepts in data science using Spark’s data science libraries such as MLlib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will show you how to implement algorithms and optimize your work.

1869
Ebook

Apache Spark for Machine Learning. Build and deploy high-performance big data AI solutions for large-scale clusters

Deepak Gowda

In the world of big data, efficiently processing and analyzing massive datasets for machine learning can be a daunting task. Written by Deepak Gowda, a data scientist with over a decade of experience and 30+ patents, this book provides a hands-on guide to mastering Spark’s capabilities for efficient data processing, model building, and optimization. With Deepak’s expertise across industries such as supply chain, cybersecurity, and data center infrastructure, he makes complex concepts easy to follow through detailed recipes. This book takes you through core machine learning concepts, highlighting the advantages of Spark for big data analytics. It covers practical data preprocessing techniques, including feature extraction and transformation, supervised learning methods with detailed chapters on regression and classification, and unsupervised learning through clustering and recommendation systems. You’ll also learn to identify frequent patterns in data and discover effective strategies to deploy and optimize your machine learning models. Each chapter features practical coding examples and real-world applications to equip you with the knowledge and skills needed to tackle complex machine learning tasks. By the end of this book, you’ll be ready to handle big data and create advanced machine learning models with Apache Spark.

1871
Ebook

Apache Spark Machine Learning Blueprints. Develop a range of cutting-edge machine learning projects with Apache Spark using this actionable guide

Alex Liu

There's a reason why Apache Spark has become one of the most popular tools in Machine Learning: its ability to handle huge datasets at an impressive speed means you can be much more responsive to the data at your disposal. This book shows you Spark at its very best, demonstrating how to connect it with R and unlock maximum value not only from the tool but also from your data. Packed with a range of project blueprints that demonstrate some of the most interesting challenges that Spark can help you tackle, you'll find out how to use Spark notebooks and access, clean, and join different datasets before putting your knowledge into practice with some real-world projects, in which you will see how Spark Machine Learning can help you with everything from fraud detection to analyzing customer attrition. You'll also find out how to build a recommendation engine using Spark's parallel computing powers.

1872
Ebook

Apache Spark Quick Start Guide. Quickly learn the art of writing efficient big data applications with Apache Spark

Shrey Mehrotra, Akash Grade

Apache Spark is a flexible framework that allows processing of batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you to get started with Apache Spark 2.0 and write big data applications for a variety of use cases. It will also introduce you to Apache Spark, one of the most popular Big Data processing frameworks. Although this book is intended to help you get started with Apache Spark, it also focuses on explaining the core concepts. This practical guide provides a quick start to the Spark 2.0 architecture and its components. It teaches you how to set up Spark on your local machine. As we move ahead, you will be introduced to resilient distributed datasets (RDDs) and DataFrame APIs, and their corresponding transformations and actions. Then, we move on to the life cycle of a Spark application and learn about the techniques used to debug slow-running applications. You will also go through Spark’s built-in modules for SQL, streaming, machine learning, and graph analysis. Finally, the book will lay out the best practices and optimization techniques that are key for writing efficient Spark applications. By the end of this book, you will have a sound fundamental understanding of the Apache Spark framework and you will be able to write and optimize Spark applications.