Ebooks
1897
Ebook

Apache Solr for Indexing Data. Enhance your Solr indexing experience with advanced techniques and the built-in functionalities available in Apache Solr

Anshul Johri, Sachin Handiekar

Apache Solr is a widely used, open source enterprise search server that delivers powerful indexing and searching features. These features help fetch relevant information from various sources and documentation. Solr also combines with other open source tools such as Apache Tika and Apache Nutch to provide more powerful features. This fast-paced guide starts by helping you set up Solr and get acquainted with its basic building blocks, to give you a better understanding of Solr indexing. You’ll quickly move on to indexing text and boosting the indexing time. Next, you’ll focus on basic indexing techniques, various index handlers designed to modify documents, and indexing a structured data source through Data Import Handler. Moving on, you will learn techniques to perform real-time indexing and atomic updates, as well as more advanced indexing techniques such as de-duplication. Later on, we’ll help you set up a cluster of Solr servers that combine fault tolerance and high availability. You will also gain insights into working scenarios of different aspects of Solr and how to use Solr with e-commerce data. By the end of the book, you will be competent and confident working with indexing and will have a good knowledge base to efficiently program elements.

1898
Ebook
1899
Ebook

Apache Solr PHP Integration. Build a fully-featured and scalable search application using PHP to unlock the search functions provided by Solr with this book and

Jayant Kumar

A search tool is a very powerful feature for any website. No matter what type of website, the search tool helps visitors find what they are looking for using keywords and narrow down the results using facets. Solr is the popular, blazing fast, open source enterprise search platform from the Apache Lucene project. It is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest websites. This book is a practical, hands-on, end-to-end guide that provides you with all the tools required to build a fully-featured search application using Apache Solr and PHP. The book contains practical examples and step-by-step instructions. Starting off with the basics of installing Apache Solr and integrating it with PHP, the book then proceeds to explore the features provided by Solr to improve searches using PHP. You will learn how to build and maintain a Solr index using PHP, discover the query modes available with Solr, and learn how to use them to tune Solr queries to retrieve relevant results. You will look at how to build and use facets in your search, how to tune and use fast result highlighting, and how to build spell check and autocomplete features using Solr. You will finish by learning some of the advanced concepts required to run a large-scale, enterprise-level search infrastructure.

1900
Ebook
1901
Ebook

Apache Spark 2: Data Processing and Real-Time Analytics. Master complex big data processing, stream analytics, and machine learning with Apache Spark

Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla, Siamak Amirghodsi, ...

Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your own data flow and machine learning programs on this platform. You will work with the different modules in Apache Spark, such as interactive querying with Spark SQL, using DataFrames and datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools. By the end of this elaborately designed Learning Path, you will have all the knowledge you need to master Apache Spark, and build your own big data processing and analytics pipeline quickly and without any hassle. This Learning Path includes content from the following Packt products:
• Mastering Apache Spark 2.x by Romeo Kienzler
• Scala and Spark for Big Data Analytics by Md. Rezaul Karim, Sridhar Alla
• Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei

1902
Ebook

Apache Spark 2.x Cookbook. Over 70 cloud-ready recipes for distributed Big Data processing and analytics

Rishi Yadav

While Apache Spark 1.x gained a lot of traction and adoption in the early years, Spark 2.x delivers notable improvements in the areas of API, schema awareness, performance, Structured Streaming, and simplifying building blocks to build better, faster, smarter, and more accessible big data applications. This book uncovers all these features in the form of structured recipes to analyze and mature large and complex sets of data. Starting with installing and configuring Apache Spark with various cluster managers, you will learn to set up development environments. Further on, you will be introduced to working with RDDs, DataFrames, and Datasets to operate on schema-aware data, and real-time streaming with various sources such as Twitter Stream and Apache Kafka. You will also work through recipes on machine learning, including supervised learning, unsupervised learning, and recommendation engines in Spark. Last but not least, the final few chapters delve deeper into the concepts of graph processing using GraphX, securing your implementations, cluster optimization, and troubleshooting.

1903
Ebook

Apache Spark 2.x for Java Developers. Explore big data at scale using Apache Spark 2.x Java APIs

Sourav Gulati, Sumit Kumar

Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version for Java developers. This book will show you how you can implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone. The book starts with an introduction to the Apache Spark 2.x ecosystem, followed by explaining how to install and configure Spark, and refreshes the Java concepts that will be useful to you when consuming Apache Spark's APIs. You will explore RDD and its associated common Action and Transformation Java APIs, set up a production-like clustered environment, and work with Spark SQL. Moving on, you will perform near-real-time processing with Spark Streaming, Machine Learning analytics with Spark MLlib, and graph processing with GraphX, all using various Java packages. By the end of the book, you will have a solid foundation in implementing components in the Spark framework in Java to build fast, real-time applications.

1904
Ebook

Apache Spark 2.x Machine Learning Cookbook. Over 100 recipes to simplify machine learning model implementations with Spark

Siamak Amirghodsi, Shuen Mei, Meenakshi Rajendran, Broderick Hall

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning about algorithms enables a wide range of applications, from everyday tasks such as product recommendations and spam filtering to cutting-edge applications such as self-driving cars and personalized medicine. You will gain hands-on experience of applying these principles using Apache Spark, a resilient cluster computing system well suited for large-scale machine learning tasks. This book begins with a quick overview of setting up the necessary IDEs to facilitate the execution of code examples that will be covered in various chapters. It also highlights some key issues developers face while working with machine learning algorithms on the Spark platform. We progress by uncovering the various Spark APIs and implementing ML algorithms to develop classification systems, recommendation engines, text analytics, clustering, and learning systems. Toward the final chapters, we’ll focus on building high-end applications and explain various unsupervised methodologies and the challenges to tackle when implementing big data ML systems.