Big data

497
EBOOK

Learning Alteryx. A beginner's guide to using Alteryx for self-service analytics and business intelligence

Renato Baruti

Alteryx, as a leading data blending and advanced data analytics platform, has taken self-service data analytics to the next level. Companies worldwide often struggle to prepare and blend massive datasets, a time-consuming task for analysts. Alteryx solves these problems with a repeatable workflow designed to quickly clean, prepare, blend, and join your data in a seamless manner. This book will set you on a self-service data analytics journey that will help you create efficient workflows using Alteryx, without any coding involved. It will empower you and your organization to make well-informed decisions with the help of deeper business insights from the data. Starting with the fundamentals of using Alteryx, such as data preparation and blending, you will delve into more advanced concepts such as performing predictive analytics. You will also learn how to use Alteryx’s features to share the insights gained with the relevant decision makers. To ensure consistency, we will be using data from the Healthcare domain throughout this book. The knowledge you gain from this book will help you confidently solve real-life problems related to Business Intelligence. Whether you are a novice with Alteryx or an experienced data analyst keen to explore Alteryx’s self-service analytics features, this book will be the perfect companion for you.

498
EBOOK

Learning Apache Apex. Real-time streaming applications with Apex

Ananth Gundabattula, Thomas Weise, Munagala V. Ramanath,...

Apache Apex is a next-generation stream processing framework designed to operate on data at large scale, with minimum latency, maximum reliability, and strict correctness guarantees. Half of the book consists of Apex applications, showing you key aspects of data processing pipelines such as connectors for sources and sinks, and common data transformations. The other half of the book is evenly split between explaining the Apex framework and tuning, testing, and scaling Apex applications. Much of our economic world depends on growing streams of data, such as social media feeds, financial records, and data from mobile devices, sensors, and machines (the Internet of Things - IoT). The projects in the book show how to process such streams to gain valuable, timely, and actionable insights. Traditional use cases, such as ETL, that currently consume a significant chunk of data engineering resources are also covered. The final chapter shows you future possibilities emerging in the streaming space, and how Apache Apex can contribute to them.

499
EBOOK

Learning Apache Cassandra. Managing fault-tolerant, scalable data with high performance - Second Edition

Sandeep Yarabarla, Graham Doman

Cassandra is a distributed database that stands out thanks to its robust feature set and intuitive interface, while providing the high availability and scalability of a distributed data store. This book will introduce you to the rich feature set offered by Cassandra, and empower you to create and manage a highly scalable, performant, and fault-tolerant database layer. The book starts by explaining the new features implemented in Cassandra 3.x and gets you set up with Cassandra. Then you’ll walk through data modeling in Cassandra and the rich feature set available to design a flexible schema. Next you’ll learn to create tables with composite partition keys, collections, and user-defined types, and get to know different methods to avoid denormalization of data. You will then proceed to create user-defined functions and aggregates in Cassandra. Then, you will set up a multi-node cluster and see how the dynamics of Cassandra change with it. Finally, you will implement some application-level optimizations using a Java client. By the end of this book, you'll be fully equipped to build powerful, scalable Cassandra database layers for your applications.

500
EBOOK

Learning Apache Mahout. Acquire practical skills in Big Data Analytics and explore data science with Apache Mahout

Chandramani Tiwary

If you are a Java developer and want to use Mahout and machine learning to solve Big Data Analytics use cases, then this book is for you. Familiarity with shell scripts is assumed, but no prior experience is required.

501
EBOOK

Learning Apache Spark 2. A beginner's guide to real-time Big Data processing using the Apache Spark framework

Muhammad Asif Abbasi

Apache Spark has seen unprecedented growth in adoption over the last few years, mainly because of its speed, diversity, and real-time data processing capabilities. It has quickly become the preferred tool for many Big Data professionals looking to find quick insights from large chunks of data. This book introduces you to the Apache Spark framework, and familiarizes you with all the latest features and capabilities introduced in Spark 2. Starting with a detailed introduction to Spark’s architecture and the installation procedure, this book covers everything you need to know about the Spark framework in the most practical manner. You will learn how to perform basic ETL activities using Spark, and work with different components of Spark such as Spark SQL, as well as the Dataset and DataFrame APIs for manipulating your data. Then, you will perform machine learning using Spark MLlib, as well as streaming analytics and graph processing using the Spark Streaming and GraphX modules respectively. The book also places special emphasis on deploying your Spark models, and how they can be operated in a clustered mode. During the course of the book, you will come across implementations of different real-world use cases and examples, giving you the hands-on knowledge you need to use Apache Spark in the best possible manner.

502
EBOOK

Learning ArcGIS Runtime SDK for .NET. Build a GIS app Using ArcGIS Runtime SDK

Ron Vincent

ArcGIS is a geographic information system (GIS) that enables you to work with maps and geographic information. It can be used to create and utilize maps, compile geographic data, analyze mapped information, share and discover geographic information, and manage geographic information in a database. This book starts by showing you where ArcGIS Runtime fits within Esri’s overall platform strategy. You'll create an initial map using the SDK, then use it to get an understanding of the MVVM model. You'll find out about the different kinds of layers and start adding layers, and you'll learn to transform maps into a 3D scene. The next chapters will help you comprehend and extract information contained in the maps using coordinates and layer objects. Towards the end, you will learn to set the symbology, decide whether to use 2D or 3D, see how to implement 2D or 3D, and learn to search and find objects. You'll also get to grips with many other standard features of the Application Programming Interface (API), including creating applications and, finally, testing, licensing, and deploying them. Once completed, you will be able to meet most of the common requirements of any mapping application for desktop or mobile platforms.

503
EBOOK

Learning AWK Programming. A fast, and simple cutting-edge utility for text-processing on the Unix-like environment

Shiwang Kalkhanda

AWK is one of the most primitive and powerful utilities that exist in all Unix and Unix-like distributions. It is used as a command-line utility when performing a basic text-processing operation, and as a programming language when dealing with complex text-processing and mining tasks. With this book, you will have the required expertise to practice advanced AWK programming in real-life examples. The book starts off with an introduction to AWK essentials. You will then be introduced to regular expressions, AWK variables and constants, arrays, AWK functions, and more. The book then delves deeper into more complex topics, such as printing formatted output and control flow statements. It also covers GNU's implementation of AWK (GAWK) and its advanced features, such as network communication, debugging, and inter-process communication, which are not easily possible with AWK itself. By the end of this book, the reader will have worked on the practical implementation of text processing and pattern matching using AWK to perform routine tasks.

504
EBOOK

Learning Bayesian Models with R. Become an expert in Bayesian Machine Learning methods using R and apply them to solve real-world big data problems

Hari Manassery Koduvely

Bayesian Inference provides a unified framework to deal with all sorts of uncertainties when learning patterns from data using machine learning models and using them to predict future observations. However, learning and implementing Bayesian models is not easy for data science practitioners due to the level of mathematical treatment involved. Also, applying Bayesian methods to real-world problems requires high computational resources. With the recent advances in computation and several open source packages available in R, Bayesian modeling has become more feasible for practical applications today. Therefore, it would be advantageous for all data scientists and engineers to understand Bayesian methods and apply them in their projects to achieve better results. Learning Bayesian Models with R starts by giving you comprehensive coverage of the Bayesian Machine Learning models and the R packages that implement them. It begins with an introduction to the fundamentals of probability theory and R programming for those who are new to the subject. Then the book covers some of the important machine learning methods, both supervised and unsupervised, implemented using Bayesian Inference and R. Every chapter begins with a theoretical description of the method, explained in a very simple manner. Then, relevant R packages are discussed and some illustrations using data sets from the UCI Machine Learning repository are given. Each chapter ends with some simple exercises for you to get hands-on experience of the concepts and R packages discussed in the chapter. The last chapters are devoted to the latest developments in the field, specifically Deep Learning, which uses a class of Neural Network models that are currently at the frontier of Artificial Intelligence. The book concludes with the application of Bayesian methods to Big Data using the Hadoop and Spark frameworks.