Analiza danych
Analiza danych jest ekscytującą dyscypliną, która umożliwia zrozumienie pewnych zjawisk, uzyskanie wglądu i wiedzy na podstawie surowych danych. Pojęcie to oznacza dokładnie przetwarzanie danych za pomocą technik matematycznych i statystycznych w celu uzyskania cennych wniosków, podjęcia ważnych decyzji i opracowania przydatnych produktów. Termin ten wywodzi się od angielskiego data science, często traktowanego jako synonim takich terminów, jak analityka biznesowa, badania operacyjne, business intelligence, wywiad konkurencyjny, analiza i modelowanie danych, a także pozyskiwanie wiedzy. Dzięki takim technologiom, jak języki Python czy R, platformy Hadoop i Spark masz szansę wyciągnąć maksimum wniosków, dostrzec szanse na rozwój swojej organizacji albo przewidzieć i zapobiec zagrożeniom.
Real-Time Big Data Analytics. Design, process, and analyze large sets of complex data in real time
Sumit Gupta, Shilpi Saxena
Enterprise has been striving hard to deal with the challenges of data arriving in real time or near real time.Although there are technologies such as Storm and Spark (and many more) that solve the challenges of real-time data, using the appropriate technology/framework for the right business use case is the key to success. This book provides you with the skills required to quickly design, implement and deploy your real-time analytics using real-world examples of big data use cases.From the beginning of the book, we will cover the basics of varied real-time data processing frameworks and technologies. We will discuss and explain the differences between batch and real-time processing in detail, and will also explore the techniques and programming concepts using Apache Storm.Moving on, we’ll familiarize you with “Amazon Kinesis” for real-time data processing on cloud. We will further develop your understanding of real-time analytics through a comprehensive review of Apache Spark along with the high-level architecture and the building blocks of a Spark program. You will learn how to transform your data, get an output from transformations, and persist your results using Spark RDDs, using an interface called Spark SQL to work with Spark.At the end of this book, we will introduce Spark Streaming, the streaming library of Spark, and will walk you through the emerging Lambda Architecture (LA), which provides a hybrid platform for big data processing by combining real-time and precomputed batch data to provide a near real-time view of incoming data.
Redash v5 Quick Start Guide. Create and share interactive dashboards using Redash
Alexander Leibzon, Yael Leibzon
Data exploration and visualization is vital to Business Intelligence, the backbone of almost every enterprise or organization. Redash is a querying and visualization tool developed to simplify how marketing and business development departments are exposed to data. If you want to learn to create interactive dashboards with Redash, explore different visualizations, and share the insights with your peers, then this is the ideal book for you.The book starts with essential Business Intelligence concepts that are at the heart of data visualizations. You will learn how to find your way round Redash and its rich array of data visualization options for building interactive dashboards. You will learn how to create data storytelling and share these with peers. You will see how to connect to different data sources to process complex data, and then visualize this data to reveal valuable insights. By the end of this book, you will be confident with the Redash dashboarding tool to provide insight and communicate data storytelling.
Luca Massaron, Alberto Boschetti
Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. There are many kinds of regression algorithms, and the aim of this book is to explain which is the right one to use for each set of problems and how to prepare real-world data for it. With this book you will learn to define a simple regression problem and evaluate its performance. The book will help you understand how to properly parse a dataset, clean it, and create an output matrix optimally built for regression. You will begin with a simple regression algorithm to solve some data science problems and then progress to more complex algorithms. The book will enable you to use regression models to predict outcomes and take critical business decisions. Through the book, you will gain knowledge to use Python for building fast better linear models and to apply the results in Python or in any computer language you prefer.
Giuseppe Ciaburro
Regression analysis is a statistical process which enables prediction of relationships between variables. The predictions are based on the casual effect of one variable upon another. Regression techniques for modeling and analyzing are employed on large set of data in order to reveal hidden relationship among the variables.This book will give you a rundown explaining what regression analysis is, explaining you the process from scratch. The first few chapters give an understanding of what the different types of learning are – supervised and unsupervised, how these learnings differ from each other. We then move to covering the supervised learning in details covering the various aspects of regression analysis. The outline of chapters are arranged in a way that gives a feel of all the steps covered in a data science process – loading the training dataset, handling missing values, EDA on the dataset, transformations and feature engineering, model building, assessing the model fitting and performance, and finally making predictions on unseen datasets. Each chapter starts with explaining the theoretical concepts and once the reader gets comfortable with the theory, we move to the practical examples to support the understanding. The practical examples are illustrated using R code including the different packages in R such as R Stats, Caret and so on. Each chapter is a mix of theory and practical examples.By the end of this book you will know all the concepts and pain-points related to regression analysis, and you will be able to implement your learning in your projects.
Svetlana Karslioglu
Pachyderm is an open source project that enables data scientists to run reproducible data pipelines and scale them to an enterprise level. This book will teach you how to implement Pachyderm to create collaborative data science workflows and reproduce your ML experiments at scale.You’ll begin your journey by exploring the importance of data reproducibility and comparing different data science platforms. Next, you’ll explore how Pachyderm fits into the picture and its significance, followed by learning how to install Pachyderm locally on your computer or a cloud platform of your choice. You’ll then discover the architectural components and Pachyderm's main pipeline principles and concepts. The book demonstrates how to use Pachyderm components to create your first data pipeline and advances to cover common operations involving data, such as uploading data to and from Pachyderm to create more complex pipelines. Based on what you've learned, you'll develop an end-to-end ML workflow, before trying out the hyperparameter tuning technique and the different supported Pachyderm language clients. Finally, you’ll learn how to use a SaaS version of Pachyderm with Pachyderm Notebooks.By the end of this book, you will learn all aspects of running your data pipelines in Pachyderm and manage them on a day-to-day basis.
Robo-Advisor with Python. A hands-on guide to building and operating your own Robo-advisor
Aki Ranin
Robo-advisors are becoming table stakes for the wealth management industry across all segments, from retail to high-net-worth investors. Robo-advisors enable you to manage your own portfolios and financial institutions to create automated platforms for effective digital wealth management. This book is your hands-on guide to understanding how Robo-advisors work, and how to build one efficiently. The chapters are designed in a way to help you get a comprehensive grasp of what Robo-advisors do and how they are structured with an end-to-end workflow.You’ll begin by learning about the key decisions that influence the building of a Robo-advisor, along with considerations on building and licensing a platform. As you advance, you’ll find out how to build all the core capabilities of a Robo-advisor using Python, including goals, risk questionnaires, portfolios, and projections. The book also shows you how to create orders, as well as open accounts and perform KYC verification for transacting. Finally, you’ll be able to implement capabilities such as performance reporting and rebalancing for operating a Robo-advisor with ease.By the end of this book, you’ll have gained a solid understanding of how Robo-advisors work and be well on your way to building one for yourself or your business.
Carol Fairchild, Dr. Thomas L. Harman
ROS is a robust robotics framework that works regardless of hardware architecture or hardware origin. It standardizes most layers of robotics functionality from device drivers to process control and message passing to software package management. But apart from just plain functionality, ROS is a great platform to learn about robotics itself and to simulate, as well as actually build, your first robots. This does not mean that ROS is a platform for students and other beginners; on the contrary, ROS is used all over the robotics industry to implement flying, walking and diving robots, yet implementation is always straightforward, and never dependent on the hardware itself.ROS Robotics has been the standard introduction to ROS for potential professionals and hobbyists alike since the original edition came out; the second edition adds a gradual introduction to all the goodness available with the Kinetic Kame release.By providing you with step-by-step examples including manipulator arms and flying robots, the authors introduce you to the new features. The book is intensely practical, with space given to theory only when absolutely necessary. By the end of this book, you will have hands-on experience on controlling robots with the best possible framework.
Enrico Murru
The Salesforce Advanced Administrator certification extends beyond administrator certification, covering advanced platform features and functions such as configuration, automation, security, and customization. Complete with comprehensive coverage of all these topics and exam-oriented questions and mock tests, this Salesforce book will help you earn advanced administrator credentials. You'll start your journey by mastering data access security, monitoring and auditing, and understanding best practices for handling change management and data across organizations. The book then delves into data model management for improving data quality and lets you explore Sales features such as products, schedules, quotes, and forecasting capabilities. As you progress, this book will guide you in working with content management to set up and maintain Salesforce content. You'll also master organizing your files and data using reports and dashboards. Finally, you'll learn how to use a combination of automation tools to solve business problems.By the end of the book, you will have developed the skills required to get your advanced administrator credentials.
Ahsan Zafar
As Salesforce orgs mature over time, data management and integrations are becoming more challenging than ever. Salesforce Data Architecture and Management follows a hands-on approach to managing data and tracking the performance of your Salesforce org.You’ll start by understanding the role and skills required to become a successful data architect. The book focuses on data modeling concepts, how to apply them in Salesforce, and how they relate to objects and fields in Salesforce. You’ll learn the intricacies of managing data in Salesforce, starting from understanding why Salesforce has chosen to optimize for read rather than write operations. After developing a solid foundation, you’ll explore examples and best practices for managing your data. You’ll understand how to manage your master data and discover what the Golden Record is and why it is important for organizations. Next, you'll learn how to align your MDM and CRM strategy with a discussion on Salesforce’s Customer 360 and its key components. You’ll also cover data governance, its multiple facets, and how GDPR compliance can be achieved with Salesforce. Finally, you'll discover Large Data Volumes (LDVs) and best practices for migrating data using APIs.By the end of this book, you’ll be well-versed with data management, data backup, storage, and archiving in Salesforce.
Vinay Singh
The SAP BusinessObjects Business Intelligence platform is a powerful reporting and analysis tool. This book is the ideal introduction to the SAP BusinessObjects Business Intelligence platform, introducing you to its data visualization, visual analytics, reporting, and dashboarding capabilities.The book starts with an overview of the BI platform and various data sources for reporting. Then, we move on to looking at data visualization, analysis, reporting, and analytics using BusinessObjects Business Intelligence tools. You will learn about the features associated with reporting, scheduling, and distribution and learn how to deploy the platform. Toward the end, you will learn about the strategies and factors that should be considered during deployment.By the end, you will be confident working with the SAP BusinessObjects Business Intelligence platform to deliver better insights for more effective decision making.
Harish Gulati
SAS is a groundbreaking tool for advanced predictive and statistical analytics used by top banks and financial corporations to establish insights from their financial data.SAS for Finance offers you the opportunity to leverage the power of SAS analytics in redefining your data. Packed with real-world examples from leading financial institutions, the author discusses statistical models using time series data to resolve business issues.This book shows you how to exploit the capabilities of this high-powered package to create clean, accurate financial models. You can easily assess the pros and cons of models to suit your unique business needs.By the end of this book, you will be able to leverage the true power of SAS to design and develop accurate analytical models to gain deeper insights into your financial data.
Md. Rezaul Karim, Sridhar Alla
Scala has been observing wide adoption over the past few years, especially in the field of data science and analytics. Spark, built on Scala, has gained a lot of recognition and is being used widely in productions. Thus, if you want to leverage the power of Scala and Spark to make sense of big data, this book is for you.The first part introduces you to Scala, helping you understand the object-oriented and functional programming concepts needed for Spark application development. It then moves on to Spark to cover the basic abstractions using RDD and DataFrame. This will help you develop scalable and fault-tolerant streaming applications by analyzing structured and unstructured data using SparkSQL, GraphX, and Spark structured streaming. Finally, the book moves on to some advanced topics, such as monitoring, configuration, debugging, testing, and deployment.You will also learn how to develop Spark applications using SparkR and PySpark APIs, interactive data analytics using Zeppelin, and in-memory data processing with Alluxio.By the end of this book, you will have a thorough understanding of Spark, and you will be able to perform full-stack data analytics with a feel that no amount of data is too big.
Patrick R. Nicolas
The discovery of information through data clustering and classification is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, engineering design, logistics, manufacturing, and trading strategies, to detection of genetic anomalies. The book is your one stop guide that introduces you to the functional capabilities of the Scala programming language that are critical to the creation of machine learning algorithms such as dependency injection and implicits. You start by learning data preprocessing and filtering techniques. Following this, you'll move on to unsupervised learning techniques such as clustering and dimension reduction, followed by probabilistic graphical models such as Naïve Bayes, hidden Markov models and Monte Carlo inference. Further, it covers the discriminative algorithms such as linear, logistic regression with regularization, kernelization, support vector machines, neural networks, and deep learning. You’ll move on to evolutionary computing, multibandit algorithms, and reinforcement learning.Finally, the book includes a comprehensive overview of parallel computing in Scala and Akka followed by a description of Apache Spark and its ML library. With updated codes based on the latest version of Scala and comprehensive examples, this book will ensure that you have more than just a solid fundamental knowledge in machine learning with Scala.
Scala: Guide for Data Science Professionals. Build robust data pipelines with Scala
Arun Manivannan, Pascal Bugnion, Patrick R. Nicolas
Scala is especially good for analyzing large sets of data as the scale of the task doesn’t have any significant impact on performance. Scala’s powerful functional libraries can interact with databases and build scalable frameworks — resulting in the creation of robust data pipelines. The first module introduces you to Scala libraries to ingest, store, manipulate, process, and visualize data. Using real world examples, you will learn how to design scalable architecture to process and model data — starting from simple concurrency constructs and progressing to actor systems and Apache Spark. After this, you will also learn how to build interactive visualizations with web frameworks.Once you have become familiar with all the tasks involved in data science, you will explore data analytics with Scala in the second module. You’ll see how Scala can be used to make sense of data through easy to follow recipes. You will learn about Bokeh bindings for exploratory data analysis and quintessential machine learning with algorithms with Spark ML library. You’ll get a sufficient understanding of Spark streaming, machine learning for streaming data, and Spark graphX. Armed with a firm understanding of data analysis, you will be ready to explore the most cutting-edge aspect of data science — machine learning. The final module teaches you the A to Z of machine learning with Scala. You’ll explore Scala for dependency injections and implicits, which are used to write machine learning algorithms. You’ll also explore machine learning topics such as clustering, dimentionality reduction, Naïve Bayes, Regression models, SVMs, neural networks, and more. This learning path combines some of the best that Packt has to offer into one complete, curated package. It includes content from the following Packt products:• Scala for Data Science, Pascal Bugnion• Scala Data Analysis Cookbook, Arun Manivannan • Scala for Machine Learning, Patrick R. Nicolas
Sinchan Banerjee
Java architectural patterns and tools help architects to build reliable, scalable, and secure data engineering solutions that collect, manipulate, and publish data.This book will help you make the most of the architecting data solutions available with clear and actionable advice from an expert.You’ll start with an overview of data architecture, exploring responsibilities of a Java data architect, and learning about various data formats, data storage, databases, and data application platforms as well as how to choose them. Next, you’ll understand how to architect a batch and real-time data processing pipeline. You’ll also get to grips with the various Java data processing patterns, before progressing to data security and governance. The later chapters will show you how to publish Data as a Service and how you can architect it. Finally, you’ll focus on how to evaluate and recommend an architecture by developing performance benchmarks, estimations, and various decision metrics.By the end of this book, you’ll be able to successfully orchestrate data architecture solutions using Java and related technologies as well as to evaluate and present the most suitable solution to your clients.
Tarik Makota, Brian Maguire, Danny Gagne, Rajeev...
Amazon Kinesis is a collection of secure, serverless, durable, and highly available purpose-built data streaming services. This data streaming service provides APIs and client SDKs that enable you to produce and consume data at scale.Scalable Data Streaming with Amazon Kinesis begins with a quick overview of the core concepts of data streams, along with the essentials of the AWS Kinesis landscape. You'll then explore the requirements of the use case shown through the book to help you get started and cover the key pain points encountered in the data stream life cycle. As you advance, you'll get to grips with the architectural components of Kinesis, understand how they are configured to build data pipelines, and delve into the applications that connect to them for consumption and processing. You'll also build a Kinesis data pipeline from scratch and learn how to implement and apply practical solutions. Moving on, you'll learn how to configure Kinesis on a cloud platform. Finally, you’ll learn how other AWS services can be integrated into Kinesis. These services include Redshift, Dynamo Database, AWS S3, Elastic Search, and third-party applications such as Splunk.By the end of this AWS book, you’ll be able to build and deploy your own Kinesis data pipelines with Kinesis Data Streams (KDS), Kinesis Data Firehose (KFH), Kinesis Video Streams (KVS), and Kinesis Data Analytics (KDA).