Analiza danych
Analiza danych jest ekscytującą dyscypliną, która umożliwia zrozumienie pewnych zjawisk, uzyskanie wglądu i wiedzy na podstawie surowych danych. Pojęcie to oznacza dokładnie przetwarzanie danych za pomocą technik matematycznych i statystycznych w celu uzyskania cennych wniosków, podjęcia ważnych decyzji i opracowania przydatnych produktów. Termin ten wywodzi się od angielskiego data science, często traktowanego jako synonim takich terminów, jak analityka biznesowa, badania operacyjne, business intelligence, wywiad konkurencyjny, analiza i modelowanie danych, a także pozyskiwanie wiedzy. Dzięki takim technologiom, jak języki Python czy R, platformy Hadoop i Spark masz szansę wyciągnąć maksimum wniosków, dostrzec szanse na rozwój swojej organizacji albo przewidzieć i zapobiec zagrożeniom.
Mastering Java for Data Science. Analytics and more for production-ready applications
Alexey Grigorev
Java is the most popular programming language, according to the TIOBE index, and it is a typical choice for running production systems in many companies, both in the startup world and among large enterprises.Not surprisingly, it is also a common choice for creating data science applications: it is fast and has a great set of data processing tools, both built-in and external. What is more, choosing Java for data science allows you to easily integrate solutions with existing software, and bring data science into production with less effort.This book will teach you how to create data science applications with Java. First, we will revise the most important things when starting a data science application, and then brush up the basics of Java and machine learning before diving into more advanced topics. We start by going over the existing libraries for data processing and libraries with machine learning algorithms. After that, we cover topics such as classification and regression, dimensionality reduction and clustering, information retrieval and natural language processing, and deep learning and big data.Finally, we finish the book by talking about the ways to deploy the model and evaluate it in production settings.
Uday Kamath, Krishna Choppella
Java is one of the main languages used by practicing data scientists; much of the Hadoop ecosystem is Java-based, and it is certainly the language that most production systems in Data Science are written in. If you know Java, Mastering Machine Learning with Java is your next step on the path to becoming an advanced practitioner in Data Science.This book aims to introduce you to an array of advanced techniques in machine learning, including classification, clustering, anomaly detection, stream learning, active learning, semi-supervised learning, probabilistic graph modeling, text mining, deep learning, and big data batch and stream machine learning. Accompanying each chapter are illustrative examples and real-world case studies that show how to apply the newly learned techniques using sound methodologies and the best Java-based tools available today.On completing this book, you will have an understanding of the tools and techniques for building powerful machine learning models to solve data science problems in just about any domain.
Mastering Kibana 6.x. Visualize your Elastic Stack data with histograms, maps, charts, and graphs
Anurag Srivastava
Kibana is one of the popular tools among data enthusiasts for slicing and dicing large datasets and uncovering Business Intelligence (BI) with the help of its rich and powerful visualizations.To begin with, Mastering Kibana 6.x quickly introduces you to the features of Kibana 6.x, before teaching you how to create smart dashboards in no time. You will explore metric analytics and graph exploration, followed by understanding how to quickly customize Kibana dashboards. In addition to this, you will learn advanced analytics such as maps, hits, and list analytics. All this will help you enhance your skills in running and comparing multiple queries and filters, influencing your data visualization skills at scale.With Kibana’s Timelion feature, you can analyze time series data with histograms and stats analytics. By the end of this book, you will have created a speedy machine learning job using X-Pack capabilities.
Cory Lesmeister
Given the growing popularity of the R-zerocost statistical programming environment, there has never been a better time to start applying ML to your data. This book will teach you advanced techniques in ML ,using? the latest code in R 3.5. You will delve into various complex features of supervised learning, unsupervised learning, and reinforcement learning algorithms to design efficient and powerful ML models. This newly updated edition is packed with fresh examples covering a range of tasks from different domains. Mastering Machine Learning with R starts by showing you how to quickly manipulate data and prepare it for analysis. You will explore simple and complex models and understand how to compare them. You’ll also learn to use the latest library support, such as TensorFlow and Keras-R, for performing advanced computations. Additionally, you’ll explore complex topics, such as natural language processing (NLP), time series analysis, and clustering, which will further refine your skills in developing applications. Each chapter will help you implement advanced ML algorithms using real-world examples. You’ll even be introduced to reinforcement learning, along with its various use cases and models. In the concluding chapters, you’ll get a glimpse into how some of these blackbox models can be diagnosed and understood. By the end of this book, you’ll be equipped with the skills to deploy ML techniques in your own projects or at work.
Cory Lesmeister
This book will teach you advanced techniques in machine learning with the latest code in R 3.3.2. You will delve into statistical learning theory and supervised learning; design efficient algorithms; learn about creating Recommendation Engines; use multi-class classification and deep learning; and more.You will explore, in depth, topics such as data mining, classification, clustering, regression, predictive modeling, anomaly detection, boosted trees with XGBOOST, and more. More than just knowing the outcome, you’ll understand how these concepts work and what they do.With a slow learning curve on topics such as neural networks, you will explore deep learning, and more. By the end of this book, you will be able to perform machine learning with R in the cloud using AWS in various scenarios with different datasets.
Gavin Hackeling
Machine learning is the buzzword bringing computer science and statistics together to build smart and efficient models. Using powerful algorithms and techniques offered by machine learning you can automate any analytical model.This book examines a variety of machine learning models including popular machine learning algorithms such as k-nearest neighbors, logistic regression, naive Bayes, k-means, decision trees, and artificial neural networks. It discusses data preprocessing, hyperparameter optimization, and ensemble methods. You will build systems that classify documents, recognize images, detect ads, and more. You will learn to use scikit-learn’s API to extract features from categorical variables, text and images; evaluate model performance, and develop an intuition for how to improve your model’s performance.By the end of this book, you will master all required concepts of scikit-learn to build efficient models at work to carry out advanced tasks with the practical approach.
Mastering Machine Learning with Spark 2.x. Harness the potential of machine learning, through spark
Michal Malohlava, Alex Tellez, Max Pumperla
The purpose of machine learning is to build systems that learn from data. Being able to understand trends and patterns in complex data is critical to success; it is one of the key strategies to unlock growth in the challenging contemporary marketplace today. With the meteoric rise of machine learning, developers are now keen on finding out how can they make their Spark applications smarter. This book gives you access to transform data into actionable knowledge. The book commences by defining machine learning primitives by the MLlib and H2O libraries. You will learn how to use Binary classification to detect the Higgs Boson particle in the huge amount of data produced by CERN particle collider and classify daily health activities using ensemble Methods for Multi-Class Classification. Next, you will solve a typical regression problem involving flight delay predictions and write sophisticated Spark pipelines. You will analyze Twitter data with help of the doc2vec algorithm and K-means clustering. Finally, you will build different pattern mining models using MLlib, perform complex manipulation of DataFrames using Spark and Spark SQL, and deploy your app in a Spark streaming environment.
Mastering Matplotlib 2.x. Effective Data Visualization techniques with Python
Benjamin Walter Keller
In this book, you’ll get hands-on with customizing your data plots with the help of Matplotlib. You’ll start with customizing plots, making a handful of special-purpose plots, and building 3D plots. You’ll explore non-trivial layouts, Pylab customization, and more about tile configuration. You’ll be able to add text, put lines in plots, and also handle polygons, shapes, and annotations. Non-Cartesian and vector plots are exciting to construct, and you’ll explore them further in this book. You’ll delve into niche plots and visualize ordinal and tabular data. In this book, you’ll be exploring 3D plotting, one of the best features when it comes to 3D data visualization, along with Jupyter Notebook, widgets, and creating movies for enhanced data representation. Geospatial plotting will also be explored. Finally, you’ll learn how to create interactive plots with the help of Jupyter.Learn expert techniques for effective data visualization using Matplotlib 3 and Python with our latest offering -- Matplotlib 3.0 Cookbook
Duncan M. McGreggor, Duncan M McGreggor
Brett Powell
This book is intended for business intelligence professionals responsible for the design and development of Power BI content as well as managers, architects and administrators who oversee Power BI projects and deployments. The chapters flow from the planning of a Power BI project through the development and distribution of content to the administration of Power BI for an organization.BI developers will learn how to create sustainable and impactful Power BI datasets, reports, and dashboards. This includes connecting to data sources, shaping and enhancing source data, and developing an analytical data model. Additionally, top report and dashboard design practices are described using features such as Bookmarks and the Power KPI visual.BI managers will learn how Power BI’s tools work together such as with the On-premises data gateway and how content can be staged and securely distributed via Apps. Additionally, both the Power BI Report Server and Power BI Premium are reviewed.By the end of this book, you will be confident in creating effective charts, tables, reports or dashboards for any kind of data using the tools and techniques in Microsoft Power BI.
Mastering MongoDB 3.x. An expert's guide to building fault-tolerant MongoDB applications
Alex Giamas
MongoDB has grown to become the de facto NoSQL database with millions of users—from small startups to Fortune 500 companies. Addressing the limitations of SQL schema-based databases, MongoDB pioneered a shift of focus for DevOps and offered sharding and replication maintainable by DevOps teams. The book is based on MongoDB 3.x and covers topics ranging from database querying using the shell, built in drivers, and popular ODM mappers to more advanced topics such as sharding, high availability, and integration with big data sources.You will get an overview of MongoDB and how to play to its strengths, with relevant use cases. After that, you will learn how to query MongoDB effectively and make use of indexes as much as possible. The next part deals with the administration of MongoDB installations on-premise or in the cloud. We deal with database internals in the next section, explaining storage systems and how they can affect performance. The last section of this book deals with replication and MongoDB scaling, along with integration with heterogeneous data sources. By the end this book, you will be equipped with all the required industry skills and knowledge to become a certified MongoDB developer and administrator.
Mastering .NET Machine Learning. Use machine learning in your .NET applications
Jamie Dixon
.Net is one of the widely used platforms for developing applications. With the meteoric rise of Machine learning, developers are now keen on finding out how can they make their .Net applications smarter. Also, .NET developers are interested into moving into the world of devices and how to apply machine learning techniques to, well, machines.This book is packed with real-world examples to easily use machine learning techniques in your business applications. You will begin with introduction to F# and prepare yourselves for machine learning using .NET framework. You will be writing a simple linear regression model using an example which predicts sales of a product. Forming a base with the regression model, you will start using machine learning libraries available in .NET framework such as Math.NET, Numl.NET and Accord.NET with the help of a sample application. You will then move on to writing multiple linear regressions and logistic regressions.You will learn what is open data and the awesomeness of type providers. Next, you are going to address some of the issues that we have been glossing over so far and take a deep dive into obtaining, cleaning, and organizing our data. You will compare the utility of building a KNN and Naive Bayes model to achieve best possible results.Implementation of Kmeans and PCA using Accord.NET and Numl.NET libraries is covered with the help of an example application. We will then look at many of issues confronting creating real-world machine learning models like overfitting and how to combat them using confusion matrixes, scaling, normalization, and feature selection. You will now enter into the world of Neural Networks and move your line of business application to a hybrid scientific application. After you have covered all the above machine learning models, you will see how to deal with very large datasets using MBrace and how to deploy machine learning models to Internet of Thing (IoT) devices so that the machine can learn and adapt on the fly.
Lior Gazit, Meysam Ghaffari, Asha Saxena
Do you want to master Natural Language Processing (NLP) but don’t know where to begin? This book will give you the right head start. Written by leaders in machine learning and NLP, Mastering NLP from Foundations to LLMs provides an in-depth introduction to techniques. Starting with the mathematical foundations of machine learning (ML), you’ll gradually progress to advanced NLP applications such as large language models (LLMs) and AI applications. You’ll get to grips with linear algebra, optimization, probability, and statistics, which are essential for understanding and implementing machine learning and NLP algorithms. You’ll also explore general machine learning techniques and find out how they relate to NLP. Next, you’ll learn how to preprocess text data, explore methods for cleaning and preparing text for analysis, and understand how to do text classification. You’ll get all of this and more along with complete Python code samples.By the end of the book, the advanced topics of LLMs’ theory, design, and applications will be discussed along with the future trends in NLP, which will feature expert opinions. You’ll also get to strengthen your practical skills by working on sample real-world NLP business problems and solutions.
Umit Mert Cakmak, Mert Cuhadaroglu
NumPy is one of the most important scientific computing libraries available for Python. Mastering Numerical Computing with NumPy teaches you how to achieve expert level competency to perform complex operations, with in-depth coverage of advanced concepts.Beginning with NumPy's arrays and functions, you will familiarize yourself with linear algebra concepts to perform vector and matrix math operations. You will thoroughly understand and practice data processing, exploratory data analysis (EDA), and predictive modeling. You will then move on to working on practical examples which will teach you how to use NumPy statistics in order to explore US housing data and develop a predictive model using simple and multiple linear regression techniques. Once you have got to grips with the basics, you will explore unsupervised learning and clustering algorithms, followed by understanding how to write better NumPy code while keeping advanced considerations in mind. The book also demonstrates the use of different high-performance numerical computing libraries and their relationship with NumPy. You will study how to benchmark the performance of different configurations and choose the best for your system.By the end of this book, you will have become an expert in handling and performing complex data manipulations.
Ashish Kumar
pandas is a popular Python library used by data scientists and analysts worldwide to manipulate and analyze their data. This book presents useful data manipulation techniques in pandas to perform complex data analysis in various domains.An update to our highly successful previous edition with new features, examples, updated code, and more, this book is an in-depth guide to get the most out of pandas for data analysis. Designed for both intermediate users as well as seasoned practitioners, you will learn advanced data manipulation techniques, such as multi-indexing, modifying data structures, and sampling your data, which allow for powerful analysis and help you gain accurate insights from it. With the help of this book, you will apply pandas to different domains, such as Bayesian statistics, predictive analytics, and time series analysis using an example-based approach. And not just that; you will also learn how to prepare powerful, interactive business reports in pandas using the Jupyter notebook.By the end of this book, you will learn how to perform efficient data analysis using pandas on complex data, and become an expert data analyst or data scientist in the process.
Simon R. Chapple, Terence Sloan, Thorsten Forster,...
R is one of the most popular programming languages used in data science. Applying R to big data and complex analytic tasks requires the harnessing of scalable compute resources.Mastering Parallel Programming with R presents a comprehensive and practical treatise on how to build highly scalable and efficient algorithms in R. It will teach you a variety of parallelization techniques, from simple use of R’s built-in parallel package versions of lapply(), to high-level AWS cloud-based Hadoop and Apache Spark frameworks. It will also teach you low level scalable parallel programming using RMPI and pbdMPI for message passing, applicable to clusters and supercomputers, and how to exploit thousand-fold simple processor GPUs through ROpenCL. By the end of the book, you will understand the factors that influence parallel efficiency, including assessing code performance and implementing load balancing; pitfalls to avoid, including deadlock and numerical instability issues; how to structure your code and data for the most appropriate type of parallelism for your problem domain; and how to extract the maximum performance from your R code running on a variety of computer systems.
Mastering PostGIS. Modern ways to create, analyze, and implement spatial data
Dominik Mikiewicz, Michal Mackiewicz, Tomasz Nycz
PostGIS is open source extension onf PostgreSQL object-relational database system that allows GIS objects to be stored and allows querying for information and location services. The aim of this book is to help you master the functionalities offered by PostGIS- from data creation, analysis and output, to ETL and live edits.The book begins with an overview of the key concepts related to spatial database systems and how it applies to Spatial RMDS. You will learn to load different formats into your Postgres instance, investigate the spatial nature of your raster data, and finally export it using built-in functionalities or 3th party tools for backup or representational purposes. Through the course of this book, you will be presented with many examples on how to interact with the database using JavaScript and Node.js. Sample web-based applications interacting with backend PostGIS will also be presented throughout the book, so you can get comfortable with the modern ways of consuming and modifying your spatial data.
Mastering PostgreSQL 9.6. A comprehensive guide for PostgreSQL 9.6 developers and administrators
Hans-Jürgen Schönig
PostgreSQL is an open source database used for handling large datasets (Big Data) and as a JSON document database. It also has applications in the software and web domains. This book will enable you to build better PostgreSQL applications and administer databases more efficiently.We begin by explaining the advanced database design concepts in PostgreSQL 9.6, along with indexing and query optimization. You will also see how to work with event triggers and perform concurrent transactions and table partitioning, along with exploring SQL and server tuning. We will walk you through implementing advanced administrative tasks such as server maintenance and monitoring, replication, recovery and high availability, and much more. You will understand the common and not-so-common troubleshooting problems and how you can overcome them.By the end of this book, you will have an expert-level command of the advanced database functionalities and will be able to implement advanced administrative tasks with PostgreSQL.
Mastering Predictive Analytics with Python. Click here to enter text
Joseph Babcock
The volume, diversity, and speed of data available has never been greater. Powerful machine learning methods can unlock the value in this information by finding complex relationships and unanticipated trends. Using the Python programming language, analysts can use these sophisticated methods to build scalable analytic applications to deliver insights that are of tremendous value to their organizations. In Mastering Predictive Analytics with Python, you will learn the process of turning raw data into powerful insights. Through case studies and code examples using popular open-source Python libraries, this book illustrates the complete development process for analytic applications and how to quickly apply these methods to your own data to create robust and scalable prediction services.Covering a wide range of algorithms for classification, regression, clustering, as well as cutting-edge techniques such as deep learning, this book illustrates not only how these methods work, but how to implement them in practice. You will learn to choose the right approach for your problem and how to develop engaging visualizations to bring the insights of predictive modeling to life
James D. Miller, Rui Miguel Forte
R offers a free and open source environment that is perfect for both learning and deploying predictive modeling solutions. With its constantly growing community and plethora of packages, R offers the functionality to deal with a truly vast array of problems.The book begins with a dedicated chapter on the language of models and the predictive modeling process. You will understand the learning curve and the process of tidying data. Each subsequent chapter tackles a particular type of model, such as neural networks, and focuses on the three important questions of how the model works, how to use R to train it, and how to measure and assess its performance using real-world datasets. How do you train models that can handle really large datasets? This book will also show you just that. Finally, you will tackle the really important topic of deep learning by implementing applications on word embedding and recurrent neural networks.By the end of this book, you will have explored and tested the most popular modeling techniques in use on real- world datasets and mastered a diverse range of techniques in predictive analytics using R.
Mastering Puppet 5. Optimize enterprise-grade environment performance with Puppet
Ryan Russell-Yates, Jason Southgate
Puppet is a configuration management system and a language written for and by system administrators to manage a large number of systems efficiently and prevent configuration drift.The core topics this book addresses are Puppet's latest features and mastering Puppet Enterprise. You will begin by writing a new Puppet module, gaining an understanding of the guidelines and style of the Puppet community. Following on from this, you will take advantage of the roles and profiles pattern, and you will learn how to structure your code. Next, you will learn how to extend Puppet and write custom facts, functions, types, and providers in Ruby, and also use the new features of Hiera 5. You will also learn how to configure the new Code Manager component, and how to ensure code is automatically deployed to (multiple) Puppet servers. Next, you will learn how to integrate Puppet with Jenkins and Git to build an effective workflow for multiple teams, and use the new Puppet Tasks feature and the latest Puppet Orchestrator language extensions. Finally, you will learn how to scale and troubleshoot Puppet.By the end of the book, you will be able to deal with problems of scale and exceptions in your code, automate workflows, and support multiple developers working simultaneously.
Luiz Felipe Martins, Gavin Hackeling, Magnus Vilhelm...
Python, a multi-paradigm programming language, has become the language of choice for data scientists for data analysis, visualization, and machine learning. Ever imagined how to become an expert at effectively approaching data analysis problems, solving them, and extracting all of the available information from your data? Well, look no further, this is the book you want!Through this comprehensive guide, you will explore data and present results and conclusions from statistical analysis in a meaningful way. You’ll be able to quickly and accurately perform the hands-on sorting, reduction, and subsequent analysis, and fully appreciate how data analysis methods can support business decision-making.You’ll start off by learning about the tools available for data analysis in Python and will then explore the statistical models that are used to identify patterns in data. Gradually, you’ll move on to review statistical inference using Python, Pandas, and SciPy. After that, we’ll focus on performing regression using computational tools and you’ll get to understand the problem of identifying clusters in data in an algorithmic way. Finally, we delve into advanced techniques to quantify cause and effect using Bayesian methods and you’ll discover how to use Python’s tools for supervised machine learning.
Kirthi Raman, Sebastian Cheung, Kirthi Venkatraman