Big data

121
EBOOK

Cleaning Data for Effective Data Science. Doing the other 80% of the work with Python, R, and command-line tools

David Mertz

Data cleaning is the all-important first step to successful data science, data analysis, and machine learning. If you work with any kind of data, this book is your go-to resource, arming you with the insights and heuristics experienced data scientists had to learn the hard way. In a light-hearted and engaging exploration of different tools, techniques, and datasets, real and fictitious, Python veteran David Mertz teaches you the ins and outs of data preparation and the essential questions you should be asking of every piece of data you work with. Using a mixture of Python, R, and common command-line tools, Cleaning Data for Effective Data Science follows the data cleaning pipeline from start to end, focusing on helping you understand the principles underlying each step of the process. You'll look at ingesting data from a vast range of tabular, hierarchical, and other formats, imputing missing values, detecting unreliable data and statistical anomalies, and generating synthetic features. The long-form exercises at the end of each chapter let you get hands-on with the skills you've acquired along the way and also provide a valuable resource for academic courses.

122
EBOOK

Cleaning Excel Data With Power Query Straight to the Point. Efficient Data Cleaning Techniques in Excel Using Power Query

MrExcel's Holy Macro! Books, Oz du Soleil

This book provides a step-by-step guide to using Power Query in Excel for efficient data cleaning and transformation. Starting with an introduction to its capabilities, it explains how to import data, handle missing values, and parse text fields with ease. Advanced techniques such as merging datasets, appending data, and performing joins are explored in detail. The book also covers grouping data, creating conditional and custom columns, and reshaping data through unpivoting for analysis. Each concept is illustrated with practical examples for clarity. By the end of the book, readers will be equipped with the skills to automate repetitive tasks and streamline workflows. Whether you are dealing with messy data or preparing datasets for analysis, this guide prepares you to confidently tackle Excel data transformation challenges.

123
EBOOK

Clojure Data Analysis Cookbook - Second Edition. Dive into data analysis with Clojure through over 100 practical recipes for every stage of the analysis and collection process - Second Edition

Eric Richard Rochester

This book is for those with a basic knowledge of Clojure who are looking to push the language to excel at data analysis.

124
EBOOK

Clojure for Data Science. Statistics, big data, and machine learning for Clojure programmers

Henry Garner

The term “data science” has been widely used to describe this new profession, which is expected to interpret vast datasets and translate them into improved decision-making and performance. Clojure is a powerful language that combines the interactivity of a scripting language with the speed of a compiled language. Together with its rich ecosystem of native libraries and an extremely simple and consistent functional approach to data manipulation, which maps closely to mathematical formulae, it is an ideal, practical, and flexible language to meet a data scientist’s diverse needs. Taking you on a journey from simple summary statistics to sophisticated machine learning algorithms, this book shows how the Clojure programming language can be used to derive insights from data. Data scientists often forge a novel path, and you’ll see how to use Clojure’s Java interoperability to access libraries such as Mahout and MLlib for which Clojure wrappers don’t yet exist. Even seasoned Clojure developers will develop a deeper appreciation for their language’s flexibility! You’ll learn how to apply statistical thinking to your own data and use Clojure to explore, analyze, and visualize it in a technically and statistically robust way. You’ll also use Incanter for local data processing and ClojureScript to present interactive visualisations, and understand how distributed platforms such as Hadoop and Spark solve the challenges of data analysis at scale through programming models such as MapReduce and GraphX’s BSP, and how to express algorithms using those models. Above all, by following the explanations in this book, you’ll learn not just how to be effective with the current state-of-the-art methods in data science, but also why those methods work, so that you can continue to be productive as the field evolves.

125
EBOOK

Cloud Analytics with Google Cloud Platform. An end-to-end guide to processing and analyzing big data using Google Cloud Platform

Sanket Thodge

With the ongoing data explosion, more and more organizations all over the world are gradually migrating their infrastructure to the cloud. These cloud platforms also provide their own distinct analytics services to help you get faster insights from your data. This book introduces the concept of analytics on the cloud and the cloud services most commonly used for processing and analyzing data. If you’re planning to adopt the cloud analytics model for your business, this book will help you understand the design and business considerations to keep in mind and choose the best tools and alternatives for analytics based on your requirements. The chapters take you through the 70+ services available in Google Cloud Platform and their practical implementation. From ingesting your data to processing it, this book contains best practices for building an end-to-end analytics pipeline on the cloud by leveraging popular concepts such as machine learning and deep learning. By the end of this book, you will have a better understanding of cloud analytics as a concept, as well as practical know-how for implementing it.

126
EBOOK

Codeless Deep Learning with KNIME. Build, train, and deploy various deep neural network architectures using KNIME Analytics Platform

Kathrin Melcher, Rosaria Silipo

KNIME Analytics Platform is open source software used to create and design data science workflows. This book is a comprehensive guide to the KNIME GUI and KNIME deep learning integration, helping you build neural network models without writing any code. It’ll guide you in building simple and complex neural networks through practical and creative solutions for solving real-world data problems. Starting with an introduction to KNIME Analytics Platform, you’ll get an overview of simple feed-forward networks for solving simple classification problems on relatively small datasets. You’ll then move on to build, train, test, and deploy more complex networks, such as autoencoders, recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and convolutional neural networks (CNNs). In each chapter, depending on the network and use case, you’ll learn how to prepare data, encode incoming data, and apply best practices. By the end of this book, you’ll have learned how to design a variety of neural architectures and will be able to train, test, and deploy the final network.

128
EBOOK

CompTIA Data+: DAO-001 Certification Guide. Complete coverage of the new CompTIA Data+ (DAO-001) exam to help you pass on the first attempt

Cameron Dodd

The CompTIA Data+ certification exam not only helps validate a skill set required to enter one of the fastest-growing fields in the world, but is also starting to standardize the language and concepts within the field. However, there’s a lot of conflicting information and a lack of existing resources about the topics covered in this exam, and even professionals working in data analytics may need a study guide to help them pass on their first attempt. The CompTIA Data+ (DAO-001) Certification Guide will give you a solid understanding of how to prepare, analyze, and report data for better insights. You’ll begin with an introduction to the Data+ certification exam format and then quickly dive into preparing data. You'll learn about collecting, cleaning, and processing data, along with data wrangling and manipulation. As you progress, you’ll cover data analysis topics such as types of analysis, common techniques, hypothesis testing, and statistical analysis, before tackling data reporting, common visualizations, and data governance. All the knowledge you've gained throughout the book will be tested with the mock tests that appear in the final chapters. By the end of this book, you’ll be ready to pass the Data+ exam with confidence and take the next step in your career.

129
EBOOK

CompTIA Project+ Certification Guide. Learn project management best practices and successfully pass the CompTIA Project+ PK0-004 exam

J. Ashley Hunt

The CompTIA Project+ exam is designed for IT professionals who want to improve their career trajectory by gaining certification in project management specific to their industry. This guide covers everything necessary to pass the current iteration of the Project+ PK0-004 exam. The CompTIA Project+ Certification Guide starts by covering project initiation best practices, including an understanding of organizational structures, team roles, and responsibilities. You’ll then study best practices for developing a project charter and the scope of work to produce the deliverables necessary to obtain formal approval of the end result. The ability to monitor your project work and make changes as necessary to bring performance back in line with the plan is the difference between a successful and an unsuccessful project. The concluding chapters of the book provide best practices to help you keep an eye on your projects and close them out successfully. The guide also includes practice questions created to mirror the exam experience and help solidify your understanding of core project management concepts. By the end of this book, you will be able to develop creative solutions for complex issues faced in project management.