Big data
Ayodele Oluleye
In today's data-centric world, the ability to extract meaningful insights from vast amounts of data has become a valuable skill across industries. Exploratory Data Analysis (EDA) lies at the heart of this process, enabling us to comprehend, visualize, and derive valuable insights from various forms of data. This book is a comprehensive guide to Exploratory Data Analysis using the Python programming language. It provides practical steps needed to effectively explore, analyze, and visualize structured and unstructured data. It offers hands-on guidance and code for concepts such as generating summary statistics, analyzing single and multiple variables, visualizing data, analyzing text data, handling outliers, handling missing values and automating the EDA process. It is suited for data scientists, data analysts, researchers or curious learners looking to gain essential knowledge and practical steps for analyzing vast amounts of data to uncover insights. Python is an open-source general purpose programming language which is used widely for data science and data analysis given its simplicity and versatility. It offers several libraries which can be used to clean, analyze, and visualize data. In this book, we will explore popular Python libraries such as Pandas, Matplotlib, and Seaborn and provide workable code for analyzing data in Python using these libraries. By the end of this book, you will have gained comprehensive knowledge about EDA and mastered the powerful set of EDA techniques and tools required for analyzing both structured and unstructured data to derive valuable insights.
Exploring GPT-3. An unofficial first look at the general-purpose language processing API from OpenAI
Steve Tingiris
Generative Pre-trained Transformer 3 (GPT-3) is a highly advanced language model from OpenAI that can generate written text that is virtually indistinguishable from text written by humans. Whether you have a technical or non-technical background, this book will help you understand and start working with GPT-3 and the OpenAI API. If you want to get hands-on with leveraging artificial intelligence for natural language processing (NLP) tasks, this easy-to-follow book will help you get started. Beginning with a high-level introduction to NLP and GPT-3, the book takes you through practical examples that show how to leverage the OpenAI API and GPT-3 for text generation, classification, and semantic search. You'll explore the capabilities of the OpenAI API and GPT-3 and find out which NLP use cases GPT-3 is best suited for. You’ll also learn how to use the API and optimize requests for the best possible results. With examples focusing on the OpenAI Playground and easy-to-follow JavaScript and Python code samples, the book illustrates the possible applications of GPT-3 in production. By the end of this book, you'll understand the best use cases for GPT-3 and how to integrate the OpenAI API in your applications for a wide array of NLP tasks.
Steven Sanderson, David Kun
Extending Excel with Python and R is a game-changing resource written by experts Steven Sanderson, the author of the healthyverse suite of R packages, and David Kun, co-founder of Functional Analytics. This comprehensive guide transforms the way you work with spreadsheet-based data by integrating Python and R with Excel to automate tasks, execute statistical analysis, and create powerful visualizations. Working through the chapters, you’ll find out how to perform exploratory data analysis, time series analysis, and even integrate APIs for maximum efficiency. Both beginners and experts will get everything they need to unlock Excel's full potential and take their data analysis skills to the next level. By the end of this book, you’ll be able to import data from Excel, manipulate it in R or Python, and perform the data analysis tasks in your preferred framework while pushing the results back to Excel for sharing with others as needed.
Luca Zavarella
Python and R allow you to extend Power BI capabilities to simplify ingestion and transformation activities, enhance dashboards, and highlight insights. With this book, you'll be able to make your artifacts far more interesting and rich in insights using analytical languages. You'll start by learning how to configure your Power BI environment to use your Python and R scripts. The book then explores data ingestion and data transformation extensions, and advances to focus on data augmentation and data visualization. You'll understand how to import data from external sources and transform them using complex algorithms. The book helps you implement personal data de-identification methods such as pseudonymization, anonymization, and masking in Power BI. You'll be able to call external APIs to enrich your data much more quickly using Python programming and R programming. Later, you'll learn advanced Python and R techniques to perform in-depth analysis and extract valuable information using statistics and machine learning. You'll also understand the main statistical features of datasets by plotting multiple visual graphs in the process of creating a machine learning model. By the end of this book, you’ll be able to enrich your Power BI data models and visualizations using complex algorithms in Python and R.
Extreme C. Taking you to the limit in Concurrency, OOP, and the most advanced capabilities of C
Kamran Amini
There’s a lot more to C than knowing the language syntax. The industry looks for developers with a rigorous, scientific understanding of the principles and practices. Extreme C will teach you to use C’s advanced low-level power to write effective, efficient systems. This intensive, practical guide will help you become an expert C programmer. Building on your existing C knowledge, you will master preprocessor directives, macros, conditional compilation, pointers, and much more. You will gain new insight into algorithm design, functions, and structures. You will discover how C helps you squeeze maximum performance out of critical, resource-constrained applications. C still plays a critical role in 21st-century programming, remaining the core language for precision engineering, aviation, space research, and more. This book shows how C works with Unix, how to implement OO principles in C, and fully covers multi-processing. In Extreme C, Amini encourages you to think, question, apply, and experiment for yourself. The book is essential for anybody who wants to take their C to the next level.
Extreme DAX. Take your Power BI and Microsoft data analytics skills to the next level
Michiel Rozema, Henk Vlootman
This book helps business analysts generate powerful and sophisticated analyses from their data using DAX and get the most out of Microsoft Business Intelligence tools. Extreme DAX will first teach you the principles of business intelligence, good model design, and how DAX fits into it all. Then, you’ll launch into detailed examples of DAX in real-world business scenarios such as inventory calculations, forecasting, intercompany business, and data security. At each step, senior DAX experts will walk you through the subtleties involved in working with Power BI models and common mistakes to look out for as you build advanced data aggregations. You’ll deepen your understanding of DAX functions, filters, and measures, and how and when they can be used to derive effective insights. You’ll also be provided with PBIX files for each chapter, so that you can follow along and explore in your own time.
Raúl Estrada
SMACK is an open source full stack for big data architecture. It is a combination of Spark, Mesos, Akka, Cassandra, and Kafka. This stack is the newest technique developers have begun to use to tackle critical real-time analytics for big data. This highly practical guide will teach you how to integrate these technologies to create a highly efficient data analysis system for fast data processing. We’ll start off with an introduction to SMACK and show you when to use it. First you’ll get to grips with functional thinking and problem solving using Scala. Next you’ll come to understand the Akka architecture. Then you’ll get to know how to improve the data structure architecture and optimize resources using Apache Spark. Moving forward, you’ll learn how to achieve linear scalability in databases with Apache Cassandra. You’ll get to grips with high-throughput distributed messaging systems using Apache Kafka. We’ll show you how to build a cheap but effective cluster infrastructure with Apache Mesos. Finally, you will deep dive into the different aspects of SMACK using a few case studies. By the end of the book, you will be able to integrate all the components of the SMACK stack and use them together to achieve highly effective and fast data processing.
Fast Data Processing with Spark 2. Accelerate your data for rapid insight - Third Edition
Krishna Sankar, Holden Karau
When people want a way to process big data at speed, Spark is invariably the solution. With its ease of development (in comparison to the relative complexity of Hadoop), it’s unsurprising that it’s becoming popular with data analysts and engineers everywhere. Beginning with the fundamentals, we’ll show you how to get set up with Spark with minimum fuss. You’ll then get to grips with some simple APIs before investigating machine learning and graph processing – throughout we’ll make sure you know exactly how to apply your knowledge. You will also learn how to use the Spark shell and how to load data, before finding out how to build and run your own Spark applications. Discover how to manipulate your RDD and get stuck into a range of DataFrame APIs. As if that’s not enough, you’ll also learn some useful machine learning algorithms with the help of Spark MLlib and find out how to integrate Spark with R. We’ll also make sure you’re confident and prepared for graph processing, as you learn more about the GraphX API.
Joydeep Bhattacharjee
Facebook's fastText library handles text representation and classification for Natural Language Processing (NLP). Most organizations have to deal with enormous amounts of text data on a daily basis, and gaining efficient data insights requires powerful NLP tools such as fastText. This book is your ideal introduction to fastText. You will learn how to create fastText models from the command line, without the need for complicated code. You will explore the algorithms that fastText is built on and how to use them for word representation and text classification. Next, you will use fastText in conjunction with other popular libraries and frameworks such as Keras, TensorFlow, and PyTorch. Finally, you will deploy fastText models to mobile devices. By the end of this book, you will have all the required knowledge to use fastText in your own applications at work or in projects.
Sinan Ozdemir, Divya Susarla, Michael Smith
Feature engineering is the most important step in creating powerful machine learning systems. This book will take you through the entire feature-engineering journey to make your machine learning much more systematic and effective. You will start with understanding your data: often the success of your ML models depends on how you leverage different feature types, such as continuous, categorical, and more. You will learn when to include a feature, when to omit it, and why, all by understanding error analysis and the acceptability of your models. You will learn to convert a problem statement into useful new features. You will learn to deliver features driven by business needs as well as mathematical insights. You'll also learn how to use machine learning on your machines, automatically learning amazing features for your data. By the end of the book, you will become proficient in Feature Selection, Feature Learning, and Feature Optimization.
Feature Store for Machine Learning. Curate, discover, share and serve ML features at scale
Jayanth Kumar M J
A feature store is one of the storage layers in machine learning (ML) operations, where data scientists and ML engineers can store transformed and curated features for ML models. This makes them available for model training, inference (batch and online), and reuse in other ML pipelines. Knowing how to utilize feature stores to their fullest potential can save you a lot of time and effort, and this book will teach you everything you need to know to get started. Feature Store for Machine Learning is for data scientists who want to learn how to use feature stores to share and reuse each other's work and expertise. You’ll be able to implement practices that help in eliminating reprocessing of data, providing model-reproducible capabilities, and reducing duplication of work, thus improving the time to production of the ML model. While this ML book offers some theoretical groundwork for developers who are just getting to grips with feature stores, there's plenty of practical know-how for those ready to put their knowledge to work. With a hands-on approach to implementation and associated methodologies, you'll get up and running in no time. By the end of this book, you’ll have understood why feature stores are essential and how to use them in your ML projects, both on your local system and on the cloud.
MrExcel's Holy Macro! Books, Liam Bastick, Oscar...
This book is a practical guide for mastering financial modeling in project finance, providing a clear journey from foundational concepts to advanced techniques. It begins by introducing project finance, its significance, and how it differs from other finance structures. Readers will learn key Excel functions, data validation, and layout strategies essential for creating accurate and dynamic models. As the journey progresses, the book emphasizes best practices for building transparent, flexible, and robust models. It covers linked financial statements, cash flow waterfalls, debt structuring, and valuation techniques. A comprehensive case study walks readers through the construction of a full project finance model, separating construction and operational phases while integrating advanced concepts like scenario planning, sensitivity analysis, and ratio metrics. Designed with a logical flow, this book equips readers with practical skills to tackle real-world financial challenges. From Excel tips to project valuation and funding strategies, it provides actionable insights for analysts, finance professionals, and project managers seeking to excel in project finance modeling.
Financial Modelling in Power BI. Master Subtotals, Functions, and Advanced Excel Tricks in Minutes!
MrExcel's Holy Macro! Books, Jonathan Liau, Liam...
This book introduces readers to the fundamentals of financial modeling using Power BI, starting with an overview of the tool and best practices for creating robust, transparent, and flexible models. Early chapters lay the groundwork by explaining financial statement theory and control accounts, essential concepts for any financial analyst. Readers are guided step-by-step through creating parameters and calculating sales, ensuring a solid foundation in Power BI's core functionalities. As the book progresses, readers delve into more advanced topics such as inventory calculations, operating and capital expenditures, and tax computations. Practical examples and hands-on exercises make complex concepts like DAX functions, FIFO inventory modeling, and control account measures accessible to users of all experience levels. Detailed sections on cash flow statements, income statements, and balance sheets tie the lessons together, showing how these elements integrate into a comprehensive financial model. The final chapters explore advanced features like interest and debt modeling, recursion aversion, and equity calculations, culminating in the creation of fully dynamic and optimized models. Readers also learn to design compelling visualizations to present financial insights effectively. By the end of the journey, users will have the tools and confidence to apply their knowledge to real-world scenarios, mastering financial modeling with Power BI.
Financial Modelling using Dynamic Arrays. Let Lambdas Extend Your Range
MrExcel's Holy Macro! Books, Liam Bastick
Dive into the transformative power of Excel's dynamic arrays in financial modelling. Learn to optimize formulas with LET, create reusable LAMBDA functions, and craft sophisticated models. The book provides a comprehensive introduction to Excel’s dynamic arrays, comparing legacy methodologies with modern capabilities while integrating practical tips and best practices. Through real-world examples and step-by-step tutorials, you’ll uncover the full potential of functions like SORT, FILTER, SEQUENCE, and LAMBDA. Discover how dynamic arrays reduce errors, boost efficiency, and enable innovative approaches to financial modelling. The book also highlights advanced features like eta lambdas and helper functions, offering a deep dive into the cutting-edge tools now available in Excel 365. Whether you’re building complex financial models or just looking to refine your techniques, this guide equips you with the knowledge to transform your processes. Excel enthusiasts and professionals alike will appreciate the clarity and depth this book provides, helping you elevate your modelling game to a whole new level.
Frank Kane
Frank Kane’s Taming Big Data with Apache Spark and Python is your companion to learning Apache Spark in a hands-on manner. Frank will start you off by teaching you how to set up Spark on a single system or on a cluster, and you’ll soon move on to analyzing large data sets using Spark RDD, and developing and running effective Spark jobs quickly using Python. Apache Spark has emerged as the next big thing in the Big Data domain – quickly rising from an ascending technology to an established superstar in just a matter of years. Spark allows you to quickly extract actionable insights from large amounts of data, on a real-time basis, making it an essential tool in many modern businesses. Frank has packed this book with over 15 interactive, fun-filled examples relevant to the real world, and he will empower you to understand the Spark ecosystem and implement production-grade real-time Spark projects with ease.
Fundamentals of Analytics Engineering. An introduction to building end-to-end analytics solutions
Dumky De Wilde, Fanny Kassapian, Jovan Gligorevic,...
Written by a team of 7 industry experts, Fundamentals of Analytics Engineering will introduce you to everything from foundational concepts to advanced skills to get started as an analytics engineer. After conquering data ingestion and techniques for data quality and scalability, you’ll learn about techniques such as data cleaning and transformation, data modeling, SQL query optimization and reuse, and serving data across different platforms. Armed with this knowledge, you will implement a simple data platform from ingestion to visualization, using tools like Airbyte Cloud, Google BigQuery, dbt, and Tableau. You’ll also get to grips with strategies for data integrity with a focus on data quality and observability, along with collaborative coding practices like version control with Git. You’ll learn about advanced principles like CI/CD, automating workflows, gathering, scoping, and documenting business requirements, as well as data governance. By the end of this book, you’ll be armed with the essential techniques and best practices for developing scalable analytics solutions from end to end.