Data analysis
Data Wrangling on AWS. Clean and organize complex data for analysis
Navnit Shukla, Sankar M, Sam Palani
Data wrangling is the process of cleaning, transforming, and organizing raw, messy, or unstructured data into a structured format. It involves data cleaning, data integration, data transformation, and data enrichment to ensure that the data is accurate, consistent, and suitable for analysis. Data Wrangling on AWS equips you with the knowledge to realize the full potential of AWS data wrangling tools. First, you’ll be introduced to data wrangling on AWS and the data wrangling services available there. You’ll learn how to work with AWS Glue DataBrew, AWS Data Wrangler, and Amazon SageMaker. Next, you’ll discover other AWS services such as Amazon S3, Redshift, Athena, and QuickSight. You’ll also explore advanced topics such as performing pandas data operations with AWS Data Wrangler, optimizing ML data with Amazon SageMaker, and building a data warehouse with Glue DataBrew, along with security and monitoring aspects. By the end of this book, you’ll be well equipped to perform data wrangling using AWS services.
Mercury Learning and Information, Oswald Campesato
This book is designed for aspiring data scientists and those involved in data cleaning. It covers features of NumPy and Pandas, along with creating databases and tables in MySQL. It also addresses various data wrangling tasks using Python scripts and awk-based shell scripts. Companion files with code are available from the publisher. Understanding data cleaning and manipulation is vital for data scientists. This book provides a comprehensive introduction to essential tools and techniques. From Python basics to advanced data wrangling, it equips readers with the skills needed to manage and clean data effectively. The journey begins with an introduction to Python and progresses through working with data, Pandas, and SQL. It also covers Java, JSON, XML, and specific data cleaning tasks. The book culminates with detailed data wrangling techniques, ensuring readers gain practical, hands-on experience in data management.
Data Wrangling with Python. Creating actionable data from raw sources
Dr. Tirthajyoti Sarkar, Shubhadeep Roychowdhury
For data to be useful and meaningful, it must be curated and refined. Data Wrangling with Python teaches you the core ideas behind these processes and equips you with knowledge of the most popular tools and techniques in the domain. The book starts with the absolute basics of Python, focusing mainly on data structures. It then delves into the fundamental tools of data wrangling, such as the NumPy and Pandas libraries. You'll learn why you should stay away from traditional ways of data cleaning, as done in other languages, and take advantage of the specialized pre-built routines in Python. This combination of Python tips and tricks will also demonstrate how to use the same Python backend to extract and transform data from an array of sources, including the Internet, large database vaults, and Excel financial tables. To help you prepare for more challenging scenarios, you'll cover how to handle missing or wrong data and reformat it to meet the requirements of the downstream analytics tool. The book will further help you grasp concepts through real-world examples and datasets. By the end of this book, you will be confident in using a diverse array of sources to extract, clean, transform, and format your data efficiently.
Gustavo Santos
In this information era, where large volumes of data are being generated every day, companies want to get a better grip on it to perform more efficiently than before. This is where skillful data analysts and data scientists come into play, wrangling and exploring data to generate valuable business insights. In order to do that, you’ll need plenty of tools that enable you to extract the most useful knowledge from data. Data Wrangling with R will help you to gain a deep understanding of ways to wrangle and prepare datasets for exploration, analysis, and modeling. This book enables you to get your data ready for more optimized analyses, develop your first data model, and perform effective data visualization. The book begins by teaching you how to load and explore datasets. Then, you’ll get to grips with the modern concepts and tools of data wrangling. As data wrangling and visualization are intrinsically connected, you’ll go over best practices to plot data and extract insights from it. The chapters are designed to help you learn all about modeling, as you will go through the construction of a data science project from end to end, and become familiar with the built-in RStudio, including an application built with Shiny dashboards. By the end of this book, you’ll have learned how to create your first data model and build an application with Shiny in R.
Data Wrangling with SQL. A hands-on guide to manipulating, wrangling, and engineering data using SQL
Raghav Kandarpa, Shivangi Saxena
The amount of data generated continues to grow rapidly, making it increasingly important for businesses to be able to wrangle this data and understand it quickly and efficiently. Although data wrangling can be challenging, with the right tools and techniques you can efficiently handle enormous amounts of unstructured data. The book starts by introducing you to the basics of SQL, focusing on the core principles and techniques of data wrangling. You’ll then explore advanced SQL concepts like aggregate functions, window functions, CTEs, and subqueries that are very popular in the business world. The next set of chapters will walk you through different functions within a SQL query that cause delays in data transformation and help you tell a good query from a bad one. You’ll also learn how data wrangling and data science go hand in hand. The book is filled with datasets and practical examples to help you understand the concepts thoroughly, along with best practices to guide you at every stage of data wrangling. By the end of this book, you’ll be equipped with essential techniques and best practices for data wrangling, and will have learned how to use clean and standardized data models to make informed decisions, helping businesses avoid costly mistakes.
Jonas Christensen, Nakul Bajaj, Manmohan Gosada, Kirk...
In the rapidly advancing data-driven world where data quality is pivotal to the success of machine learning and artificial intelligence projects, this critically timed guide provides a rare, end-to-end overview of data-centric machine learning (DCML), along with hands-on applications of technical and non-technical approaches to generating deeper and more accurate datasets. This book will help you understand what data-centric ML/AI is and how it can help you to realize the potential of ‘small data’. Delving into the building blocks of data-centric ML/AI, you’ll explore the human aspects of data labeling, tackle ambiguity in labeling, and understand the role of synthetic data. From strategies to improve data collection to techniques for refining and augmenting datasets, you’ll learn everything you need to elevate your data-centric practices. Through applied examples and insights for overcoming challenges, you’ll get a roadmap for implementing data-centric ML/AI in diverse applications in Python. By the end of this book, you’ll have developed a profound understanding of data-centric ML/AI and the proficiency to integrate common data-centric approaches into the model development lifecycle, unlocking the full potential of your machine learning projects by prioritizing data quality and reliability.
DAX i Power BI w analizie danych. Tworzenie zaawansowanych i efektywnych analiz dla biznesu (DAX and Power BI in Data Analysis. Creating Advanced and Efficient Analyses for Business)
Michiel Rozema, Henk Vlootman
Microsoft Power BI is an excellent tool for professional data analysis. To achieve truly spectacular results with it, however, you need to be fluent in DAX (Data Analysis Expressions), a language for performing advanced calculations and queries on data in related tables and columns in tabular data models. This book is intended for business analysts who already know DAX but want to use the full potential of its formulas and of Power BI models to build efficient, advanced data analyses. It describes the principles of business analysis and the rules for designing good models. It also presents practical examples of using DAX in real business situations, and shows the nuances of working with Power BI models as well as with DAX functions, filters, and measures. It includes very useful tips on mistakes commonly made when building advanced data aggregations. The book comes with downloadable materials (PBIX files) that make it easier to fully understand the content and apply it in your own professional practice. Highlights: concepts of data modeling and structures; Power BI models versus relational database management system models; safe aggregation levels, attributes, and hierarchies; the concept of context and how to apply it; standard time analysis; intelligent investment evaluation with DAX financial functions. Discover the true potential of DAX in data analysis!
Cuantum Technologies LLC
Dive into the world of deep learning with this comprehensive guide that bridges theory and practice. From foundational neural networks to advanced architectures like CNNs, RNNs, and Transformers, this book equips you with the tools to build, train, and optimize AI models using TensorFlow, Keras, and PyTorch. Clear explanations of key concepts such as gradient descent, loss functions, and backpropagation are combined with hands-on exercises to ensure practical understanding. Explore cutting-edge AI frameworks, including generative adversarial networks (GANs) and autoencoders, while mastering real-world applications like image classification, text generation, and natural language processing. Detailed chapters cover transfer learning, fine-tuning pretrained models, and deployment strategies for cloud and edge computing. Practical exercises and projects further solidify your skills as you implement AI solutions for diverse challenges. Whether you're deploying AI models on cloud platforms like AWS or optimizing them for edge devices with TensorFlow Lite, this book provides step-by-step guidance. Designed for developers, AI enthusiasts, and data scientists, it balances theoretical depth with actionable insights, making it the ultimate resource for mastering modern deep learning frameworks and advancing your career in AI.