Data Analysis
Data analysis is an exciting discipline that makes it possible to understand phenomena and to gain insight and knowledge from raw data. Strictly speaking, it means processing data with mathematical and statistical techniques in order to draw valuable conclusions, make important decisions, and develop useful products. The Polish term derives from the English "data science", which is often treated as a synonym for terms such as business analytics, operations research, business intelligence, competitive intelligence, data analysis and modeling, and knowledge discovery. With technologies such as the Python and R languages and the Hadoop and Spark platforms, you can draw the most from your data, spot growth opportunities for your organization, and anticipate and prevent threats.
Data Modeling with Microsoft Power BI
Markus Ehrenmueller-Jensen
Self-Service and Enterprise Data Warehouse with Power BI

Data modeling is the most often overlooked feature in Power BI Desktop, yet it is what sets Power BI apart from other tools on the market. This practical book serves as your fast-forward button for data modeling with Power BI, the Analysis Services tabular model, and SQL databases. It works as a starting point for data modeling and also as a way to refresh your knowledge. Author Markus Ehrenmueller-Jensen, founder of Savory Data, presents the basic concepts of the Power BI semantic model with hands-on examples in DAX, Power Query, and T-SQL.

You will learn how to:
- Normalize and denormalize data
- Apply best practices for calculations, flags and indicators, date and time, role-playing dimensions, and slowly changing dimensions
- Overcome challenges such as binning, budgets, localized models, composite models, and key-value pair tables
- Discover and solve performance problems through the data model
- Work with tables, relationships, set operations, normal forms, dimensional modeling, and the ETL process

Markus Ehrenmueller-Jensen, founder of Savory Data, has worked since 1994 as a project leader, trainer, and consultant in data engineering, business intelligence, and data science. He is a software engineer and a professor at HTL Leonding (a technical college), where he teaches databases and project engineering. He holds several Microsoft certifications as well as the Microsoft Data Platform MVP title.

"This book is a comprehensive tutorial that covers the topic in language that is easy to understand while being thorough, concise, and accurate. Markus's experience in data modeling will be of value to every professional working with data in Power BI."
-Paul Turley, Microsoft Data Platform MVP
V Naresh Kumar, Prashant Shindgikar
The complex structure of data these days requires sophisticated solutions for data transformation to make the information more accessible to users. This book empowers you to build such solutions with relative ease with the help of Apache Hadoop, along with a host of other Big Data tools. This book will give you a complete understanding of data lifecycle management with Hadoop, followed by modeling of structured and unstructured data in Hadoop. It will also show you how to design real-time streaming pipelines by leveraging tools such as Apache Spark, and build efficient enterprise search solutions using Elasticsearch. You will learn to build enterprise-grade analytics solutions on Hadoop, and how to visualize your data using tools such as Apache Superset. This book also covers techniques for deploying your Big Data solutions on the cloud with Apache Ambari, as well as expert techniques for managing and administering your Hadoop cluster. By the end of this book, you will have all the knowledge you need to build expert Big Data systems.
Modern R Programming Cookbook. Recipes to simplify your statistical applications
Jaynal Abedin
R is a powerful tool for statistics, graphics, and statistical programming. It is used by tens of thousands of people daily to perform serious statistical analyses. It is a free, open source system whose implementation is the collective accomplishment of many intelligent, hard-working people. There are more than 2,000 available add-ons, and R is a serious rival to all commercial statistical packages. The objective of this book is to show how to work with different programming aspects of R. Emerging R developers and data scientists may have very good programming knowledge but only a limited understanding of R syntax and semantics. This book offers a platform for developing practical, scalable solutions to real-world problems with a solid understanding of the language. You will work with various versions of R libraries that are essential for scalable data science solutions. You will learn to handle input/output issues when working with relatively large datasets. By the end of this book, readers will also have learned how to work with databases from within R, what metaprogramming is, and how it helps in developing applications.
Manu Joseph
We live in a serendipitous era where the explosion in the quantum of data collected and a renewed interest in data-driven techniques such as machine learning (ML) have changed the landscape of analytics and, with it, time series forecasting. This book, filled with industry-tested tips and tricks, takes you beyond commonly used classical statistical methods such as ARIMA and introduces you to the latest techniques from the world of ML. This is a comprehensive guide to analyzing, visualizing, and creating state-of-the-art forecasting systems, complete with common topics such as ML and deep learning (DL) as well as rarely touched-upon topics such as global forecasting models, cross-validation strategies, and forecast metrics. You’ll begin by exploring the basics of data handling, data visualization, and classical statistical methods before moving on to ML and DL models for time series forecasting. This book takes you on a hands-on journey in which you’ll develop state-of-the-art ML (linear regression to gradient-boosted trees) and DL (feed-forward neural networks, LSTMs, and transformers) models on a real-world dataset along with exploring practical topics such as interpretability. By the end of this book, you’ll be able to build world-class time series forecasting systems and tackle problems in the real world.
Doug Bierer
MongoDB has grown to become the de facto NoSQL database with millions of users, from small start-ups to Fortune 500 companies. It can solve problems that are considered difficult, if not impossible, for aging RDBMS technologies. Written for version 4 of MongoDB, this book is the easiest way to get started with MongoDB. You will start by getting a MongoDB installation up and running in a safe and secure manner. You will learn how to perform mission-critical create, read, update, and delete operations, and set up database security. You will also learn about advanced features of MongoDB such as the aggregation pipeline, replication, and sharding. You will learn how to build a simple web application that uses MongoDB to respond to AJAX queries, and see how to make use of the MongoDB programming language driver for PHP. The examples incorporate new features available in MongoDB version 4 where appropriate.
Cyrus Dasadia
MongoDB is a high-performance and feature-rich NoSQL database that forms the backbone of the systems that power many different organizations. It is packed with features that have become essential for many different types of software professionals, and it is incredibly easy to use. This cookbook contains more than 100 recipes to address the everyday challenges of working with MongoDB. Starting with database configuration, you will understand the indexing aspects of MongoDB. The book also includes practical recipes on how you can optimize your database query performance, perform diagnostics, and debug queries. You will also learn how to implement the core administration tasks required for high availability and scalability, achieved through replica sets and sharding, respectively. You will also implement server security concepts such as authentication, user management, role-based access models, and TLS configuration. You will also learn how to back up and recover your database efficiently and monitor server performance. By the end of this book, you will have all the information you need, along with tips, tricks, and best practices, to implement a high-performance MongoDB solution.
Russ McKendrick
This book will show you how monitoring containers and keeping a keen eye on the working of applications helps improve the overall performance of the applications that run on Docker. With the increased adoption of Docker containers, the need to monitor which containers are running, what resources they are consuming, and how these factors affect the overall performance of the system has become the need of the moment. This book covers monitoring containers using Docker's native monitoring functions, various plugins, as well as third-party tools that help in monitoring. We'll start with how to obtain detailed stats for active containers, resources consumed, and container behavior. We also show you how to use these stats to improve the overall performance of the system. Next, you will learn how to use SysDig to both view your containers' performance metrics in real time and record sessions to query later. By the end of this book, you will have a complete knowledge of how to implement monitoring for your containerized applications and make the most of the metrics you are collecting.