Analiza danych
Analiza danych jest ekscytującą dyscypliną, która umożliwia zrozumienie pewnych zjawisk, uzyskanie wglądu i wiedzy na podstawie surowych danych. Pojęcie to oznacza dokładnie przetwarzanie danych za pomocą technik matematycznych i statystycznych w celu uzyskania cennych wniosków, podjęcia ważnych decyzji i opracowania przydatnych produktów. Termin ten wywodzi się od angielskiego data science, często traktowanego jako synonim takich terminów, jak analityka biznesowa, badania operacyjne, business intelligence, wywiad konkurencyjny, analiza i modelowanie danych, a także pozyskiwanie wiedzy. Dzięki takim technologiom, jak języki Python czy R, platformy Hadoop i Spark masz szansę wyciągnąć maksimum wniosków, dostrzec szanse na rozwój swojej organizacji albo przewidzieć i zapobiec zagrożeniom.
Analiza i prezentacja danych w Microsoft Excel. Vademecum Walkenbacha. Wydanie II
John Walkenbach, Michael Alexander
Wykorzystaj możliwości Excela w zarządzaniu! Jeżeli masz przed sobą setki, a może tysiące lub miliony danych, z których chcesz wyciągnąć celne wnioski, potrzebujesz narzędzia, które pomoże Ci to ogarnąć. Mowa oczywiście o Excelu. Nieważne, kim jesteś - studentem, księgowym, menedżerem czy prezesem - na 100% docenisz drzemiący w nim potencjał! Dzięki tej książce dowiesz się, jak wyłuskać najistotniejsze informacje z morza danych. W trakcie lektury nauczysz się błyskawicznie przygotowywać raporty oraz prezentacje. Przekonasz się, że tabele przestawne wcale nie muszą być takie straszne, oraz zobaczysz najlepsze techniki prezentacji tendencji czy oceny efektywności w realizacji celów. Kolejne wydanie książki zostało zaktualizowane, ulepszone i rozszerzone o mnóstwo nowych, przydatnych wiadomości. Dowiesz się, jak importować dane z bazy SQL Server oraz jak wykorzystać możliwości dodatku Power View. Książka ta jest idealną pozycją dla tonących w gąszczu danych! Dzięki tej książce: poznasz narzędzia Excela w zakresie analizy i prezentacji danych opanujesz najlepsze techniki projektowania tabel przygotujesz czytelne raporty wykorzystasz w pełni możliwości Excela Uratuj się z morza danych!
Analiza marketingowa. Praktyczne techniki z wykorzystaniem analizy danych i narzędzi Excela
Wayne L. Winston
Specjaliści w dziedzinie marketingu coraz częściej sięgają po wyrafinowane metody analizy. Obecnie firmy są zalewane ogromną ilością danych - skorzystanie z płynącej z nich wiedzy jest znakomitą szansą na poprawę kondycji przedsiębiorstwa. W tym celu trzeba dane zebrać, przetworzyć i poddać analizie. Potrzebne więc są narzędzia, najlepiej proste w użytkowaniu i powszechnie znane. Takim właśnie narzędziem jest arkusz kalkulacyjny MS Excel - potężna i wszechstronna aplikacja, dzięki której nawet bez specjalistycznej wiedzy można wykonać profesjonalną analizę marketingową i zdobyć mnóstwo przydatnych informacji. Ta książka powstała na bazie autorskiego kursu analizy marketingowej dla słuchaczy studiów MBA. Pokazuje, jak wykorzystywać Excela do modelowania danych i pozyskiwania wiedzy niezbędnej do kreowania skutecznego marketingu w firmie. Niemal wszystkie pojęcia wyjaśniono na przykładach, a sposób wykonania ćwiczeń pokazano krok po kroku. Do książki dołączono pliki z danymi i rozwiązaniami zadań. Dowiesz się, jak przetwarzać dane za pomocą wykresów, wyznaczać krzywe popytu, prowadzić analizę skupień w segmentach rynku oraz tworzyć indywidualne modele danych i prognozować wpływ akcji marketingowych na wzrost sprzedaży. Oznacza to, że aby zdobyć umiejętności analizy marketingowej, potrzebujesz tylko tego podręcznika i Excela! W tej książce między innymi: analiza danych marketingowych opracowywanie strategii najbardziej zyskownych wycen wykorzystywanie narzędzi prognostycznych analiza łączona i analiza wyborów dyskretnych pomiar skuteczności wydatków na reklamę analiza danych z mediów społecznościowych Wyrafinowane analizy biznesowe? Potrzebujesz tylko Excela!
Enrique López Manas, Diego Grancini
Performant applications are one of the key drivers of success in the mobile world. Users may abandon an app if it runs slowly. Learning how to build applications that balance speed and performance with functionality and UX can be a challenge; however, it's now more important than ever to get that balance right.Android High Performance will start you thinking about how to wring the most from any hardware your app is installed on, so you can increase your reach and engagement. The book begins by providing an introduction to state–of-the-art Android techniques and the importance of performance in an Android application. Then, we will explain the Android SDK tools regularly used to debug and profile Android applications. We will also learn about some advanced topics such as building layouts, multithreading, networking, and security. Battery life is one of the biggest bottlenecks in applications; and this book will show typical examples of code that exhausts battery life, how to prevent this, and how to measure battery consumption from an application in every kind of situation to ensure your apps don’t drain more than they should.This book explains techniques for building optimized and efficient systems that do not drain the battery, cause memory leaks, or slow down with time.
Apache Hadoop 3 Quick Start Guide. Learn about big data processing and analytics
Hrishikesh Vijay Karambelkar
Apache Hadoop is a widely used distributed data platform. It enables large datasets to be efficiently processed instead of using one large computer to store and process the data. This book will get you started with the Hadoop ecosystem, and introduce you to the main technical topics, including MapReduce, YARN, and HDFS.The book begins with an overview of big data and Apache Hadoop. Then, you will set up a pseudo Hadoop development environment and a multi-node enterprise Hadoop cluster. You will see how the parallel programming paradigm, such as MapReduce, can solve many complex data processing problems.The book also covers the important aspects of the big data software development lifecycle, including quality assurance and control, performance, administration, and monitoring. You will then learn about the Hadoop ecosystem, and tools such as Kafka, Sqoop, Flume, Pig, Hive, and HBase. Finally, you will look at advanced topics, including real time streaming using Apache Storm, and data analytics using Apache Spark. By the end of the book, you will be well versed with different configurations of the Hadoop 3 cluster.
Dayong Du
If you are a data analyst, developer, or simply someone who wants to use Hive to explore and analyze data in Hadoop, this is the book for you. Whether you are new to big data or an expert, with this book, you will be able to master both the basic and the advanced features of Hive. Since Hive is an SQL-like language, some previous experience with the SQL language and databases is useful to have a better understanding of this book.
Apache Ignite Quick Start Guide. Distributed data caching and processing made easy
Sujoy Acharya
Apache Ignite is a distributed in-memory platform designed to scale and process large volume of data. It can be integrated with microservices as well as monolithic systems, and can be used as a scalable, highly available and performant deployment platform for microservices. This book will teach you to use Apache Ignite for building a high-performance, scalable, highly available system architecture with data integrity.The book takes you through the basics of Apache Ignite and in-memory technologies. You will learn about installation and clustering Ignite nodes, caching topologies, and various caching strategies, such as cache aside, read and write through, and write behind. Next, you will delve into detailed aspects of Ignite’s data grid: web session clustering and querying data.You will learn how to process large volumes of data using compute grid and Ignite’s map-reduce and executor service. You will learn about the memory architecture of Apache Ignite and monitoring memory and caches. You will use Ignite for complex event processing, event streaming, and the time-series predictions of opportunities and threats. Additionally, you will go through off-heap and on-heap caching, swapping, and native and Spring framework integration with Apache Ignite.By the end of this book, you will be confident with all the features of Apache Ignite 2.x that can be used to build a high-performance system architecture.
Raúl Estrada
Apache Kafka provides a unified, high-throughput, low-latency platform to handle real-time data feeds. This book will show you how to use Kafka efficiently, and contains practical solutions to the common problems that developers and administrators usually face while working with it. This practical guide contains easy-to-follow recipes to help you set up, configure, and use Apache Kafka in the best possible manner. You will use Apache Kafka Consumers and Producers to build effective real-time streaming applications. The book covers the recently released Kafka version 1.0, the Confluent Platform and Kafka Streams. The programming aspect covered in the book will teach you how to perform important tasks such as message validation, enrichment and composition.Recipes focusing on optimizing the performance of your Kafka cluster, and integrate Kafka with a variety of third-party tools such as Apache Hadoop, Apache Spark, and Elasticsearch will help ease your day to day collaboration with Kafka greatly. Finally, we cover tasks related to monitoring and securing your Apache Kafka cluster using tools such as Ganglia and Graphite.If you're looking to become the go-to person in your organization when it comes to working with Apache Kafka, this book is the only resource you need to have.
Raúl Estrada
Apache Kafka is a great open source platform for handling your real-time data pipeline to ensure high-speed filtering and pattern matching on the ?y. In this book, you will learn how to use Apache Kafka for efficient processing of distributed applications and will get familiar with solving everyday problems in fast data and processing pipelines.This book focuses on programming rather than the configuration management of Kafka clusters or DevOps. It starts off with the installation and setting up the development environment, before quickly moving on to performing fundamental messaging operations such as validation and enrichment.Here you will learn about message composition with pure Kafka API and Kafka Streams. You will look into the transformation of messages in different formats, such asext, binary, XML, JSON, and AVRO. Next, you will learn how to expose the schemas contained in Kafka with the Schema Registry. You will then learn how to work with all relevant connectors with Kafka Connect. While working with Kafka Streams, you will perform various interesting operations on streams, such as windowing, joins, and aggregations. Finally, through KSQL, you will learn how to retrieve, insert, modify, and delete data streams, and how to manipulate watermarks and windows.
Apache Mahout Clustering Designs. Explore clustering algorithms used with Apache Mahout
Dragan Milcevski, Ashish Gupta, Ashish Gupta
Jagat Singh
As more and more organizations are discovering the use of big data analytics, interest in platforms that provide storage, computation, and analytic capabilities is booming exponentially. This calls for data management. Hadoop caters to this need. Oozie fulfils this necessity for a scheduler for a Hadoop job by acting as a cron to better analyze data. Apache Oozie Essentials starts off with the basics right from installing and configuring Oozie from source code on your Hadoop cluster to managing your complex clusters. You will learn how to create data ingestion and machine learning workflows.This book is sprinkled with the examples and exercises to help you take your big data learning to the next level. You will discover how to write workflows to run your MapReduce, Pig ,Hive, and Sqoop scripts and schedule them to run at a specific time or for a specific business requirement using a coordinator. This book has engaging real-life exercises and examples to get you in the thick of things. Lastly, you’ll get a grip of how to embed Spark jobs, which can be used to run your machine learning models on Hadoop.By the end of the book, you will have a good knowledge of Apache Oozie. You will be capable of using Oozie to handle large Hadoop workflows and even improve the availability of your Hadoop environment.
Romeo Kienzler, Md. Rezaul Karim, Sridhar Alla,...
Apache Spark is an in-memory, cluster-based data processing system that provides a wide range of functionalities such as big data processing, analytics, machine learning, and more. With this Learning Path, you can take your knowledge of Apache Spark to the next level by learning how to expand Spark's functionality and building your own data flow and machine learning programs on this platform.You will work with the different modules in Apache Spark, such as interactive querying with Spark SQL, using DataFrames and datasets, implementing streaming analytics with Spark Streaming, and applying machine learning and deep learning techniques on Spark using MLlib and various external tools.By the end of this elaborately designed Learning Path, you will have all the knowledge you need to master Apache Spark, and build your own big data processing and analytics pipeline quickly and without any hassle.This Learning Path includes content from the following Packt products:• Mastering Apache Spark 2.x by Romeo Kienzler• Scala and Spark for Big Data Analytics by Md. Rezaul Karim, Sridhar Alla• Apache Spark 2.x Machine Learning Cookbook by Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen MeiCookbook
Rishi Yadav
While Apache Spark 1.x gained a lot of traction and adoption in the early years, Spark 2.x delivers notable improvements in the areas of API, schema awareness, Performance, Structured Streaming, and simplifying building blocks to build better, faster, smarter, and more accessible big data applications. This book uncovers all these features in the form of structured recipes to analyze and mature large and complex sets of data.Starting with installing and configuring Apache Spark with various cluster managers, you will learn to set up development environments. Further on, you will be introduced to working with RDDs, DataFrames and Datasets to operate on schema aware data, and real-time streaming with various sources such as Twitter Stream and Apache Kafka. You will also work through recipes on machine learning, including supervised learning, unsupervised learning & recommendation engines in Spark.Last but not least, the final few chapters delve deeper into the concepts of graph processing using GraphX, securing your implementations, cluster optimization, and troubleshooting.
Apache Spark 2.x for Java Developers. Explore big data at scale using Apache Spark 2.x Java APIs
Sumit Kumar, Sourav Gulati
Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version for Java developers. This book will show you how you can implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone.The book starts with an introduction to the Apache Spark 2.x ecosystem, followed by explaining how to install and configure Spark, and refreshes the Java concepts that will be useful to you when consuming Apache Spark's APIs. You will explore RDD and its associated common Action and Transformation Java APIs, set up a production-like clustered environment, and work with Spark SQL. Moving on, you will perform near-real-time processing with Spark streaming, Machine Learning analytics with Spark MLlib, and graph processing with GraphX, all using various Java packages.By the end of the book, you will have a solid foundation in implementing components in the Spark framework in Java to build fast, real-time applications.
Ahmed Sherif, Amrith Ravindra, Michal Malohlava, Adnan...
Organizations these days need to integrate popular big data tools such as Apache Spark with highly efficient deep learning libraries if they’re looking to gain faster and more powerful insights from their data. With this book, you’ll discover over 80 recipes to help you train fast, enterprise-grade, deep learning models on Apache Spark.Each recipe addresses a specific problem, and offers a proven, best-practice solution to difficulties encountered while implementing various deep learning algorithms in a distributed environment. The book follows a systematic approach, featuring a balance of theory and tips with best practice solutions to assist you with training different types of neural networks such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs). You’ll also have access to code written in TensorFlow and Keras that you can run on Spark to solve a variety of deep learning problems in computer vision and natural language processing (NLP), or tweak to tackle other problems encountered in deep learning.By the end of this book, you'll have the skills you need to train and deploy state-of-the-art deep learning models on Apache Spark.
Apache Spark for Data Science Cookbook. Solve real-world analytical problems
Padma Priya Chitturi
Spark has emerged as the most promising big data analytics engine for data science professionals. The true power and value of Apache Spark lies in its ability to execute data science tasks with speed and accuracy. Spark’s selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle the complexities that come with raw unstructured data sets with ease. This guide will get you comfortable and confident performing data science tasks with Spark. You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to problematic concepts in data science using Spark’s data science libraries such as MLLib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will show you how to implement algorithms and optimize your work.
Rindra Ramamonjison
Apache Spark is the next standard of open-source cluster-computing engine for processing big data. Many practical computing problems concern large graphs, like the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. Apache Spark GraphX API combines the advantages of both data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework.This book will teach the user to do graphical programming in Apache Spark, apart from an explanation of the entire process of graphical data analysis. You will journey through the creation of graphs, its uses, its exploration and analysis and finally will also cover the conversion of graph elements into graph structures.This book begins with an introduction of the Spark system, its libraries and the Scala Build Tool. Using a hands-on approach, this book will quickly teach you how to install and leverage Spark interactively on the command line and in a standalone Scala program. Then, it presents all the methods for building Spark graphs using illustrative network datasets. Next, it will walk you through the process of exploring, visualizing and analyzing different network characteristics. This book will also teach you how to transform raw datasets into a usable form. In addition, you will learn powerful operations that can be used to transform graph elements and graph structures. Furthermore, this book also teaches how to create custom graph operations that are tailored for specific needs with efficiency in mind. The later chapters of this book cover more advanced topics such as clustering graphs, implementing graph-parallel iterative algorithms and learning methods from graph data.
Alex Liu
There's a reason why Apache Spark has become one of the most popular tools in Machine Learning – its ability to handle huge datasets at an impressive speed means you can be much more responsive to the data at your disposal. This book shows you Spark at its very best, demonstrating how to connect it with R and unlock maximum value not only from the tool but also from your data.Packed with a range of project blueprints that demonstrate some of the most interesting challenges that Spark can help you tackle, you'll find out how to use Spark notebooks and access, clean, and join different datasets before putting your knowledge into practice with some real-world projects, in which you will see how Spark Machine Learning can help you with everything from fraud detection to analyzing customer attrition. You'll also find out how to build a recommendation engine using Spark's parallel computing powers.
Shrey Mehrotra, Akash Grade
Apache Spark is a ?exible framework that allows processing of batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you to get started with Apache Spark 2.0 and write big data applications for a variety of use cases.It will also introduce you to Apache Spark – one of the most popular Big Data processing frameworks. Although this book is intended to help you get started with Apache Spark, but it also focuses on explaining the core concepts. This practical guide provides a quick start to the Spark 2.0 architecture and its components. It teaches you how to set up Spark on your local machine. As we move ahead, you will be introduced to resilient distributed datasets (RDDs) and DataFrame APIs, and their corresponding transformations and actions. Then, we move on to the life cycle of a Spark application and learn about the techniques used to debug slow-running applications. You will also go through Spark’s built-in modules for SQL, streaming, machine learning, and graph analysis.Finally, the book will lay out the best practices and optimization techniques that are key for writing efficient Spark applications. By the end of this book, you will have a sound fundamental understanding of the Apache Spark framework and you will be able to write and optimize Spark applications.
Shashank Shekhar
Apache Superset is a modern, open source, enterprise-ready business intelligence (BI) web application. With the help of this book, you will see how Superset integrates with popular databases like Postgres, Google BigQuery, Snowflake, and MySQL. You will learn to create real time data visualizations and dashboards on modern web browsers for your organization using Superset.First, we look at the fundamentals of Superset, and then get it up and running. You'll go through the requisite installation, configuration, and deployment. Then, we will discuss different columnar data types, analytics, and the visualizations available. You'll also see the security tools available to the administrator to keep your data safe.You will learn how to visualize relationships as graphs instead of coordinates on plain orthogonal axes. This will help you when you upload your own entity relationship dataset and analyze the dataset in new, different ways. You will also see how to analyze geographical regions by working with location data.Finally, we cover a set of tutorials on dashboard designs frequently used by analysts, business intelligence professionals, and developers.
Alex Galea
Getting started with data science doesn't have to be an uphill battle. Applied Data Science with Python and Jupyter is a step-by-step guide ideal for beginners who know a little Python and are looking for a quick, fast-paced introduction to these concepts. In this book, you'll learn every aspect of the standard data workflow process, including collecting, cleaning, investigating, visualizing, and modeling data. You'll start with the basics of Jupyter, which will be the backbone of the book. After familiarizing ourselves with its standard features, you'll look at an example of it in practice with our first analysis. In the next lesson, you dive right into predictive analytics, where multiple classification algorithms are implemented. Finally, the book ends by looking at data collection techniques. You'll see how web data can be acquired with scraping techniques and via APIs, and then briefly explore interactive visualizations.
Dr. Tania Moulik
Applied Data Visualization with R and ggplot2 introduces you to the world of data visualization by taking you through the basic features of ggplot2. To start with, you’ll learn how to set up the R environment, followed by getting insights into the grammar of graphics and geometric objects before you explore the plotting techniques.You’ll discover what layers, scales, coordinates, and themes are, and study how you can use them to transform your data into aesthetical graphs. Once you’ve grasped the basics, you’ll move on to studying simple plots such as histograms and advanced plots such as superimposing and density plots. You’ll also get to grips with plotting trends, correlations, and statistical summaries.By the end of this book, you’ll have created data visualizations that will impress your clients.
Balu Nair, Sumit Ranjan, Dr. S. Senthamilarasu
Thanks to a number of recent breakthroughs, self-driving car technology is now an emerging subject in the field of artificial intelligence and has shifted data scientists' focus to building autonomous cars that will transform the automotive industry. This book is a comprehensive guide to use deep learning and computer vision techniques to develop autonomous cars. Starting with the basics of self-driving cars (SDCs), this book will take you through the deep neural network techniques required to get up and running with building your autonomous vehicle. Once you are comfortable with the basics, you'll delve into advanced computer vision techniques and learn how to use deep learning methods to perform a variety of computer vision tasks such as finding lane lines, improving image classification, and so on. You will explore the basic structure and working of a semantic segmentation model and get to grips with detecting cars using semantic segmentation. The book also covers advanced applications such as behavior-cloning and vehicle detection using OpenCV, transfer learning, and deep learning methodologies to train SDCs to mimic human driving.By the end of this book, you'll have learned how to implement a variety of neural networks to develop your own autonomous vehicle using modern Python libraries.
Alok Malik, Bradford Tuckfield
Starting with the basics, Applied Unsupervised Learning with R explains clustering methods, distribution analysis, data encoders, and features of R that enable you to understand your data better and get answers to your most pressing business questions. This book begins with the most important and commonly used method for unsupervised learning - clustering - and explains the three main clustering algorithms - k-means, divisive, and agglomerative. Following this, you'll study market basket analysis, kernel density estimation, principal component analysis, and anomaly detection. You'll be introduced to these methods using code written in R, with further instructions on how to work with, edit, and improve R code. To help you gain a practical understanding, the book also features useful tips on applying these methods to real business problems, including market segmentation and fraud detection. By working through interesting activities, you'll explore data encoders and latent variable models. By the end of this book, you will have a better understanding of different anomaly detection methods, such as outlier detection, Mahalanobis distances, and contextual and collective anomaly detection.
arc42 by Example. Software architecture documentation in practice
Gernot Starke, Michael Simons, Stefan Zörner, Ralf...
When developers document the architecture of their systems, they often invent their own specific ways of articulating structures, designs, concepts, and decisions. What they need is a template that enables simple and efficient software architecture documentation. arc42 by Example shows how it's done through several real-world examples.Each example in the book, whether it is a chess engine, a huge CRM system, or a cool web system, starts with a brief description of the problem domain and the quality requirements. Then, you'll discover the system context with all the external interfaces. You'll dive into an overview of the solution strategy to implement the building blocks and runtime scenarios. The later chapters also explain various cross-cutting concerns and how they affect other aspects of a program.