Analiza danych
Learning Google BigQuery. A beginner's guide to mining massive datasets through interactive analysis
Thirukkumaran Haridass, Eric Brown
Google BigQuery is a popular cloud data warehouse for large-scale data analytics. This book will serve as a comprehensive guide to mastering BigQuery, and how you can utilize it to quickly and efficiently get useful insights from your Big Data.You will begin with getting a quick overview of the Google Cloud Platform and the various services it supports. Then, you will be introduced to the Google BigQuery API and how it fits within in the framework of GCP. The book covers useful techniques to migrate your existing data from your enterprise to Google BigQuery, as well as readying and optimizing it for analysis. You will perform basic as well as advanced data querying using BigQuery, and connect the results to various third party tools for reporting and visualization purposes such as R and Tableau. If you're looking to implement real-time reporting of your streaming data running in your enterprise, this book will also help you.This book also provides tips, best practices and mistakes to avoid while working with Google BigQuery and services that interact with it. By the time you're done with it, you will have set a solid foundation in working with BigQuery to solve even the trickiest of data problems.
Joe Kuan
The book is aimed at all levels of readers. Beginners can learn the basic configurations and step-by-step approaches in creating charts or Highcharts cloud. For intermediate and advanced readers, the book explores the APIs, events, server-side operations and plugins.
Dmitry Anoshin, Sergey Sheypak
Hunk is the big data analytics platform that lets you rapidly explore, analyse, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience, designed to show you insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data.This book focuses on exploring, analysing, and visualizing big data in Hadoop and NoSQL data stores with this powerful full-featured big data analytics platform.You will begin by learning the Hunk architecture and Hunk Virtual Index before moving on to how to easily analyze and visualize data using Splunk Search Language (SPL). Next you will meet Hunk Apps which can easy integrate with NoSQL data stores such as MongoDB or Sqqrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the friendly user-interface of Hunk Pivot. You will connect MongoDB and explore data in the data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.
Rahul Malewar
Informatica PowerCenter is an industry-leading ETL tool, known for its accelerated data extraction, transformation, and data management strategies. This book will be your quick guide to exploring Informatica PowerCenter’s powerful features such as working on sources, targets, transformations, performance optimization, scheduling, deploying for processing, and managing your data at speed. First, you’ll learn how to install and configure tools. You will learn to implement various data warehouse and ETL concepts, and use PowerCenter 10.x components to build mappings, tasks, workflows, and so on. You will come across features such as transformations, SCD, XML processing, partitioning, constraint-based loading, Incremental aggregation, and many more. Moreover, you’ll also learn to deliver powerful visualizations for data profiling using the advanced monitoring dashboard functionality offered by the new version. Using data transformation technique, performance tuning, and the many new advanced features, this book will help you understand and process data for training or production purposes. The step-by-step approach and adoption of real-time scenarios will guide you through effectively accessing all core functionalities offered by Informatica PowerCenter version 10.x.
Learning Jupyter. Select, Share, Interact and Integrate with Jupyter Not
Toomey
Jupyter Notebook is a web-based environment that enables interactive computing in notebook documents. It allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. The Jupyter Notebook system is extensively used in domains such as data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and much more.This book starts with a detailed overview of the Jupyter Notebook system and its installation in different environments. Next we’ll help you will learn to integrate Jupyter system with different programming languages such as R, Python, JavaScript, and Julia and explore the various versions and packages that are compatible with the Notebook system. Moving ahead, you master interactive widgets, namespaces, and working with Jupyter in a multiuser mode.Towards the end, you will use Jupyter with a big data set and will apply all the functionalities learned throughout the book.
Bahaaldine Azarmi
Kibana is an open source data visualization platform that allows you to interact with your data through stunning, powerful graphics. Its simple, browser-based interface enables you to quickly create and share dynamic dashboards that display changes to Elasticsearch queries in real time.In this book, you’ll learn how to use the Elastic stack on top of a data architecture to visualize data in real time. All data architectures have different requirements and expectations when it comes to visualizing the data, whether it’s logging analytics, metrics, business analytics, graph analytics, or scaling them as per your business requirements. This book will help you master Elastic visualization tools and adapt them to the requirements of your project. You will start by learning how to use the basic visualization features of Kibana 5. Then you will be shown how to implement a pure metric analytics architecture and visualize it using Timelion, a very recent and trendy feature of the Elastic stack. You will learn how to correlate data using the brand-new Graph visualization and build relationships between documents. Finally, you will be familiarized with the setup of a Kibana development environment so that you can build a custom Kibana plugin.By the end of this book you will have all the information needed to take your Elastic stack skills to a new level of data visualization.
Jerome Baton, Rik Van Bruggen
Neo4j is a graph database that allows traversing huge amounts of data with ease. This book aims at quickly getting you started with the popular graph database Neo4j.Starting with a brief introduction to graph theory, this book will show you the advantages of using graph databases along with data modeling techniques for graph databases. You'll gain practical hands-on experience with commonly used and lesser known features for updating graph store with Neo4j's Cypher query language. Furthermore, you'll also learn to create awesome procedures using APOC and extend Neo4j's functionality, enabling integration, algorithmic analysis, and other advanced spatial operation capabilities on data.Through the course of the book you will come across implementation examples on the latest updates in Neo4j, such as in-graph indexes, scaling, performance improvements, visualization, data refactoring techniques, security enhancements, and much more. By the end of the book, you'll have gained the skills to design and implement modern spatial applications, from graphing data to unraveling business capabilities with the help of real-world use cases.
Learning pandas. High performance data manipulation and analysis using Python - Second Edition
Nicola Rainiero, Sonali Dayal, Michael Heydt
You will learn how to use pandas to perform data analysis in Python. You will start with an overview of data analysis and iteratively progress from modeling data, to accessing data from remote sources, performing numeric and statistical analysis, through indexing and performing aggregate analysis, and finally to visualizing statistical data and applying pandas to finance.With the knowledge you gain from this book, you will quickly learn pandas and how it can empower you in the exciting world of data manipulation, analysis and science.
Dan Keeley, Diethard Steiner, María Carina Roldán,...
Pentaho Data Integration(PDI) is an intuitive and graphical environment packed with drag-and-drop design and powerful Extract-Tranform-Load (ETL) capabilities. This book shows and explains the new interactive features of Spoon, the revamped look and feel, and the newest features of the tool including transformations and jobs Executors and the invaluable Metadata Injection capability.We begin with the installation of PDI software and then move on to cover all the key PDI concepts. Each of the chapter introduces new features, enabling you to gradually get practicing with the tool. First, you will learn to do all kind of data manipulation and work with simple plain files. Then, the book teaches you how you can work with relational databases inside PDI. Moreover, you will be given a primer on data warehouse concepts and you will learn how to load data in a data warehouse. During the course of this book, you will be familiarized with its intuitive, graphical and drag-and-drop design environment.By the end of this book, you will learn everything you need to know in order to meet your data manipulation requirements. Besides, your will be given best practices and advises for designing and deploying your projects.
Learning Predictive Analytics with Python. Click here to enter text
Ashish Kumar
Social Media and the Internet of Things have resulted in an avalanche of data. Data is powerful but not in its raw form - It needs to be processed and modeled, and Python is one of the most robust tools out there to do so. It has an array of packages for predictive modeling and a suite of IDEs to choose from. Learning to predict who would win, lose, buy, lie, or die with Python is an indispensable skill set to have in this data age. This book is your guide to getting started with Predictive Analytics using Python. You will see how to process data and make predictive models from it. We balance both statistical and mathematical concepts, and implement them in Python using libraries such as pandas, scikit-learn, and numpy. You’ll start by getting an understanding of the basics of predictive modeling, then you will see how to cleanse your data of impurities and get it ready it for predictive modeling. You will also learn more about the best predictive modeling algorithms such as Linear Regression, Decision Trees, and Logistic Regression. Finally, you will see the best practices in predictive modeling, as well as the different applications of predictive modeling in the modern world.
David Bellot, Dan Toomey
Probabilistic graphical models (PGM, also known as graphical models) are a marriage between probability theory and graph theory. Generally, PGMs use a graph-based representation. Two branches of graphical representations of distributions are commonly used, namely Bayesian networks and Markov networks. R has many packages to implement graphical models.We’ll start by showing you how to transform a classical statistical model into a modern PGM and then look at how to do exact inference in graphical models. Proceeding, we’ll introduce you to many modern R packages that will help you to perform inference on the models. We will then run a Bayesian linear regression and you’ll see the advantage of going probabilistic when you want to do prediction. Next, you’ll master using R packages and implementing its techniques. Finally, you’ll be presented with machine learning applications that have a direct impact in many fields. Here, we’ll cover clustering and the discovery of hidden information in big data, as well as two important methods, PCA and ICA, to reduce the size of big problems.
Tomasz Drabas, Denny Lee
Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. This book will show you how to leverage the power of Python and put it to use in the Spark ecosystem. You will start by getting a firm understanding of the Spark 2.0 architecture and how to set up a Python environment for Spark. You will get familiar with the modules available in PySpark. You will learn how to abstract data with RDDs and DataFrames and understand the streaming capabilities of PySpark. Also, you will get a thorough overview of machine learning capabilities of PySpark using ML and MLlib, graph processing using GraphFrames, and polyglot persistence using Blaze. Finally, you will learn how to deploy your applications to the cloud using the spark-submit command. By the end of this book, you will have established a firm understanding of the Spark Python API and how it can be used to build data-intensive applications.
Anita Graser
QGIS is a user-friendly open source geographic information system (GIS) that runs on Linux, Unix, Mac OS X, and Windows. The popularity of open source geographic information systems and QGIS in particular has been growing rapidly over the last few years.Learning QGIS Third Edition is a practical, hands-on guide updated for QGIS 2.14 that provides you with clear, step-by-step exercises to help you apply your GIS knowledge to QGIS. Through clear, practical exercises, this book will introduce you to working with QGIS quickly and painlessly.This book takes you from installing and configuring QGIS to handling spatial data to creating great maps. You will learn how to load and visualize existing spatialdata and create data from scratch. You will get to know important plugins, perform common geoprocessing and spatial analysis tasks and automate them with Processing.We will cover how to achieve great cartographic output and print maps. Finally, you will learn how to extend QGIS using Python and even create your own plugin.
Dr. Param Jeet, PRASHANT VATS
The role of a quantitative analyst is very challenging, yet lucrative, so there is a lot of competition for the role in top-tier organizations and investment banks. This book is your go-to resource if you want to equip yourself with the skills required to tackle any real-world problem in quantitative finance using the popular R programming language.You'll start by getting an understanding of the basics of R and its relevance in the field of quantitative finance. Once you've built this foundation, we'll dive into the practicalities of building financialmodels in R. This will help you have a fair understanding of the topics as well as their implementation, as the authors have presented some use cases along with examples that are easy to understand and correlate.We'll also look at risk management and optimization techniques for algorithmic trading. Finally, the book will explain some advanced concepts, such as trading using machine learning, optimizations, exotic options, and hedging.By the end of this book, you will have a firm grasp of the techniques required to implement basic quantitative finance models in R.
Learning R Programming. Language, tools, and practical techniques
Kun Ren
R is a high-level functional language and one of the must-know tools for data science and statistics. Powerful but complex, R can be challenging for beginners and those unfamiliar with its unique behaviors. Learning R Programming is the solution - an easy and practical way to learn R and develop a broad and consistent understanding of the language. Through hands-on examples you'll discover powerful R tools, and R best practices that will give you a deeper understanding of working with data. You'll get to grips with R's data structures and data processing techniques, as well as the most popular R packages to boost your productivity from the offset.Start with the basics of R, then dive deep into the programming techniques and paradigms to make your R code excel. Advance quickly to a deeper understanding of R's behavior as you learn common tasks including data analysis, databases, web scraping, high performance computing, and writing documents. By the end of the book, you'll be a confident R programmer adept at solving problems with the right techniques.
Christoph Körner
Using D3.js and Responsive Design principles, you will not just be able to implement visualizations that look and feel awesome across all devices and screen resolutions, but you will also boost your productivity and reduce development time by making use of Bootstrap—the most popular framework for developing responsive web applications.This book teaches the basics of scalable vector graphics (SVG), D3.js, and Bootstrap while focusing on Responsive Design as well as mobile-first visualizations; the reader will start by discovering Bootstrap and how it can be used for creating responsive applications, and then implement a basic bar chart in D3.js. You will learn about loading, parsing, and filtering data in JavaScript and then dive into creating a responsive visualization by using Media Queries, responsive interactions for Mobile and Desktop devices, and transitions to bring the visualization to life. In the following chapters, we build a fully responsive interactive map to display geographic data using GeoJSON and set up integration testing with Protractor to test the application across real devices using a mobile API gateway such as AWS Device Farm. You will finish the journey by discovering the caveats of mobile-first applications and learn how to master cross-browser complications.
David Lai, Riaz Ahmed
The book starts with the basics of SAP Analytics Cloud (formerly known as SAP BusinessObjects Cloud) and exposes almost every significant feature a beginner needs to master. Packed with illustrations and short, essential, to-the-point descriptions, the book provides a unique learning experience. Your journey of exploration starts with a basic introduction to the SAP Analytics Cloud platform. You will then learn about different segments of the product, such as Models, Stories, Digital Boardroom, and so on. Then, you are introduced to the product's interface: the Home screen, the main menu, and more. Then comes the hands-on aspect of the book, which starts with model creation. Next, you learn how to utilize a model to prepare different types of stories(reports) with the help of charts, tables, Geo Maps, and more. In the final chapters of this book, you will learn about Digital Boardroom, Collaboration, and Administration.
Dipanjan Sarkar, Karthik Ganapathy, Raghav Bali, Tushar...
The Internet has truly become humongous, especially with the rise of various forms of social media in the last decade, which give users a platform to express themselves and also communicate and collaborate with each other. This book will help the reader to understand the current social media landscape and to learn how analytics can be leveraged to derive insights from it. This data can be analyzed to gain valuable insights into the behavior and engagement of users, organizations, businesses, and brands. It will help readers frame business problems and solve them using social data.The book will also cover several practical real-world use cases on social media using R and its advanced packages to utilize data science methodologies such as sentiment analysis, topic modeling, text summarization, recommendation systems, social network analysis, classification, and clustering. This will enable readers to learn different hands-on approaches to obtain data from diverse social media sources such as Twitter and Facebook. It will also show readers how to establish detailed workflows to process, visualize, and analyze data to transform social data into actionable insights.
Learning Spark SQL. Architect streaming analytics and machine learning solutions
Aurobindo Sarkar
In the past year, Apache Spark has been increasingly adopted for the development of distributed applications. Spark SQL APIs provide an optimized interface that helps developers build such applications quickly and easily. However, designing web-scale production applications using Spark SQL APIs can be a complex task. Hence, understanding the design and implementation best practices before you start your project will help you avoid these problems.This book gives an insight into the engineering practices used to design and build real-world, Spark-based applications. The book's hands-on examples will give you the required confidence to work on any future projects you encounter in Spark SQL.It starts by familiarizing you with data exploration and data munging tasks using Spark SQL and Scala. Extensive code examples will help you understand the methods used to implement typical use-cases for various types of applications. You will get a walkthrough of the key concepts and terms that are common to streaming, machine learning, and graph applications. You will also learn key performance-tuning details including Cost Based Optimization (Spark 2.2) in Spark SQL applications. Finally, you will move on to learning how such systems are architected and deployed for a successful delivery of your project.
Learning Splunk Web Framework. Create, extend and publish real time Splunk applications
Vincent Sesto
Building rich applications on the Web using Splunk is now simpler than ever before with the Splunk Web Framework. It empowers developers to build their own web applications with custom dashboards, tables, charts, form searches, and other functionalities in the datasets at their disposal. The book will start with the fundamentals of the Splunk Web Framework, teaching you the secrets of building interesting and user-friendly applications. In the first application, you will learn to analyze and monitor traffic hitting the NASA website and learn to create dashboards for it. You will then learn additional, and more detailed, techniques to enhance the functionalities of the app such as dashboards and forms, editing simple XML, using simple XML extensions, tokens, post-process searches, dynamic drill-downs, the Splunk Web Framework and REST API, and much more. The second app will use historical stock market data and will create custom dashboards using Splunk Web Framework; the book will now cover important topics such as creating HTML dashboards, enhancing the visual appeal of the app using CSS, and moving your app with SplunkJS.The book will provide different and interesting examples instead of the usual “Log, Index, Search, and Graph” so that Splunk will be the first tool readers think of to resolve a problem.
Joshua N. Milligan
Tableau is the gold standard of business intelligence and visual analytics tools in every industry. It enables rapid data visualization and interpretation with charts, graphs, dashboards, and much more. Updated with the latest features of Tableau, this book takes you from the foundations of the Tableau 2019 paradigm through to advanced topics.This third edition of the bestselling guide by Tableau Zen Master, Joshua Milligan, will help you come to grips with updated features, such as set actions and transparent views. Beginning with installation, you'll create your first visualizations with Tableau and then explore practical examples and advanced techniques. You'll create bar charts, tree maps, scatterplots, time series, and a variety of other visualizations. Next, you'll discover techniques to overcome challenges presented by data structure and quality and engage in effective data storytelling and decision making with business critical information. Finally, you'll be introduced to Tableau Prep, and learn how to use it to integrate and shape data for analysis.By the end of this book, you will be equipped to leverage the powerful features of Tableau 2019 for decision making.