Big data
Managing Data as a Product. Design and build data-product-centered socio-technical architectures
Andrea Gioia, Giulio Scotti
Traditional monolithic data platforms struggle with scalability and burden central data teams with excessive cognitive load, leading to challenges in managing technological debt. As maintenance costs escalate, these platforms lose their ability to provide sustained value over time. With two decades of hands-on experience implementing data solutions and his pioneering work in the Open Data Mesh Initiative, Andrea Gioia brings practical insights and proven strategies for transforming how organizations manage their data assets. Managing Data as a Product introduces a modular and distributed approach to data platform development, centered on the concept of data products. In this book, you’ll explore the rationale behind this shift, understand the core features and structure of data products, and learn how to identify, develop, and operate them in a production environment. The book guides you through designing and implementing an incremental, value-driven strategy for adopting data product-centered architectures, including strategies for securing buy-in from stakeholders. It also covers data modeling in distributed environments and its role in enabling modern generative AI. By the end of this book, you’ll understand product-centric data architecture and how to adopt it.
Jane Sarah Lat
Data integrity management plays a critical role in the success and effectiveness of organizations trying to use financial and operational data to make business decisions. Unfortunately, there is a big gap between the analysis and management of finance data and the proper implementation of complex data systems across organizations. The first part of this book covers the important concepts for data quality and data integrity relevant to finance, data, and tech professionals. The second part focuses on using several data tools and platforms to manage and resolve data integrity issues in financial data. The last part of the book covers intermediate and advanced solutions, including managed cloud-based ledger databases, database locks, and artificial intelligence, to manage the integrity of financial data in systems and databases. After finishing this hands-on book, you will be able to solve various data integrity issues experienced by organizations globally.
Kirill Dubovikov
Data science and machine learning can transform any organization and unlock new opportunities. However, employing the right management strategies is crucial to guide the solution from prototype to production. Traditional approaches often fail as they don't entirely meet the conditions and requirements necessary for current data science projects. In this book, you'll explore the right approach to data science project management, along with useful tips and best practices to guide you along the way. After understanding the practical applications of data science and artificial intelligence, you'll see how to incorporate them into your solutions. Next, you will go through the data science project life cycle, explore the common pitfalls encountered at each step, and learn how to avoid them. Any data science project requires a skilled team, and this book will offer the right advice for hiring and growing a data science team for your organization. Later, you'll be shown how to efficiently manage and improve your data science projects through the use of DevOps and ModelOps. By the end of this book, you will be well versed in various data science solutions and have gained practical insights into tackling the different challenges that you'll encounter on a daily basis.
Peter Rising, Nate Chamberlain
Do you want to build and test your proficiency in the deployment, management, and monitoring of Microsoft Teams features within the Microsoft 365 platform? Managing Microsoft Teams: MS-700 Exam Guide will help you to effectively plan and implement Microsoft Teams using the Microsoft 365 Teams admin center and Windows PowerShell. You’ll also discover best practices for rolling out and managing MS services for Teams users within your Microsoft 365 tenant. The chapters are divided into three easy-to-follow parts: planning and design, feature policies and administration, and team management, while aligning with the official MS-700 exam objectives to help you prepare effectively for the exam. The book starts by taking you through planning and design, where you’ll learn how to plan migrations, make assessments for network readiness, and plan and implement governance tasks such as configuring guest access and monitoring usage. Later, you’ll understand feature administration, focusing on collaboration, meetings, live events, phone numbers, and the phone system, along with applicable policy configurations. Finally, the book shows you how to manage Teams and membership settings and create app policies. By the end of this book, you'll have learned everything you need to pass the MS-700 certification exam and have a handy reference guide for MS Teams.
Market Research and Analysis. Mastering Market Research: Advanced Methods, Design, and Data Analysis
Mercury Learning and Information, Marcus Goncalves
This book offers an in-depth exploration of market research and analysis, guiding readers through the entire process from defining research objectives to communicating results. Begin by understanding the purpose and ethics of market research, laying a strong groundwork for your studies. Progress to defining precise research objectives and exploring secondary research methods to gather existing information. Next, engage with primary research methods, focusing on both quantitative and qualitative approaches. Learn how to develop and distribute surveys, choose the right sampling techniques, and utilize tools for data mining and web scraping. Gain insights into focus groups and observation studies, understanding how these qualitative methods can provide depth to your research. Finally, master the art of data analysis and result communication. Explore descriptive statistics, hypothesis testing, and inferential statistics to make sense of your data. Learn to effectively present your findings to stakeholders, ensuring your research translates into actionable insights. By the end of the book, you will be well-equipped to conduct thorough market research and communicate your results effectively.
Kinga Sroka
Welcome to the world of fascinating data! Business does not exist without hard data, assumptions, KPIs, and their delivery. That includes the part of it tied to promotion, especially online. Today you cannot be a true marketer without knowing at least the tools offered by Google: Analytics, Tag Manager, Search Console, and Trends. Do these names sound incomprehensible or make you uneasy? Don't worry, and reach for this book! It will show you how to use online analytics solutions effectively. This guide is a great introduction to online marketing and business analytics for people who are only just getting to know the subject. It is not only an overview of the modern analyst's tools. The author also describes the skills that people in the industry will need in the near future, points to places where they can already be acquired, and finally suggests which skills you need to land your dream job at companies that work with digital data.
MrExcel's Holy Macro! Books, Miguel Escobar, Ken...
This book equips you with the essential skills to master Power Query in Excel and Power BI. Starting with the basics, you'll learn query management, data types, and error handling, establishing a solid foundation. You'll explore techniques to move queries between Excel and Power BI, ensuring seamless workflow integration. As the guide progresses, you'll delve into data import methods from flat files, Excel, web-based, and relational sources, while performing key transformations like appending, combining, and reshaping data. Advanced topics such as conditional logic, Power Query values, and M Language fundamentals will enhance your ability to customize and optimize queries. The book also covers the creation of parameters and custom functions, alongside applying sophisticated date and time techniques. Finally, you'll learn to optimize query performance and automate data refreshes, ensuring your analysis remains current. By the end of this guide, you'll have the confidence and expertise to effectively transform and manage data using Power Query, significantly enhancing your data analysis capabilities in Excel and Power BI.
Sandeep Nair, Chintan Mehta, Dharmesh Vasoya
Apache Solr is the only standalone enterprise search server with a REST-like application interface, providing highly scalable, distributed search and index replication for many of the world's largest internet sites. To begin with, you will be introduced to full-text search, multiple-filter search, dynamic clustering, and more, helping you to brush up on the basics of Apache Solr. You will also explore the new features and advanced options released in Apache Solr 7.x, which bring numerous performance improvements and make data investigation simpler, easier, and more powerful. You will learn to build complex queries and extensive filters, and see how they are compiled in your system to bring relevance to your search tools. You will learn about Solr scoring, the elements affecting the document score, and how you can optimize or tune the score for the application at hand. You will learn to extract features of documents and write complex queries for re-ranking the documents. You will also learn advanced options that help you to know what content is indexed and how the extracted content is indexed. Throughout the book, you will go through complex problems with solutions, along with varied approaches to tackle your business needs. By the end of this book, you will have gained advanced proficiency to build out-of-the-box smart search solutions for your enterprise demands.
Romeo Kienzler
Apache Spark is an in-memory, cluster-based Big Data processing system that provides a wide range of functionalities such as graph processing, machine learning, stream processing, and more. This book will take your knowledge of Apache Spark to the next level by teaching you how to expand Spark’s functionality and build your data flows and machine/deep learning programs on top of the platform. The book starts with a quick overview of the Apache Spark ecosystem, and introduces you to the new features and capabilities in Apache Spark 2.x. You will then work with the different modules in Apache Spark such as interactive querying with Spark SQL, using DataFrames and Datasets effectively, streaming analytics with Spark Streaming, and performing machine learning and deep learning on Spark using MLlib and external tools such as H2O and Deeplearning4j. The book also contains chapters on efficient graph processing, memory management, and using Apache Spark on the cloud. By the end of this book, you will have all the necessary information to master Apache Spark, and use it efficiently for Big Data processing and analytics.
Mastering Apache Storm. Real-time big data streaming using Kafka, Hbase and Redis
Ankit Jain
Apache Storm is a real-time Big Data processing framework that processes large amounts of data reliably, guaranteeing that every message will be processed. Storm allows you to scale with your data as it grows, making it an excellent platform to solve your big data problems. This extensive guide will help you understand Storm, from the basics to the advanced topics. The book begins with a detailed introduction to real-time processing and where Storm fits in to solve these problems. You’ll get an understanding of deploying Storm on clusters by writing a basic Storm Hello World example. Next we’ll introduce you to Trident and you’ll get a clear understanding of how you can develop and deploy a Trident topology. We cover topics such as monitoring, Storm parallelism, the scheduler, and log processing in a very easy-to-understand manner. You will also learn how to integrate Storm with other well-known Big Data technologies such as HBase, Redis, Kafka, and Hadoop to realize the full potential of Storm. With real-world examples and clear explanations, this book will ensure you have a thorough mastery of Apache Storm. You will be able to use this knowledge to develop efficient, distributed real-time applications to cater to your business needs.
Mastering Arduino. A project-based approach to electronics, circuits, and programming
Jon Hoffman
Mastering Arduino is an all-in-one guide to getting the most out of your Arduino. This practical, no-nonsense guide teaches you all of the electronics and programming skills that you need to create advanced Arduino projects. This book is packed full of real-world projects for you to practice on, bringing all of the knowledge in the book together and giving you the skills to build your own robot from the examples in this book. The final two chapters discuss wireless technologies and how they can be used in your projects. The book begins with the basics of electronics, making sure that you understand components, circuits, and prototyping before moving on. It then performs the same function for code, getting you into the Arduino IDE and showing you how to connect the Arduino to a computer and run simple projects on your Arduino. Once the basics are out of the way, the next 10 chapters of the book focus on small projects centered around particular components, such as LCD displays, stepper motors, or voice synthesizers. Each of these chapters will get you familiar with the technology involved, how to build with it, how to program it, and how it can be used in your own projects.
Christoph Körner, Marcel Alsdorf
Azure Machine Learning is a cloud service for accelerating and managing the machine learning (ML) project life cycle that ML professionals, data scientists, and engineers can use in their day-to-day workflows. This book covers the end-to-end ML process using Microsoft Azure Machine Learning, including data preparation, performing and logging ML training runs, designing training and deployment pipelines, and managing these pipelines via MLOps. The first section shows you how to set up an Azure Machine Learning workspace; ingest and version datasets; as well as preprocess, label, and enrich these datasets for training. In the next two sections, you'll discover how to enrich and train ML models for embedding, classification, and regression. You'll explore advanced NLP techniques, traditional ML models such as boosted trees, modern deep neural networks, recommendation systems, reinforcement learning, and complex distributed ML training techniques, all using Azure Machine Learning. The last section will teach you how to deploy the trained models as a batch pipeline or real-time scoring service using Docker, Azure Machine Learning clusters, Azure Kubernetes Service, and alternative deployment targets. By the end of this book, you’ll be able to combine all the steps you’ve learned by building an MLOps pipeline.
Christoph Körner, Kaijisse Waaijer
Today's growth in data volume requires distributed systems, powerful algorithms, and scalable cloud infrastructure to compute insights and to train and deploy machine learning (ML) models. This book will help you improve your knowledge of building ML models using Azure and end-to-end ML pipelines on the cloud. The book starts with an overview of an end-to-end ML project and a guide on how to choose the right Azure service for different ML tasks. It then focuses on Azure Machine Learning and takes you through the process of data experimentation, data preparation, and feature engineering using Azure Machine Learning and Python. You'll learn advanced feature extraction techniques using natural language processing (NLP), classical ML techniques, and the secrets of both a great recommendation engine and a performant computer vision model using deep learning methods. You'll also explore how to train, optimize, and tune models using Azure Automated Machine Learning and HyperDrive, and perform distributed training on Azure. Then, you'll learn different deployment and monitoring techniques using Azure Kubernetes Service with Azure Machine Learning, along with the basics of MLOps (DevOps for ML) to automate your ML process as a CI/CD pipeline. By the end of this book, you'll have mastered Azure Machine Learning and be able to confidently design, build, and operate scalable ML pipelines in Azure.
Imran Bashir
Blockchain is a distributed database that enables permanent, transparent, and secure storage of data. Blockchain technology is the backbone of cryptocurrency (in fact, it’s the shared public ledger upon which the entire Bitcoin network relies), and it’s gaining popularity with people who work in finance, government, and the arts. Blockchain technology uses cryptography to keep data secure. This book gives a detailed description of this leading technology and its implementation in the real world. This book begins with the technical foundations of blockchain, teaching you the fundamentals of cryptography and how it keeps data secure. You will learn about the mechanisms behind cryptocurrencies and how to develop applications using Ethereum, a decentralized virtual machine. You will explore different blockchain solutions and get an exclusive preview of Hyperledger, an upcoming blockchain solution from IBM and the Linux Foundation. You will also be shown how to implement blockchain beyond currencies, how to approach scalability with blockchain, and the future scope of this fascinating and powerful technology.
Dmitry Anoshin, Himani Rana, Ning Ma, Neil...
Business intelligence is becoming more important by the day, with cloud offerings and mobile devices gaining wider acceptance and achieving better market penetration. MicroStrategy Reporting Suite is an absolute leader in the BI market and offers rich capabilities from basic data visualizations to predictive analytics. It supports various delivery methods such as the Web, desktops, and mobile devices. Using real-world BI scenarios, this book helps you to implement Business Analytics solutions in big e-commerce companies. It kicks off with MicroStrategy 10 features and then covers schema design models and techniques. Building upon your existing knowledge, the book will teach you advanced techniques for building documents and dashboards. It further teaches various graphical techniques for presenting data for analysis using maps, graphs, and advanced charts. Although MicroStrategy has rich functionality, the book will show how to customize it in order to meet your business requirements. You will also become familiar with the native analytical functions that will help you to maximize the impact of BI solutions with powerful predictive analytics. Furthermore, the book will focus on MicroStrategy Mobile Analytics along with data discovery and desktop capabilities such as connecting various data sources and building interactive dashboards. The book will also uncover best practices and troubleshooting techniques for MicroStrategy system administration, as well as security and authentication techniques. Lastly, you will learn to use Hadoop for MicroStrategy reporting. By the end of the book, you will be proficient in evaluating any BI software in order to choose the one that best meets all business requirements.
Pablo Navarro Castillo
Ravi Kumar Gupta, Yuvraj Gupta
Even structured data is useless if it can’t help you to make strategic decisions and improve existing systems. If you love to play with data, or your job requires you to process custom log formats, design a scalable analysis system, explore a variety of data, and manage logs to do real-time data analysis, this book is your one-stop solution. By combining the massively popular Elasticsearch, Logstash, Beats, and Kibana, elastic.co has advanced the end-to-end stack that delivers actionable insights in real time from almost any type of structured or unstructured data source. You will learn how to create real-time dashboards and how to manage the life cycle of logs in detail through real-life scenarios. This book brushes up your basic knowledge of implementing the Elastic Stack and then dives deeper into complex and advanced implementations of the Elastic Stack. We’ll help you to solve data analytics challenges using the Elastic Stack and provide practical steps on centralized logging and real-time analytics with the Elastic Stack in production. You will get to grips with advanced techniques for log analysis and visualization. Newly announced features such as Beats and X-Pack are also covered in detail with examples. Toward the end, you will see how to use the Elastic Stack for real-world case studies and we’ll show you some best practices and troubleshooting techniques for the Elastic Stack.
Bharvi Dixit
Elasticsearch is a modern, fast, distributed, scalable, fault tolerant, and open source search and analytics engine. Elasticsearch leverages the capabilities of Apache Lucene, and provides a new level of control over how you can index and search even huge sets of data. This book will give you a brief recap of the basics and also introduce you to the new features of Elasticsearch 5. We will guide you through the intermediate and advanced functionalities of Elasticsearch, such as querying, indexing, searching, and modifying data. We’ll also explore advanced concepts, including aggregation, index control, sharding, replication, and clustering. We’ll show you the modules of monitoring and administration available in Elasticsearch, and will also cover backup and recovery. You will get an understanding of how you can scale your Elasticsearch cluster to contextualize it and improve its performance. We’ll also show you how you can create your own analysis plugin in Elasticsearch. By the end of the book, you will have all the knowledge necessary to master Elasticsearch and put it to efficient use.
Silas Toms, Paul Crickard, Eric van Rees
Python comes with a host of open source libraries and tools that help you work on professional geoprocessing tasks without investing in expensive tools. This book will introduce Python developers, both new and experienced, to a variety of new code libraries that have been developed to perform geospatial analysis, statistical analysis, and data management. This book will use examples and code snippets that will help explain how Python 3 differs from Python 2, and how these new code libraries can be used to solve age-old problems in geospatial analysis. You will begin by understanding what geoprocessing is and explore the tools and libraries that Python 3 offers. You will then learn to use Python code libraries to read and write geospatial data. You will then learn to perform geospatial queries within databases and learn PyQGIS to automate analysis within the QGIS mapping suite. Moving forward, you will explore the newly released ArcGIS API for Python and ArcGIS Online to perform geospatial analysis and create ArcGIS Online web maps. Further, you will deep dive into Python Geospatial web frameworks and learn to create a geospatial REST API.
Mastering Hadoop 3. Big data processing at scale to unlock unique business insights
Chanchal Singh, Manish Kumar
Apache Hadoop is one of the most popular big data solutions for distributed storage and for processing large chunks of data. With Hadoop 3, Apache promises to provide a high-performance, more fault-tolerant, and highly efficient big data processing platform, with a focus on improved scalability and increased efficiency. With this guide, you’ll understand advanced concepts of the Hadoop ecosystem tools. You’ll learn how Hadoop works internally, study advanced concepts of different ecosystem tools, discover solutions to real-world use cases, and understand how to secure your cluster. It will then walk you through HDFS, YARN, MapReduce, and Hadoop 3 concepts. You’ll be able to address common challenges like using Kafka efficiently, designing low-latency Kafka systems with reliable message delivery, and handling high data volumes. As you advance, you’ll discover how to address major challenges when building an enterprise-grade messaging system, and how to use different stream processing systems along with Kafka to fulfil your enterprise goals. By the end of this book, you’ll have a complete understanding of how components in the Hadoop ecosystem are effectively integrated to implement a fast and reliable data pipeline, and you’ll be equipped to tackle a range of real-world problems in data pipelines.
Mastering Java for Data Science. Analytics and more for production-ready applications
Alexey Grigorev
Java is the most popular programming language, according to the TIOBE index, and it is a typical choice for running production systems in many companies, both in the startup world and among large enterprises. Not surprisingly, it is also a common choice for creating data science applications: it is fast and has a great set of data processing tools, both built-in and external. What is more, choosing Java for data science allows you to easily integrate solutions with existing software, and bring data science into production with less effort. This book will teach you how to create data science applications with Java. First, we will revise the most important things when starting a data science application, and then brush up the basics of Java and machine learning before diving into more advanced topics. We start by going over the existing libraries for data processing and libraries with machine learning algorithms. After that, we cover topics such as classification and regression, dimensionality reduction and clustering, information retrieval and natural language processing, and deep learning and big data. Finally, we finish the book by talking about the ways to deploy the model and evaluate it in production settings.