Big data - E-Books - BIBLIO-Bibliothek | BIBLIO ebookpoint

921

E-BOOK

R Web Scraping Quick Start Guide. Techniques and tools to crawl and scrape data from websites

Olgun Aydin

Web scraping is a technique to extract data from websites. It simulates the behavior of a website user to turn the website itself into a web service to retrieve or introduce new data. This book gives you all you need to get started with scraping web pages using R programming. You will learn about the rules of RegEx and XPath, key components for scraping website data. We will show you web scraping techniques, methodologies, and frameworks. With this book's guidance, you will become comfortable with the tools to write and test RegEx and XPath rules. We will focus on examples of dynamic websites for scraping data and how to implement the techniques learned. You will learn how to collect URLs and then create XPath rules for your first web scraping script using the rvest library. From the data you collect, you will be able to calculate statistics and create R plots to visualize them. Finally, you will discover how to use Selenium drivers with R for more sophisticated scraping. You will create AWS instances and use R to connect to a PostgreSQL database hosted on AWS. By the end of the book, you will be sufficiently confident to create end-to-end web scraping systems using R.

922

E-BOOK

RAG from First Principles. Engineering retrieval-augmented generation systems with Python, LangChain, and LlamaIndex

Jia Huang

Most developers can spin up a RAG pipeline in an afternoon using LangChain or LlamaIndex. Far fewer understand why retrieval fails or how to fix it. This book is for those who want to go deeper. 'RAG From First Principles' dismantles the retrieval-augmented generation stack layer by layer: how documents are ingested and parsed, why chunking strategy directly impacts answer quality, how embedding models encode meaning, what happens inside a vector database, and how sparse and dense retrieval interact in a hybrid system. Written by Jia Huang, a research engineer and bestselling AI author, it brings research depth and production experience to one of AI's most critical engineering disciplines. Structured as a progressive dialogue between a seasoned engineer and two students, the book surfaces the questions practitioners actually ask. Each chapter builds on the last, from data import and chunking through embedding selection, index design, hybrid search, and post-retrieval processing, into response generation, evaluation, and advanced paradigms including GraphRAG, Agentic RAG, and Modular RAG. By the end, you'll have the architectural understanding to optimize, debug, and extend your RAG systems with confidence.

923

E-BOOK

Rapid - Apache Mahout Clustering designs. Explore clustering algorithms used with Apache Mahout

Ashish Gupta

As more and more organizations discover the uses of big data analytics, interest in platforms that provide storage, computation, and analytic capabilities has increased. Apache Mahout caters to this need and paves the way for implementing complex machine learning algorithms to better analyze your data and gain useful insights from it. Starting with an introduction to clustering algorithms, this book provides an insight into Apache Mahout and the different algorithms it uses for clustering data. It provides a general introduction to algorithms such as K-Means, Fuzzy K-Means, and StreamingKMeans, and shows how to use Mahout to cluster your data using a particular algorithm. You will study the different types of clustering and learn how to use Apache Mahout with real-world data sets to implement and evaluate your clusters. This book discusses cluster improvement and visualization using Mahout APIs and also explores model-based clustering and topic modeling using the Dirichlet process. Finally, you will learn how to build and deploy a model for production use.

924

E-BOOK

Raportowanie w System Center Configuration Manager Bez tajemnic

Garth Jones, Dan Toll, Kerrie Meyler

The SQL Server database of Microsoft System Center Configuration Manager (ConfigMgr) contains a wealth of valuable information about your users, computers, hardware, operating systems, applications, and compliance status. To help you extract this data efficiently, Microsoft provides several excellent tools, including SQL Server Reporting Services (SSRS) and the SQL Server Data Tools Business Intelligence (SSDT-BI) add-on. Raportowanie w System Center Configuration Manager bez tajemnic shows you how to get the most out of these tools. World-renowned reporting guru Garth Jones and his expert co-authors guide you through every aspect of custom reporting in System Center. Starting with the installation and configuration of SSRS, you will learn step by step how to use SQL views to find the data you need, build SQL queries, create simple and advanced reports, and use role-based administration to deliver those reports securely to the right people. In this book, Jones has gathered current, reliable, and comprehensive System Center reporting techniques that you will not find in other books or on websites. With this guide, you will be able to consistently obtain the right information to solve pressing problems and respond quickly to management concerns. Garth Jones, chief architect at Enhansoft and a Microsoft MVP, specializes in extending the value and relevance of System Center Configuration Manager. He has worked with the System Center product family since 1996, when it was still known as SMS. Dan Toll is a Configuration Manager administrator who has worked with the product since SMS 2003. He specializes in operating system deployment for workstations and servers using the Microsoft Deployment Toolkit (MDT) and in ConfigMgr reporting. Kerrie Meyler, a Microsoft MVP, is the lead author of many books in the System Center Unleashed series. She currently works as an independent consultant. Over a career of more than 17 years, she evangelized SMS as a senior technology specialist at Microsoft and presented System Center technologies at the TechEd and MMS conferences. Detailed information on: installing and configuring SSRS for optimal System Center reporting and easier troubleshooting; the data stored in the ConfigMgr site database; retrieving ConfigMgr data efficiently by writing SQL queries in SQL Server Management Studio; best practices for creating and designing System Center reports; creating report templates, customizing content with report parameters, and embedding charts; customizing logos, color palettes, and other report elements for a specific organization; building advanced drill-through methods to deliver additional information; strengthening report security by integrating ConfigMgr role-based administration into SQL queries; using reporting to measure key performance indicators and deepen your knowledge of your own environment; tailoring reports to the needs of end users or management. ON THE WEB: All the examples and scripts presented in this book are available for download at informit.com/title/9780672337789

925

E-BOOK

Reactive Programming in Kotlin. Design and build non-blocking, asynchronous Kotlin applications with RXKotlin, Reactor-Kotlin, Android, and Spring

Rivu Chakraborty

In today's app-driven era, when programs are asynchronous and responsiveness is vital, reactive programming can help you write code that is more reliable, easier to scale, and better-performing. With this practical book, Kotlin developers will first learn how to view problems in the reactive way, and then build programs that leverage the best features of this programming paradigm. You will begin with the general concepts of reactive programming and then gradually move on to working with asynchronous data streams. You will dive into advanced techniques such as manipulating time in data flows and customizing operators and providers, and learn how to use the concurrency model to control the asynchronicity of code and process event handlers effectively. You will then be introduced to functional reactive programming (FRP) and learn to apply it in practical use cases in Kotlin. The book also takes you one step further by introducing Spring 5 and Spring Boot 2 using Kotlin. By the end of the book, you will be able to build real-world applications with reactive user interfaces and implement reactive programming paradigms in Android.

926

E-BOOK

Real-Time Big Data Analytics. Design, process, and analyze large sets of complex data in real time

Shilpi Saxena

Enterprises have been striving to deal with the challenges of data arriving in real time or near real time. Although technologies such as Storm and Spark (and many more) solve the challenges of real-time data, choosing the appropriate technology or framework for the business use case is the key to success. This book provides you with the skills required to quickly design, implement, and deploy real-time analytics using real-world examples of big data use cases. The book begins with the basics of various real-time data processing frameworks and technologies. We will explain the differences between batch and real-time processing in detail, and explore the techniques and programming concepts of Apache Storm. Moving on, we will familiarize you with Amazon Kinesis for real-time data processing in the cloud. We will further develop your understanding of real-time analytics through a comprehensive review of Apache Spark, along with the high-level architecture and the building blocks of a Spark program. You will learn how to transform your data, get output from transformations, and persist your results using Spark RDDs, and how to use an interface called Spark SQL to work with Spark. At the end of the book, we will introduce Spark Streaming, the streaming library of Spark, and walk you through the Lambda Architecture (LA), which provides a hybrid platform for big data processing by combining real-time and precomputed batch data to provide a near real-time view of incoming data.

927

E-BOOK

Recurrent Neural Networks with Python Quick Start Guide. Sequential learning and language modeling with TensorFlow

Simeon Kostadinov

Developers struggle to find an easy-to-follow learning resource for implementing Recurrent Neural Network (RNN) models. RNNs are the state-of-the-art model in deep learning for dealing with sequential data. From language translation to generating captions for an image, RNNs are used to continuously improve results. This book will teach you the fundamentals of RNNs, with example applications in Python and the TensorFlow library. The examples are accompanied by the right combination of theoretical knowledge and real-world implementations of concepts to build a solid foundation of neural network modeling. Your journey starts with the simplest RNN model, where you can grasp the fundamentals. The book then builds on this by proposing more advanced and complex algorithms. We use them to explain how a typical state-of-the-art RNN model works. From generating text to building a language translator, we show how some of today's most powerful AI applications work under the hood. After reading the book, you will be confident with the fundamentals of RNNs, and be ready to pursue further study, along with developing skills in this exciting field.

928

E-BOOK

Redash v5 Quick Start Guide. Create and share interactive dashboards using Redash

Alexander Leibzon, Yael Leibzon

Data exploration and visualization are vital to Business Intelligence, the backbone of almost every enterprise or organization. Redash is a querying and visualization tool developed to simplify how marketing and business development departments are exposed to data. If you want to learn to create interactive dashboards with Redash, explore different visualizations, and share the insights with your peers, then this is the ideal book for you. The book starts with essential Business Intelligence concepts that are at the heart of data visualization. You will learn how to find your way around Redash and its rich array of data visualization options for building interactive dashboards. You will learn how to create data stories and share them with peers. You will see how to connect to different data sources to process complex data, and then visualize this data to reveal valuable insights. By the end of this book, you will be confident using the Redash dashboarding tool to provide insight and communicate data stories.