Programming
Data science od podstaw. Analiza danych w Pythonie. Wydanie II
Joel Grus
Data analytics is regarded as an exceptionally promising field. It is developing rapidly and keeps finding new applications. Professionals skilled at exploring data and extracting useful information from it can count on interesting work and very attractive employment terms. To become a data analyst, however, you need to know mathematics and statistics and learn to program. Skills in machine learning and deep learning matter as well. In a field as specialized as data science, it is particularly important to acquire solid fundamentals and understand them in depth. This guide covers the fundamentals of data science. It explains the necessary elements of mathematics and statistics, and presents techniques for building the required tools and the workings of the most important algorithms. The book is structured so that each implementation is as clear and understandable as possible. The examples are written in Python: the language is fairly easy to learn, and a range of useful Python libraries makes working with data easier. The second edition adds new topics such as deep learning, statistics, and natural language processing, as well as working with huge data sets. These topics come up often in the work of a modern data analyst. The book covers, among other things: elements of linear algebra, statistics, and probability; collecting, cleaning, and exploring data; algorithms for data analysis models; the fundamentals of machine learning; recommender systems and natural language processing; social network analysis and the MapReduce algorithm. Data science: build on solid foundations!
Data Science. Programowanie, analiza i wizualizacja danych z wykorzystaniem języka R
Michael Freeman, Joel Ross
To turn raw data into usable knowledge, you need to be able to analyze it, transform it, and sometimes visualize it. The reward for that effort is a better understanding of complex problems in many fields. What is more, familiarity with programmatic data processing lets you quickly detect and describe patterns in data that are practically impossible to spot with other techniques. For many researchers, however, the need to write code is a barrier to taking advantage of these possibilities. This is an R programming textbook for data analysts, particularly useful for people with no experience in the field. It describes the necessary tools and technologies in detail. It gives guidance on installing and configuring software for writing, running, and managing code, on tracking project versions and changes, and on using other basic mechanisms. The individual steps of writing R code are explained thoroughly and accessibly. With this book you can move smoothly to concrete tasks and build the applications you need. Numerous examples and exercises make the material easier to understand, so you can quickly start analyzing your own data sets effectively. The book covers, among other things: setting up a work environment and getting started with programming in R; the basics of project management, version control, and generating documentation; data frames and the dplyr and tidyr packages; code for data visualization and the ggplot2 package; building applications and techniques for collaborating in teams of specialists. Simply R and data. You will squeeze out every drop of knowledge!
Stephen Klosterman
If data is the new oil, then machine learning is the drill. As companies gain access to ever-increasing quantities of raw data, the ability to deliver state-of-the-art predictive models that support business decision-making becomes more and more valuable. In this book, you’ll work on an end-to-end project based around a realistic data set and split up into bite-sized practical exercises. This creates a case-study approach that simulates the working conditions you’ll experience in real-world data science projects. You’ll learn how to use key Python packages, including pandas, Matplotlib, and scikit-learn, and master the process of data exploration and data processing, before moving on to fitting, evaluating, and tuning algorithms such as regularized logistic regression and random forest. Now in its second edition, this book will take you through the end-to-end process of exploring data and delivering machine learning models. Updated for 2021, this edition includes brand new content on XGBoost, SHAP values, algorithmic fairness, and the ethical concerns of deploying a model in the real world. By the end of this data science book, you’ll have the skills, understanding, and confidence to build your own machine learning models and gain insights from real data.
Stephen Klosterman
Data Science Projects with Python is designed to give you practical guidance on industry-standard data analysis and machine learning tools, by applying them to realistic data problems. You will learn how to use pandas and Matplotlib to critically examine datasets with summary statistics and graphs, and extract the insights you seek to derive. You will build your knowledge as you prepare data using the scikit-learn package and feed it to machine learning algorithms such as regularized logistic regression and random forest. You’ll discover how to tune algorithms to provide the most accurate predictions on new and unseen data. As you progress, you’ll gain insights into the working and output of these algorithms, building your understanding of both the predictive capabilities of the models and why they make these predictions. By the end of this book, you will have the necessary skills to confidently use machine learning algorithms to perform detailed data analysis and extract meaningful insights from unstructured data.
Matt Eland
As the fields of data science, machine learning, and artificial intelligence rapidly evolve, .NET developers are eager to leverage their expertise to dive into these exciting domains but are often unsure of how to do so. Data Science in .NET with Polyglot Notebooks is the practical guide you need to seamlessly bring your .NET skills into the world of analytics and AI. With Microsoft’s .NET platform now robustly supporting machine learning and AI tasks, the introduction of tools such as .NET Interactive kernels and Polyglot Notebooks has opened up a world of possibilities for .NET developers. This book empowers you to harness the full potential of these cutting-edge technologies, guiding you through hands-on experiments that illustrate key concepts and principles. Through a series of interactive notebooks, you’ll not only master technical processes but also discover how to integrate these new skills into your current role or pivot to exciting opportunities in the data science field. By the end of the book, you’ll have acquired the necessary knowledge and confidence to apply cutting-edge data science techniques and deliver impactful solutions within the .NET ecosystem.
Data science, wyzwania i rozwiązania. Jak zostać ekspertem analizy danych
Daniel Vaughan
Learning and practicing data science is not the easiest of tasks. Education in this field usually focuses on programming and machine learning, yet a great data analyst has to know many other subjects too. These can be learned on the job, but that requires finding a mentor, and unfortunately that is not always possible. "This textbook starts where most books end: with real decision-making processes based on conclusions drawn from data." (Brett Holleman, freelance data scientist) With this book you will pick up a range of techniques that will help you become a more productive data analyst. First you will get to know topics related to understanding data and the soft skills that a good data scientist needs. Only then will you focus on the key aspects of machine learning. In this way you will gradually move along the path from an average candidate to an outstanding data science specialist. The skills described in this guide have been identified, cataloged, analyzed, and applied over many years to generate value and train data scientists in various companies and industries. From this book you will learn: how to make data-driven processes generate value; how to design useful metrics; how to win stakeholder support; how to make sure a machine learning algorithm is suitable for a given task; how to get data leakage under control. "This is the missing textbook for achieving commercial success with data science!" (Adri Purkayastha, director for AI-related risk, BNP Paribas)
Data Structures and Algorithms with the C++ STL. A guide for modern C++ practitioners
John Farrier
While the Standard Template Library (STL) offers a rich set of tools for data structures and algorithms, navigating its intricacies can be daunting for intermediate C++ developers without expert guidance. This book offers a thorough exploration of the STL’s components, covering fundamental data structures, advanced algorithms, and concurrency features. Starting with an in-depth analysis of the std::vector, this book highlights its pivotal role in the STL, progressing toward building your proficiency in utilizing vectors, managing memory, and leveraging iterators. The book then advances to STL’s data structures, including sequence containers, associative containers, and unordered containers, simplifying the concepts of container adaptors and views to enhance your knowledge of modern STL programming. Shifting the focus to STL algorithms, you’ll get to grips with sorting, searching, and transformations and develop the skills to implement and modify algorithms with best practices. Advanced sections cover extending the STL with custom types and algorithms, as well as concurrency features, exception safety, and parallel algorithms. By the end of this book, you’ll have transformed into a proficient STL practitioner ready to tackle real-world challenges and build efficient and scalable C++ applications.
Mercury Learning and Information, D. Malhotra, N....
This book introduces the fundamentals of data structures using C++ in a self-teaching format. It covers managing large amounts of information, SEO, and creating Internet/Web indexing services. Practical analogies with real-world applications help explain technical concepts. The book includes end-of-chapter exercises such as programming tasks, theoretical questions, and multiple-choice quizzes. The course starts with an introduction to data structures and the C++ language, progressing through arrays, linked lists, queues, searching and sorting, stacks, trees, multi-way search trees, hashing, files, and graphs. Each chapter builds on the previous one, ensuring a comprehensive understanding of data structures. Understanding these concepts is crucial for managing large databases and optimizing web services. This book guides readers from basic to advanced data structure techniques, blending theoretical knowledge with practical skills. Companion files with source code and data sets enhance the learning experience, making this book an essential resource for mastering data structures with C++.
Mercury Learning and Information, D. Malhotra, N....
This book introduces the fundamentals of data structures using Java in a self-teaching format. It covers managing large databases, effective SEO, and creating web indexing services. Real-world analogies help explain technical concepts. Each chapter includes programming tasks, theoretical questions, and multiple-choice quizzes. The course begins with an introduction to data structures and Java, moving through arrays, linked lists, queues, searching and sorting, stacks, trees, multi-way search trees, hashing, files, and graphs. Each chapter builds on the previous one, ensuring a thorough understanding of data structures. Understanding these concepts is crucial for managing information and optimizing web services. This book guides readers from basic to advanced techniques, blending theory with practical skills. It is an essential resource for mastering data structures with Java, enhanced by end-of-chapter exercises and real-world examples.
Mercury Learning and Information, D. Malhotra, N....
This book, part of the Pocket Primer series, introduces the basic concepts of data science using Python 3 and other applications. It offers a fast-paced introduction to data analytics, statistics, data visualization, linear algebra, and regular expressions. The book features numerous code samples using Python, NumPy, R, SQL, NoSQL, and Pandas. Companion files with source code and color figures are available. Understanding data science is crucial in today's data-driven world. This book provides a comprehensive introduction, covering key areas such as Python 3, data visualization, and statistical concepts. The practical code samples and hands-on approach make it ideal for beginners and those looking to enhance their skills. The journey begins with working with data, followed by an introduction to probability, statistics, and linear algebra. It then delves into Python, NumPy, Pandas, R, regular expressions, and SQL/NoSQL, concluding with data visualization techniques. This structured approach ensures a solid foundation in data science.
Data Wrangling on AWS. Clean and organize complex data for analysis
Navnit Shukla, Sankar M, Sam Palani
Data wrangling is the process of cleaning, transforming, and organizing raw, messy, or unstructured data into a structured format. It involves processes such as data cleaning, data integration, data transformation, and data enrichment to ensure that the data is accurate, consistent, and suitable for analysis. Data Wrangling on AWS equips you with the knowledge to realize the full potential of AWS data wrangling tools. First, you’ll be introduced to data wrangling on AWS and to the data wrangling services available in AWS. You’ll understand how to work with AWS Glue DataBrew, AWS Data Wrangler, and AWS SageMaker. Next, you’ll discover other AWS services like Amazon S3, Redshift, Athena, and QuickSight. Additionally, you’ll explore advanced topics such as performing pandas data operations with AWS Data Wrangler, optimizing ML data with AWS SageMaker, and building a data warehouse with Glue DataBrew, along with security and monitoring aspects. By the end of this book, you’ll be well-equipped to perform data wrangling using AWS services.
Data Wrangling with Python. Creating actionable data from raw sources
Dr. Tirthajyoti Sarkar, Shubhadeep Roychowdhury
For data to be useful and meaningful, it must be curated and refined. Data Wrangling with Python teaches you the core ideas behind these processes and equips you with knowledge of the most popular tools and techniques in the domain. The book starts with the absolute basics of Python, focusing mainly on data structures. It then delves into the fundamental tools of data wrangling like NumPy and Pandas libraries. You'll explore useful insights into why you should stay away from traditional ways of data cleaning, as done in other languages, and take advantage of the specialized pre-built routines in Python. This combination of Python tips and tricks will also demonstrate how to use the same Python backend and extract/transform data from an array of sources including the Internet, large database vaults, and Excel financial tables. To help you prepare for more challenging scenarios, you'll cover how to handle missing or wrong data, and reformat it based on the requirements from the downstream analytics tool. The book will further help you grasp concepts through real-world examples and datasets. By the end of this book, you will be confident in using a diverse array of sources to extract, clean, transform, and format your data efficiently.
Data Wrangling with SQL. A hands-on guide to manipulating, wrangling, and engineering data using SQL
Raghav Kandarpa, Shivangi Saxena
The amount of data generated continues to grow rapidly, making it increasingly important for businesses to be able to wrangle this data and understand it quickly and efficiently. Although data wrangling can be challenging, with the right tools and techniques you can efficiently handle enormous amounts of unstructured data. The book starts by introducing you to the basics of SQL, focusing on the core principles and techniques of data wrangling. You’ll then explore advanced SQL concepts like aggregate functions, window functions, CTEs, and subqueries that are very popular in the business world. The next set of chapters will walk you through the functions within a SQL query that cause delays in data transformation and help you tell the difference between a good query and a bad one. You’ll also learn how data wrangling and data science go hand in hand. The book is filled with datasets and practical examples to help you understand the concepts thoroughly, along with best practices to guide you at every stage of data wrangling. By the end of this book, you’ll be equipped with essential techniques and best practices for data wrangling, and will have learned how to use clean and standardized data models to make informed decisions, helping businesses avoid costly mistakes.
Saba Shah, Rod Waltermann
Spark has become a de facto standard for big data processing. Migrating data processing to Spark saves resources, streamlines your business focus, and modernizes workloads, creating new business opportunities through Spark’s advanced capabilities. Written by a senior solutions architect at Databricks, with experience in leading data science and data engineering teams in Fortune 500s as well as startups, this book is your exhaustive guide to achieving the Databricks Certified Associate Developer for Apache Spark certification on your first attempt. You’ll explore the core components of Apache Spark, its architecture, and its optimization, while familiarizing yourself with the Spark DataFrame API and its components needed for data manipulation. You’ll also find out what Spark streaming is and why it’s important for modern data stacks, before learning about machine learning in Spark and its different use cases. What’s more, you’ll discover sample questions at the end of each section along with two mock exams to help you prepare for the certification exam. By the end of this book, you’ll know what to expect in the exam and gain enough understanding of Spark and its tools to pass the exam. You’ll also be able to apply this knowledge in a real-world setting and take your skillset to the next level.
Alejandro Duarte
Vaadin is an open-source Java framework used to build modern user interfaces. Vaadin 8 simplifies application development and improves user experience. The book begins with an overview of the architecture of Vaadin applications and the way you can organize your code in modules. It then moves on to more advanced topics such as internationalization, authentication, authorization, and database connectivity. The book also teaches you how to implement CRUD views, how to generate printable reports, and how to manage data with lazy loading. By the end of this book you will be able to architect, implement, and deploy stunning Vaadin applications, and have the knowledge to master web development with Vaadin.
Thomas Kurian Theakanath
Datadog is an essential cloud monitoring and operational analytics tool which enables the monitoring of servers, virtual machines, containers, databases, third-party tools, and application services. IT and DevOps teams can easily leverage Datadog to monitor infrastructure and cloud services, and this book will show you how. The book starts by describing basic monitoring concepts and types of monitoring that are rolled out in a large-scale IT production engineering environment. Moving on, the book covers how standard monitoring features are implemented on the Datadog platform and how they can be rolled out in a real-world production environment. As you advance, you'll discover how Datadog is integrated with popular software components that are used to build cloud platforms. The book also provides details on how to use monitoring standards such as Java Management Extensions (JMX) and StatsD to extend the Datadog platform. Finally, you'll get to grips with monitoring fundamentals, learn how monitoring can be rolled out using Datadog proactively, and find out how to extend and customize the Datadog platform. By the end of this Datadog book, you will have gained the skills needed to monitor your cloud infrastructure and the software applications running on it using Datadog.