E-book details

Pig Design Patterns. Simplify Hadoop programming to create complex end-to-end Enterprise Big Data solutions with Pig

Pradeep Pasupuleti

Ebook
  • Pig Design Patterns
    • Table of Contents
    • Pig Design Patterns
    • Credits
    • Foreword
    • About the Author
    • Acknowledgments
    • About the Reviewers
    • www.PacktPub.com
      • Support files, eBooks, discount offers and more
        • Why Subscribe?
        • Free Access for Packt account holders
    • Preface
      • What this book covers
        • Motivation for this book
      • What you need for this book
      • Who this book is for
      • Conventions
      • Reader feedback
      • Customer support
        • Downloading the example code
          • Third-party libraries
          • Datasets
        • Errata
        • Piracy
        • Questions
    • 1. Setting the Context for Design Patterns in Pig
      • Understanding design patterns
      • The scope of design patterns in Pig
      • Hadoop demystified: a quick reckoner
        • The enterprise context
        • Common challenges of distributed systems
        • The advent of Hadoop
        • Hadoop under the covers
        • Understanding the Hadoop Distributed File System
          • HDFS design goals
          • Working of HDFS
        • Understanding MapReduce
          • Understanding how MapReduce works
          • The MapReduce internals
      • Pig: a quick intro
        • Understanding the rationale of Pig
        • Understanding the relevance of Pig in the enterprise
        • Working of Pig: an overview
          • Firing up Pig
          • The use case
          • Code listing
          • The dataset
      • Understanding Pig through the code
        • Pig's extensibility
        • Operators used in code
        • The EXPLAIN operator
        • Understanding Pig's data model
          • Primitive types
          • Complex types
            • The relevance of schemas
      • Summary
    • 2. Data Ingest and Egress Patterns
      • The context of data ingest and egress
      • Types of data in the enterprise
      • Ingest and egress patterns for multistructured data
        • Considerations for log ingestion
          • The Apache log ingestion pattern
          • Background
          • Motivation
          • Use cases
          • Pattern implementation
          • Code snippets
            • Code for the CommonLogLoader class
            • Code for the CombinedLogLoader class
          • Results
          • Additional information
        • The custom log ingestion pattern
          • Background
          • Motivation
          • Use cases
          • Pattern implementation
          • Code snippets
          • Results
          • Additional information
        • The image ingress and egress pattern
          • Background
          • Motivation
          • Use cases
          • Pattern implementation
            • The image ingress implementation
            • The image egress implementation
          • Code snippets
            • The image ingress
              • Pig script
              • Image to a sequence UDF snippet
            • The image egress
              • Pig script
              • Sequence to an image UDF
          • Results
          • Additional information
      • The ingress and egress patterns for the NoSQL data
        • MongoDB ingress and egress patterns
          • Background
          • Motivation
          • Use cases
          • Pattern implementation
            • The ingress implementation
            • The egress implementation
          • Code snippets
            • The ingress code
            • The egress code
          • Results
          • Additional information
        • The HBase ingress and egress pattern
          • Background
          • Motivation
          • Use cases
          • Pattern implementation
            • The ingress implementation
            • The egress implementation
          • Code snippets
            • The ingress code
            • The egress code
          • Results
          • Additional information
      • The ingress and egress patterns for structured data
        • The Hive ingress and egress patterns
          • Background
          • Motivation
          • Use cases
          • Pattern implementation
            • The ingress implementation
            • The egress implementation
          • Code snippets
            • The ingress code
              • Importing data using RCFile
              • Importing data using HCatalog
            • The egress code
          • Results
          • Additional information
      • The ingress and egress patterns for semi-structured data
        • The mainframe ingestion pattern
          • Background
          • Motivation
          • Use cases
          • Pattern implementation
          • Code snippets
          • Results
          • Additional information
        • XML ingest and egress patterns
          • Background
          • Motivation
            • Motivation for ingesting raw XML
            • Motivation for ingesting binary XML
            • Motivation for egression of XML
          • Use cases
          • Pattern implementation
            • The implementation of the XML raw ingestion
            • The implementation of the XML binary ingestion
        • Code snippets
          • The XML raw ingestion code
          • The XML binary ingestion code
          • The XML egress code
            • Pig script
            • The XML storage
          • Results
          • Additional information
      • JSON ingress and egress patterns
        • Background
          • Motivation
          • Use cases
          • Pattern implementation
            • The ingress implementation
            • The egress implementation
          • Code snippets
            • The ingress code
              • The code for simple JSON
              • The code for nested JSON
            • The egress code
          • Results
          • Additional information
      • Summary
    • 3. Data Profiling Patterns
      • Data profiling for Big Data
        • Big Data profiling dimensions
        • Sampling considerations for profiling Big Data
          • Sampling support in Pig
      • Rationale for using Pig in data profiling
      • The data type inference pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
          • Pig script
          • Java UDF
        • Results
        • Additional information
      • The basic statistical profiling pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
          • Pig script
          • Macro
        • Results
        • Additional information
      • The pattern-matching pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
          • Pig script
          • Macro
        • Results
        • Additional information
      • The string profiling pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
          • Pig script
          • Macro
        • Results
        • Additional information
      • The unstructured text profiling pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
          • Pig script
          • Java UDF for stemming
          • Java UDF for generating TF-IDF
        • Results
        • Additional information
      • Summary
    • 4. Data Validation and Cleansing Patterns
      • Data validation and cleansing for Big Data
      • Choosing Pig for validation and cleansing
      • The constraint validation and cleansing design pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • The regex validation and cleansing design pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • The corrupt data validation and cleansing design pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • The unstructured text data validation and cleansing design pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • Summary
    • 5. Data Transformation Patterns
      • Data transformation processes
      • The structured-to-hierarchical transformation pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • The data normalization pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • The data integration pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • The aggregation pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • The data generalization pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • Summary
    • 6. Understanding Data Reduction Patterns
      • Data reduction: a quick introduction
      • Data reduction considerations for Big Data
      • Dimensionality reduction: the Principal Component Analysis design pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
          • Limitations of PCA implementation
        • Code snippets
        • Results
        • Additional information
      • Numerosity reduction: the histogram design pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • Numerosity reduction: the sampling design pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • Numerosity reduction: the clustering design pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • Summary
    • 7. Advanced Patterns and Future Work
      • The clustering pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • The topic discovery pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • The natural language processing pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • The classification pattern
        • Background
        • Motivation
        • Use cases
        • Pattern implementation
        • Code snippets
        • Results
        • Additional information
      • Future trends
        • Emergence of data-driven patterns
        • The emergence of solution-driven patterns
        • Patterns addressing programmability constraints
      • Summary
    • Index
  • Title: Pig Design Patterns. Simplify Hadoop programming to create complex end-to-end Enterprise Big Data solutions with Pig
  • Author: Pradeep Pasupuleti
  • Original title: Pig Design Patterns. Simplify Hadoop programming to create complex end-to-end Enterprise Big Data solutions with Pig.
  • ISBN: 9781783285563
  • Date of issue: 2014-04-17
  • Format: Ebook
  • Item ID: e_3bdv
  • Publisher: Packt Publishing