E-book details

Storm Blueprints: Patterns for Distributed Real-time Computation. One of the best ways of getting to grips with the world’s most popular framework for real-time processing is to study real-world projects. This book lets you do just that, resulting in a sound understanding of the fundamentals.

P. Taylor Goetz, Peter T Goetz, Brian O'Neill

E-book
  • Storm Blueprints: Patterns for Distributed Real-time Computation
    • Table of Contents
    • Storm Blueprints: Patterns for Distributed Real-time Computation
    • Credits
    • About the Authors
    • About the Reviewers
    • www.PacktPub.com
      • Support files, eBooks, discount offers and more
        • Why Subscribe?
        • Free Access for Packt account holders
    • Preface
      • What this book covers
      • What you need for this book
      • Who this book is for
      • Conventions
      • Reader feedback
      • Customer support
        • Downloading the example code
        • Errata
        • Piracy
        • Questions
    • 1. Distributed Word Count
      • Introducing elements of a Storm topology: streams, spouts, and bolts
        • Streams
        • Spouts
        • Bolts
      • Introducing the word count topology data flow
        • Sentence spout
          • Introducing the split sentence bolt
          • Introducing the word count bolt
          • Introducing the report bolt
      • Implementing the word count topology
        • Setting up a development environment
        • Implementing the sentence spout
        • Implementing the split sentence bolt
        • Implementing the word count bolt
        • Implementing the report bolt
        • Implementing the word count topology
      • Introducing parallelism in Storm
        • WordCountTopology parallelism
          • Adding workers to a topology
          • Configuring executors and tasks
      • Understanding stream groupings
      • Guaranteed processing
        • Reliability in spouts
        • Reliability in bolts
        • Reliable word count
      • Summary
    • 2. Configuring Storm Clusters
      • Introducing the anatomy of a Storm cluster
        • Understanding the nimbus daemon
        • Working with the supervisor daemon
        • Introducing Apache ZooKeeper
        • Working with Storm's DRPC server
        • Introducing the Storm UI
      • Introducing the Storm technology stack
        • Java and Clojure
        • Python
      • Installing Storm on Linux
        • Installing the base operating system
        • Installing Java
        • ZooKeeper installation
        • Storm installation
        • Running the Storm daemons
        • Configuring Storm
        • Mandatory settings
        • Optional settings
        • The Storm executable
        • Setting up the Storm executable on a workstation
        • The daemon commands
          • Nimbus
          • Supervisor
          • UI
          • DRPC
        • The management commands
          • Jar
          • Kill
          • Deactivate
          • Activate
          • Rebalance
          • Remoteconfvalue
        • Local debug/development commands
          • REPL
          • Classpath
          • Localconfvalue
      • Submitting topologies to a Storm cluster
      • Automating the cluster configuration
      • A rapid introduction to Puppet
        • Puppet manifests
        • Puppet classes and modules
        • Puppet templates
        • Managing environments with Puppet Hiera
        • Introducing Hiera
      • Summary
    • 3. Trident Topologies and Sensor Data
      • Examining our use case
      • Introducing Trident topologies
      • Introducing Trident spouts
      • Introducing Trident operations: filters and functions
        • Introducing Trident filters
        • Introducing Trident functions
      • Introducing Trident aggregators: Combiners and Reducers
        • CombinerAggregator
        • ReducerAggregator
        • Aggregator
      • Introducing the Trident state
        • The Repeat Transactional state
        • The Opaque state
      • Executing the topology
      • Summary
    • 4. Real-time Trend Analysis
      • Use case
      • Architecture
        • The source application
        • The logback Kafka appender
        • Apache Kafka
        • Kafka spout
        • The XMPP server
      • Installing the required software
        • Installing Kafka
        • Installing OpenFire
      • Introducing the sample application
        • Sending log messages to Kafka
      • Introducing the log analysis topology
        • Kafka spout
        • The JSON project function
        • Calculating a moving average
        • Adding a sliding window
        • Implementing the moving average function
        • Filtering on thresholds
        • Sending notifications with XMPP
      • The final topology
      • Running the log analysis topology
      • Summary
    • 5. Real-time Graph Analysis
      • Use case
      • Architecture
        • The Twitter client
        • Kafka spout
        • The Titan distributed graph database
      • A brief introduction to graph databases
        • Accessing the graph: the TinkerPop stack
        • Manipulating the graph with the Blueprints API
        • Manipulating the graph with the Gremlin shell
      • Software installation
        • Titan installation
      • Setting up Titan to use the Cassandra storage backend
        • Installing Cassandra
        • Starting Titan with the Cassandra backend
      • Graph data model
      • Connecting to the Twitter stream
        • Setting up the Twitter4J client
        • The OAuth configuration
          • The TwitterStreamConsumer class
          • The TwitterStatusListener class
      • Twitter graph topology
        • The JSONProjectFunction class
      • Implementing GraphState
        • GraphFactory
        • GraphTupleProcessor
        • GraphStateFactory
        • GraphState
        • GraphUpdater
      • Implementing GraphFactory
      • Implementing GraphTupleProcessor
      • Putting it all together: the TwitterGraphTopology class
        • The TwitterGraphTopology class
      • Querying the graph with Gremlin
      • Summary
    • 6. Artificial Intelligence
      • Designing for our use case
      • Establishing the architecture
        • Examining the design challenges
        • Implementing the recursion
          • Accessing the function's return values
          • Immutable tuple field values
          • Upfront field declaration
          • Tuple acknowledgement in recursion
          • Output to multiple streams
          • Read-before-write
        • Solving the challenges
      • Implementing the architecture
        • The data model
        • Examining the recursive topology
        • The queue interaction
        • Functions and filters
        • Examining the Scoring Topology
          • Addressing read-before-write
            • Distributed locking
            • Retry when stale
            • Executing the topology
          • Enumerating the game tree
        • Distributed Remote Procedure Call (DRPC)
          • Remote deployment
      • Summary
    • 7. Integrating Druid for Financial Analytics
      • Use case
      • Integrating a non-transactional system
      • The topology
        • The spout
        • The filter
        • The state design
      • Implementing the architecture
        • DruidState
        • Implementing the StormFirehose object
        • Implementing the partition status in ZooKeeper
      • Executing the implementation
      • Examining the analytics
      • Summary
    • 8. Natural Language Processing
      • Motivating a Lambda architecture
      • Examining our use case
      • Realizing a Lambda architecture
      • Designing the topology for our use case
      • Implementing the design
        • TwitterSpout/TweetEmitter
        • Functions
          • TweetSplitterFunction
          • WordFrequencyFunction
          • PersistenceFunction
      • Examining the analytics
      • Batch processing / historical analysis
      • Hadoop
        • An overview of MapReduce
        • The Druid setup
          • HadoopDruidIndexer
      • Summary
    • 9. Deploying Storm on Hadoop for Advertising Analysis
      • Examining the use case
      • Establishing the architecture
        • Examining HDFS
        • Examining YARN
      • Configuring the infrastructure
        • The Hadoop infrastructure
        • Configuring HDFS
          • Configuring the NameNode
          • Configuring the DataNode
          • Configuring YARN
            • Configuring the ResourceManager
          • Configuring the NodeManager
      • Deploying the analytics
        • Performing a batch analysis with the Pig infrastructure
        • Performing a real-time analysis with the Storm-YARN infrastructure
      • Performing the analytics
        • Executing the batch analysis
        • Executing real-time analysis
      • Deploying the topology
      • Executing the topology
      • Summary
    • 10. Storm in the Cloud
      • Introducing Amazon Elastic Compute Cloud (EC2)
        • Setting up an AWS account
        • The AWS Management Console
          • Creating an SSH key pair
        • Launching an EC2 instance manually
          • Logging in to the EC2 instance
      • Introducing Apache Whirr
        • Installing Whirr
      • Configuring a Storm cluster with Whirr
        • Launching the cluster
      • Introducing Whirr Storm
        • Setting up Whirr Storm
          • Cluster configuration
          • Customizing Storm's configuration
          • Customizing firewall rules
      • Introducing Vagrant
        • Installing Vagrant
        • Launching your first virtual machine
          • The Vagrantfile and shared filesystem
          • Vagrant provisioning
          • Configuring multimachine clusters with Vagrant
      • Creating Storm-provisioning scripts
        • ZooKeeper
        • Storm
        • Supervisord
          • The Storm Vagrantfile
          • Launching the Storm cluster
      • Summary
    • Index
  • Title: Storm Blueprints: Patterns for Distributed Real-time Computation. One of the best ways of getting to grips with the world’s most popular framework for real-time processing is to study real-world projects. This book lets you do just that, resulting in a sound understanding of the fundamentals.
  • Authors: P. Taylor Goetz, Peter T Goetz, Brian O'Neill
  • Original title: Storm Blueprints: Patterns for Distributed Real-time Computation. One of the best ways of getting to grips with the world’s most popular framework for real-time processing is to study real-world projects. This book lets you do just that, resulting in a sound understanding of the fundamentals.
  • ISBN: 9781782168300
  • Publication date: 2014-03-26
  • Format: E-book
  • Item ID: e_3au2
  • Publisher: Packt Publishing