Tutorial Kafka-Storm


  • Scalable Distributed Systems: Tutorial

    Miguel Cárcamo Vásquez, Daniel Wladdimiro Cottet

    Professors: Erika Rosas Olivos, Nicolás Hidalgo Castillo

    Departamento de Ingeniería Informática, Universidad de Santiago de Chile

    November, 2014

    M. Cárcamo & D. Wladdimiro (USACH), Kafka & Storm, November 2014

  • Kafka: What is Kafka?

    Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.

    Fast: hundreds of megabytes of reads and writes per second

    Scalable: elastically and transparently

    Durable: messages are persisted on disk

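The "distributed commit log" idea can be sketched in a few lines of plain Python: an append-only sequence of messages addressed by offset, where each consumer tracks its own position and the log keeps no per-consumer state. This is a toy illustration of the concept only, not Kafka code; all names are invented.

```python
class CommitLog:
    """Toy append-only log: messages are addressed by offset, never mutated."""

    def __init__(self):
        self.messages = []  # append-only storage, as on disk

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message

    def read(self, offset):
        return self.messages[offset]


log = CommitLog()
log.append("event-1")
log.append("event-2")

# Consumers read independently at their own offsets; the log does not
# track them, which is what makes replay and slow consumers cheap.
fast_consumer_offset = 2  # has read everything
slow_consumer_offset = 0  # still at the beginning
```

The point of the sketch: durability and fan-out fall out of the data structure itself, since reading never modifies the log.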

  • Kafka: Architecture

    It is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.


  • Kafka: Architecture

    A two-server Kafka cluster hosting four partitions (P0-P3) with two consumer groups. Consumer group A has two consumer instances and group B has four.

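The partition/consumer-group arrangement on this slide can be simulated with a small round-robin assignment, a simplified stand-in for Kafka's real assignment protocol (the names `assign`, `group_a`, `group_b` are illustrative): each partition goes to exactly one consumer within a group, so group A's two consumers get two partitions each while group B's four consumers get one each.

```python
def assign(partitions, consumers):
    """Round-robin partition assignment within one consumer group
    (a simplification of Kafka's real rebalancing protocol)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]
group_a = assign(partitions, ["A1", "A2"])                # 2 partitions each
group_b = assign(partitions, ["B1", "B2", "B3", "B4"])    # 1 partition each
```

Because every partition has exactly one owner per group, ordering within a partition is preserved while different groups still each see the full stream.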

  • Kafka: ZooKeeper

    zookeeperServer.sh

    bin/zookeeper-server-start.sh ../config/zookeeper.properties

    Configuration

    dataDir, clientPort, maxClientCnxns


  • Kafka: Kafka Server

    kafkaServer.sh

    bin/kafka-server-start.sh ../config/server.properties

    Mandatory configuration

    broker.id, log.dirs, zookeeper.connect

    Optional configuration

    Log basics: num.partitions

    Log retention policy: log.retention.hours, log.flush.interval.messages, log.flush.interval.ms


  • Kafka: Create Topics

    createTopics.sh

    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic $1

    Parameters

    --replication-factor, --partitions, --topic

    Per-topic configuration (--config)

    max.message.bytes, index.interval.bytes, flush.messages, flush.ms


  • Kafka: Check Topics

    checkTopics.sh

    bin/kafka-topics.sh --list --zookeeper localhost:2181


  • Kafka: Producer

    createProducer.sh

    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $1

    Mandatory configuration

    metadata.broker.list, request.required.acks, producer.type, serializer.class

    Optional configuration

    compression.codec, request.timeout.ms


  • Kafka: Consumer

    createConsumer.sh

    bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic $1 --from-beginning

    Mandatory configuration

    group.id, zookeeper.connect

    Optional configuration

    fetch.message.max.bytes, consumer.id


  • Kafka: Clients

    Producer Daemon, Storm, Python, Scala DSL, Go (AKA golang), HTTP REST, C, JRuby, C++, Perl, .NET, Clojure, Ruby, Node.js


  • Kafka: Multi-Broker

    createMultiBroker.sh

    cp config/server.properties config/server-1.properties
    cp config/server.properties config/server-2.properties

    config/server-1.properties:
    broker.id=1
    port=9093
    log.dir=/tmp/kafka-logs-1

    config/server-2.properties:
    broker.id=2
    port=9094
    log.dir=/tmp/kafka-logs-2


  • Kafka: Start the Kafka Servers

    Kafka Server 1

    ../bin/kafka-server-start.sh config/server-1.properties &

    Kafka Server 2

    ../bin/kafka-server-start.sh config/server-2.properties &


  • Topic with replication

    Create new topic

    ../bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic

    Show topic

    ../bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic


  • Fault Tolerance

    Kill replication

    ps -ef | grep server-1.properties
    kill -9 <pid>


  • Storm

    What is Storm?

    Computation platform for stream data processing: fault tolerant, scalable, distributed, reliable. Learn, code and run.


  • Architecture

    Fig. 1: Storm Cluster


  • Spouts & Bolts

    Fig. 2: Spouts & Bolts


  • Physical & Logical

    Fig. 3: Physical & Logical Architecture


  • Before coding

    Install Maven or Gradle. Install Eclipse (only if you want to).


  • Coding a Spout

    Structure

    import libraries
    public class SpoutName extends BaseRichSpout
      class variables
      public void open(Map conf, TopologyContext topologyContext, SpoutOutputCollector collector)
      public void nextTuple()
      public void declareOutputFields(OutputFieldsDeclarer declarer)
      your methods


  • Coding a Bolt

    Structure

    import libraries
    public class BoltName extends BaseRichBolt
      class variables
      public BoltName() (constructor)
      public void prepare(Map map, TopologyContext topologyContext, OutputCollector collector)
      public void execute(Tuple input)
      public void declareOutputFields(OutputFieldsDeclarer declarer)
      your methods


  • Coding a Topology

    Structure

    import libraries
    public class Topology
      class variables
      public static void main(String[] args)
        Config config = new Config()
        TopologyBuilder b = new TopologyBuilder()
        b.setSpout(SpoutName, new SpoutName())
        b.setBolt(BoltName, new BoltName()).shuffleGrouping(SpoutName)
        final LocalCluster cluster = new LocalCluster()
        cluster.submitTopology(TopologyName, config, b.createTopology())

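The spout/bolt/topology structure above can be mirrored in a short Python analogue, purely to show the dataflow: a spout emits tuples one at a time via nextTuple() and a bolt consumes them via execute(), with a dispatch loop playing the role of the cluster. This is not the Storm API; class and method names below are invented for illustration.

```python
class WordSpout:
    """Plays the role of a Storm spout: a source of tuples."""

    def __init__(self, words):
        self.words = list(words)

    def next_tuple(self):
        # Analogue of nextTuple(): emit the next tuple, or None when drained.
        return self.words.pop(0) if self.words else None


class CountBolt:
    """Plays the role of a Storm bolt: processes each incoming tuple."""

    def __init__(self):
        self.counts = {}

    def execute(self, word):
        # Analogue of execute(Tuple input): update per-word counts.
        self.counts[word] = self.counts.get(word, 0) + 1


spout = WordSpout(["storm", "kafka", "storm"])
bolt = CountBolt()

# The "cluster": repeatedly pull from the spout and push into the bolt,
# which is what the TopologyBuilder wiring arranges in real Storm.
while (t := spout.next_tuple()) is not None:
    bolt.execute(t)
```

In real Storm the dispatch loop is distributed across workers and the spout/bolt connection is declared with setSpout/setBolt plus a grouping, but the tuple-at-a-time contract is the same.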

  • Compile & Run

    Download a Storm release, unpack it, and put the unpacked bin/ directory on your PATH.

    cd myapp
    mvn package
    storm jar target/my-app-1.0-SNAPSHOT.jar com.mycompany.app.App


  • Grouping

    Fig. 4: Groupings


  • Grouping

    Shuffle: stream tuples are randomly distributed such that each bolt is guaranteed to get an equal number of tuples.

    Fields: stream tuples are partitioned by the fields specified in the grouping.

    All grouping: stream tuples are replicated across all the bolts.

    Global grouping: the entire stream goes to a single bolt.

    Direct grouping: the source decides which component will receive the tuple.

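The difference between shuffle and fields grouping can be simulated in a few lines. This toy sketch (not Storm code; function names are invented) spreads tuples over bolt instances two ways: shuffle distributes them evenly regardless of content, while fields grouping hashes the chosen field so that equal values always land on the same instance.

```python
def shuffle_grouping(tuples, n_bolts):
    """Even, content-blind spread over bolt instances (round-robin here;
    real shuffle grouping randomizes but keeps counts balanced)."""
    targets = [[] for _ in range(n_bolts)]
    for i, t in enumerate(tuples):
        targets[i % n_bolts].append(t)
    return targets


def fields_grouping(tuples, n_bolts, field):
    """Hash-partition by a field: equal values always hit the same bolt,
    which is what makes per-key state (e.g. word counts) correct."""
    targets = [[] for _ in range(n_bolts)]
    for t in tuples:
        targets[hash(t[field]) % n_bolts].append(t)
    return targets


tuples = [{"word": w} for w in ["a", "b", "a", "c", "a", "b"]]
by_shuffle = shuffle_grouping(tuples, 3)      # balanced: 2 tuples per bolt
by_field = fields_grouping(tuples, 3, "word") # each word pinned to one bolt
```

The trade-off this exposes: shuffle balances load but scatters keys, fields grouping keeps keys together but can skew load if one key dominates.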

  • Project Topology

    Fig. 5: Project Topology


  • Web Services: Node.js

    Install Node.js

    https://github.com/joyent/node/archive/master.zip

    ./configure
    make
    make install

    Run web services

    node server.js


  • Kafka: Server Start

    Stages

    1 zookeeperServer.sh

    2 kafkaServer.sh

    3 createTopics.sh voteLog


  • Web Services: Kafka Connection

    Install API Kafka-Python

    pip install ./kafka-python

    runKafkaLogs.sh

    ./tail2kafka/tail2kafka -l ../logs/vote-info.log -t voteLog -s localhost -p 9092 -d 5

    Final stage

    createProducer.sh voteLog


  • Questions?
