

Scalable Distributed Systems: Tutorial

Miguel Carcamo Vasquez, Daniel Wladdimiro Cottet

Professors: Erika Rosas Olivos, Nicolas Hidalgo Castillo

Department of Informatics Engineering, Universidad de Santiago de Chile

November, 2014

M. Carcamo & D. Wladdimiro (USACH) Kafka & Storm November, 2014 1 / 31

Kafka: What is Kafka?

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.

• Fast

  • Hundreds of megabytes of reads and writes per second

• Scalable

  • Elastically
  • Transparently

• Durable

  • Persisted on disk
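The durable commit-log idea behind Kafka can be sketched in a few lines of Python. This is only an illustrative in-memory model (real Kafka persists log segments to disk and replicates them across brokers); the names `PartitionLog` and the sample messages are invented for the example:

```python
# Minimal sketch of a Kafka-style partition: an append-only log where
# each record gets a monotonically increasing offset, and each consumer
# tracks its own read position independently.

class PartitionLog:
    def __init__(self):
        self._records = []            # append-only record list

    def append(self, record):
        self._records.append(record)  # offset = position in the log
        return len(self._records) - 1

    def read(self, offset):
        return self._records[offset]

log = PartitionLog()
for msg in ["vote:A", "vote:B", "vote:A"]:
    log.append(msg)

# Two independent consumers keep their own offsets, so one can re-read
# from the beginning without affecting the other.
offsets = {"consumer-1": 0, "consumer-2": 0}
first = log.read(offsets["consumer-1"])
offsets["consumer-1"] += 1
```

Because consumption only advances an offset, messages stay on the log for all consumers until the retention policy deletes them.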


Kafka: Architecture

It is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.


Kafka: Architecture

A two-server Kafka cluster hosting four partitions (P0-P3) with two consumer groups. Consumer group A has two consumer instances and group B has four.
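Within one group, each partition is consumed by exactly one instance, so group A's two consumers get two partitions each while group B's four consumers get one each. A round-robin assignment sketch in Python (illustrative only; Kafka's actual rebalancing protocol is more involved):

```python
def assign_partitions(partitions, consumers):
    """Round-robin the partitions over the consumers of ONE group:
    each partition goes to exactly one consumer in that group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = ["P0", "P1", "P2", "P3"]
group_a = assign_partitions(partitions, ["A1", "A2"])
group_b = assign_partitions(partitions, ["B1", "B2", "B3", "B4"])
```

With more consumers than partitions, some instances would sit idle, which is why the number of partitions bounds a group's parallelism.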


Kafka: ZooKeeper

zookeeperServer.sh

bin/zookeeper-server-start.sh ../config/zookeeper.properties

Configuration

• dataDir

• clientPort

• maxClientCnxns
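A minimal zookeeper.properties covering the three settings above might look like this (the values are illustrative defaults; adjust them to your environment):

```properties
# Directory where ZooKeeper stores its snapshots
dataDir=/tmp/zookeeper
# Port that clients (the Kafka brokers) connect to
clientPort=2181
# 0 disables the per-host connection limit (fine for non-production use)
maxClientCnxns=0
```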


Kafka: Kafka Server

kafkaServer.sh

bin/kafka-server-start.sh ../config/server.properties

Mandatory configuration

• broker.id

• log.dirs

• zookeeper.connect

Optional configuration

• Log basics

• num.partitions

• Log Retention Policy

• log.retention.hours
• log.flush.interval.messages
• log.flush.interval.ms
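Putting the mandatory and optional settings together, a minimal server.properties sketch (values are illustrative, not prescriptive):

```properties
# Mandatory: unique id of this broker within the cluster
broker.id=0
# Mandatory: where the broker stores its log segments
log.dirs=/tmp/kafka-logs
# Mandatory: ZooKeeper connection string
zookeeper.connect=localhost:2181

# Optional: log basics and retention policy
num.partitions=1
log.retention.hours=168
log.flush.interval.messages=10000
log.flush.interval.ms=1000
```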


Kafka: Create Topics

createTopics.sh

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic $1

Parameters

• replication-factor

• partitions

• topic

Configuration (--config)

• max.message.bytes

• index.interval.bytes

• flush.messages

• flush.ms


Kafka: Check Topics

checkTopics.sh

bin/kafka-topics.sh --list --zookeeper localhost:2181


Kafka: Producer

createProducer.sh

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $1

Mandatory configuration

• metadata.broker.list

• request.required.acks

• producer.type

• serializer.class

Optional configuration

• compression.codec

• request.timeout.ms


Kafka: Consumer

createConsumer.sh

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic $1 --from-beginning

Mandatory configuration

• group.id

• zookeeper.connect

Optional configuration

• fetch.message.max.bytes

• consumer.id


Kafka: Clients

• Python
• Go (AKA golang)
• C
• C++
• .NET
• Ruby
• Node.js
• Storm
• Scala DSL
• HTTP REST
• JRuby
• Perl
• Clojure
• Producer Daemon


Kafka: Multi-Broker

createMultiBroker.sh

cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties

config/server-1.properties:
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1

config/server-2.properties:
broker.id=2
port=9094
log.dir=/tmp/kafka-logs-2


Kafka: Create Kafka Server

Kafka Server 1

../bin/kafka-server-start.sh config/server-1.properties &

Kafka Server 2

../bin/kafka-server-start.sh config/server-2.properties &


Topic with replication

Create new topic

../bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic

Show topic

../bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic


Fault Tolerance

Kill a replica

ps -ef | grep server-1.properties
kill -9 <pid>


Storm

What is Storm?

• Computation platform for stream data processing

• Fault tolerant
• Scalable
• Distributed
• Reliable
• Learn, code and run


Architecture

Fig. 1: Storm Cluster


Spouts & Bolts

Fig. 2: Spouts & Bolts


Physical & Logical

Fig. 3: Physical & Logical Architecture


Before coding

• Install Maven or Gradle

• Install Eclipse (only if you want to)


Coding a Spout

Structure

• import libraries

• public class ”SpoutName” extends BaseRichSpout

• class variables
• public void open(Map conf, TopologyContext topologyContext, SpoutOutputCollector collector)
• public void nextTuple()
• public void declareOutputFields(OutputFieldsDeclarer declarer)
• Your methods
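The spout lifecycle above (open receives a collector, nextTuple is called in a loop and emits tuples) can be mimicked in plain Python to see how the pieces interact. This is only a sketch: a real spout is Java code extending Storm's BaseRichSpout, and `SentenceSpout` plus its sample sentences are invented for the example:

```python
# Plain-Python stand-in for a Storm spout. A simple list plays the role
# of the SpoutOutputCollector.

class SentenceSpout:
    def __init__(self, sentences):
        self.sentences = sentences
        self.index = 0
        self.collector = None

    def open(self, conf, context, collector):
        self.collector = collector          # Storm hands us the collector here

    def next_tuple(self):
        # Called repeatedly by the framework; emit at most one tuple per call.
        if self.index < len(self.sentences):
            self.collector.append((self.sentences[self.index],))
            self.index += 1

    def declare_output_fields(self):
        return ["sentence"]                 # field names of emitted tuples

emitted = []                                # stand-in collector
spout = SentenceSpout(["hello storm", "hello kafka"])
spout.open({}, None, emitted)
spout.next_tuple()
spout.next_tuple()
```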


Coding a Bolt

Structure

• import libraries

• public class ”BoltName” extends BaseRichBolt

• class variables
• public "BoltName"() (constructor)
• public void prepare(Map map, TopologyContext topologyContext, OutputCollector collector)
• public void execute(Tuple input)
• public void declareOutputFields(OutputFieldsDeclarer declarer)
• Your methods
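Similarly, the bolt's prepare/execute pair can be mimicked in plain Python (a sketch only; the real class extends BaseRichBolt in Java, and `SplitSentenceBolt` is an invented example):

```python
# Plain-Python stand-in for a Storm bolt: prepare() stores the collector,
# execute() processes one input tuple and may emit new tuples.

class SplitSentenceBolt:
    def prepare(self, conf, context, collector):
        self.collector = collector

    def execute(self, input_tuple):
        sentence = input_tuple[0]
        for word in sentence.split():
            self.collector.append((word,))  # one output tuple per word

    def declare_output_fields(self):
        return ["word"]

out = []
bolt = SplitSentenceBolt()
bolt.prepare({}, None, out)
bolt.execute(("hello storm",))
```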


Coding a Topology

Structure

• import libraries

• public class Topology

• class variables
• public static void main(String[] args)

  • Config config = new Config()
  • TopologyBuilder b = new TopologyBuilder()
  • b.setSpout("SpoutName", new SpoutName())
  • b.setBolt("BoltName", new BoltName()).shuffleGrouping("SpoutName")
  • final LocalCluster cluster = new LocalCluster()
  • cluster.submitTopology("TopologyName", config, b.createTopology())
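The wiring the topology performs (spout tuples flowing through a grouping into a bolt) can be simulated end to end in plain Python. This sketches only the data flow, not the Storm API; the word-count example and all names are invented for illustration:

```python
# Simulate a topology: a spout emits sentences, a bolt counts words.

def sentence_spout():
    for s in ["the cow jumped", "the moon"]:
        yield (s,)                          # spout emits (sentence,) tuples

counts = {}
def word_count_bolt(tup):
    for word in tup[0].split():
        counts[word] = counts.get(word, 0) + 1

# The "submitTopology" step: drain the spout through the bolt.
for tup in sentence_spout():
    word_count_bolt(tup)
```

In real Storm the LocalCluster runs this loop concurrently across worker threads; the grouping decides which bolt task each tuple reaches.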


Compile & Run

• Download a Storm release, unpack it, and put the unpacked bin/ directory on your PATH.

• cd myapp

• mvn package

• storm jar target/my-app-1.0-SNAPSHOT.jar com.mycompany.app.App


Grouping

Fig. 4: Groupings


Grouping

• Shuffle: Stream tuples are randomly distributed such that each bolt is guaranteed to get an equal number of tuples.

• Fields: Stream tuples are partitioned by the fields specified in the grouping.

• All grouping: Stream tuples are replicated across all the bolts.

• Global grouping: entire stream goes to a single bolt.

• Direct Grouping: the source decides which component will receive thetuple.
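Shuffle and fields grouping differ only in how a tuple is mapped to a bolt task. A hedged Python sketch of the two routing rules (Storm's internals differ; these routing functions are invented for illustration):

```python
import random

def shuffle_grouping(num_tasks, rng=random.Random(0)):
    """Route each tuple to a (pseudo-)random task: load is spread evenly."""
    return lambda tup: rng.randrange(num_tasks)

def fields_grouping(num_tasks, field_index=0):
    """Route by hashing the grouping field: equal field values always
    land on the same task, which is what makes per-key counting work."""
    return lambda tup: hash(tup[field_index]) % num_tasks

route = fields_grouping(4)
t1 = route(("storm", 1))
t2 = route(("storm", 2))   # same field value, so same task as t1
```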


Project Topology

Fig. 5: Project Topology


Web Services: Node.js

Install Node.js

https://github.com/joyent/node/archive/master.zip

./configure
make
make install

Run web services

node server.js


Kafka: Server Start

Stages

1. zookeeperServer.sh

2. kafkaServer.sh

3. createTopics.sh voteLog


Web Services: Kafka Connection

Install API Kafka-Python

pip install ./kafka-python

runKafkaLogs.sh

./tail2kafka/tail2kafka -l ../logs/vote-info.log -t voteLog -s localhost -p 9092 -d 5

Final stage

createProducer.sh voteLog


Questions?
