OSS Edition: Trying Out Jubatus (Part 2)

Posted: May 21, 2014
Author: aws-recipe-user
Category: Computing

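Hello! This is JQ.

In the previous post, "OSS Edition: Trying Out Jubatus (Part 1)", we tried out Jubatus, a distributed processing framework for online machine learning.
This time, in "OSS Edition: Trying Out Jubatus (Part 2)", we will try Jubatus's distributed processing on a single instance.

Jubatus can perform distributed processing by using ZooKeeper.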

What is ZooKeeper?

It is a high-performance coordination service for distributed applications.

ZooKeeper
1. Installing ZooKeeper
First, install ZooKeeper.
Download it from the Apache ZooKeeper site.

$ sudo wget http://ftp.kddilabs.jp/infosystems/apache/zookeeper/stable/zookeeper-3.4.6.tar.gz

$ tar xvzf zookeeper-3.4.6.tar.gz

ZooKeeper requires Java, so install that as well.

$ sudo apt-get install openjdk-7-jre

2. Creating the configuration file
Next, create the ZooKeeper configuration file.

Copy it from the sample file.

$ cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg

Its contents are as follows.

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
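
To make the units in this file concrete, here is a minimal Python sketch that reads the key=value lines and converts initLimit and syncLimit from ticks to milliseconds. The function names parse_zoo_cfg and tick_limits_ms and the SAMPLE string (an abridged copy of the file above) are illustrative assumptions, not part of ZooKeeper.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines from zoo.cfg text, skipping comments and blanks."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def tick_limits_ms(settings):
    """Convert initLimit and syncLimit from ticks to milliseconds."""
    tick = int(settings["tickTime"])
    return {
        "initLimit": int(settings["initLimit"]) * tick,
        "syncLimit": int(settings["syncLimit"]) * tick,
    }


# Abridged copy of the sample configuration shown above.
SAMPLE = """\
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
#maxClientCnxns=60
"""

cfg = parse_zoo_cfg(SAMPLE)
print(cfg["clientPort"])
print(tick_limits_ms(cfg))
```

With the sample values, initLimit works out to 10 x 2000 = 20000 ms and syncLimit to 5 x 2000 = 10000 ms. As far as I understand, both limits only come into play once several ZooKeeper servers form an ensemble, so they should not affect the single standalone server used here.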

3. Starting ZooKeeper
Start ZooKeeper.

$ /home/ubuntu/zookeeper-3.4.6/bin/zkServer.sh start
JMX enabled by default
Using config: /home/ubuntu/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
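
Besides reading the startup message, one way to confirm that the server is answering is ZooKeeper's four-letter command ruok, to which a healthy server replies imok. The sketch below sends it over a plain socket. The function name zookeeper_is_ok is an assumption for illustration, and newer ZooKeeper releases may need the command to be whitelisted before it responds.

```python
import socket


def zookeeper_is_ok(host="localhost", port=2181, timeout=3.0):
    """Send the four-letter command 'ruok' and report whether the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            chunks = []
            # The server closes the connection after replying, so read until EOF.
            while True:
                data = conn.recv(64)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        # Connection refused, timeout, and similar errors all count as "not ok".
        return False
    return b"".join(chunks) == b"imok"
```

Calling zookeeper_is_ok("localhost", 2181) right after zkServer.sh start should return True once the server is accepting connections, and False if nothing is listening on that port.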

4. Registering the Jubatus configuration file in ZooKeeper
Next, register the Jubatus configuration file in ZooKeeper.

$ jubaconfig --cmd write --zookeeper=localhost:2181 --file shogun.json --name shogun --type classifier
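
Before registering the file, it can be worth confirming that it parses as JSON. Here is a minimal sketch, assuming a classifier config is a JSON object with top-level method and converter keys. That key list reflects my understanding of the Jubatus config format, so treat it as an assumption. The helper missing_config_keys and the EXAMPLE string are hypothetical and do not reproduce the real shogun.json.

```python
import json

# Assumption: top-level keys expected in a Jubatus classifier config.
REQUIRED_KEYS = ("method", "converter")


def missing_config_keys(text, required=REQUIRED_KEYS):
    """Parse JSON text and return the required top-level keys that are absent."""
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    return [key for key in required if key not in config]


# Hypothetical stand-in for shogun.json, not the real file.
EXAMPLE = '{"method": "perceptron", "converter": {}}'
print(missing_config_keys(EXAMPLE))
```

An empty list means nothing obvious is missing. To check the actual file, read shogun.json and pass its contents to missing_config_keys.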

5. Starting the Jubatus Proxy
Start the Jubatus Proxy.
Since we are using the classifier sample program this time, start it with jubaclassifier_proxy.

$ jubaclassifier_proxy --zookeeper=localhost:2181 --rpc-port=9198 &

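6. Starting the Jubatus servers
Start multiple Jubatus server processes.
Processes started with the same --name value are treated as members of the same cluster.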

$ jubaclassifier --rpc-port=9180 --name=shogun --zookeeper=localhost:2181 &
$ jubaclassifier --rpc-port=9181 --name=shogun --zookeeper=localhost:2181 &
$ jubaclassifier --rpc-port=9182 --name=shogun --zookeeper=localhost:2181 &
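
The three commands differ only in --rpc-port, so they can be generated rather than typed. The following is a rough Python sketch that builds the argument lists. build_server_commands is a hypothetical helper, and the sketch only prints the commands rather than starting anything.

```python
def build_server_commands(name, zookeeper, base_port, count, binary="jubaclassifier"):
    """Return one argument list per server, each with its own --rpc-port."""
    return [
        [
            binary,
            f"--rpc-port={base_port + i}",
            f"--name={name}",
            f"--zookeeper={zookeeper}",
        ]
        for i in range(count)
    ]


for argv in build_server_commands("shogun", "localhost:2181", 9180, 3):
    # Printing only; pass argv to subprocess.Popen to actually launch a server.
    print(" ".join(argv))
```

Because every server runs on the same instance here, each one needs a distinct port, while the shared --name is what groups them into one cluster.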

Check that the servers have registered with ZooKeeper.

$ sudo /home/ubuntu/zookeeper-3.4.6/bin/zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls /jubatus/actors/classifier/shogun/nodes
[10.185.62.12_9181, 10.185.62.12_9182, 10.185.62.12_9180]
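
Judging from the output above, each node appears to be registered as <IP address>_<RPC port>. Assuming that naming holds, this small sketch turns the listing into (host, port) pairs. parse_node_listing is an illustrative name, not a Jubatus or ZooKeeper API.

```python
def parse_node_listing(listing):
    """Turn '[10.0.0.1_9180, 10.0.0.1_9181]' into sorted (host, port) tuples."""
    inner = listing.strip().strip("[]")
    nodes = []
    for entry in inner.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last underscore so the port is always the final field.
        host, _, port = entry.rpartition("_")
        nodes.append((host, int(port)))
    return sorted(nodes)


listing = "[10.185.62.12_9181, 10.185.62.12_9182, 10.185.62.12_9180]"
for host, port in parse_node_listing(listing):
    print(host, port)
```

Seeing ports 9180, 9181, and 9182 in the result is a quick way to confirm that all three server processes joined the shogun cluster.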

7. Running the program
This time we use the Shogun sample program.
Edit the client program so that port is the Proxy's port and name is the cluster name, then run it.

#!/usr/bin/env python
# coding: utf-8

host = '127.0.0.1'
port = 9198
name = 'shogun'


$ python shogun.py
徳川 慶喜
足利 義昭
北条 守時

How did it go?
Next time, in "OSS Edition: Trying Out Jubatus (Part 3)", we will try distributed processing with Jubatus across multiple instances.
Stay tuned!
