Trying out Perl processing on Amazon Elastic MapReduce (Part 2)

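Last time I managed to feed a job written in Perl to Amazon Elastic MapReduce (what is the short name for that, anyway?),
so next I want to use CPAN modules.
As usual, that is where local::lib comes in handy.
I started from a plain Debian install, created a regular user, and used local::lib there to collect the CPAN modules I needed under ~/perl5.
Then I bundled that into a jar: "jar cvf perl5.jar -C perl5 ."
and uploaded the jar to S3.
Specifically, I wanted to run App::Hachero, a log aggregation application, so that is what is in the jar.
I also uploaded the mapper.yml and reducer.yml config files to S3.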
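The packaging steps above could be scripted roughly like this. This is only a sketch: the post does not say which installer or upload tool was used, so `cpanm` and `s3cmd` are assumptions here, and the function names and `BUCKET_PREFIX` are made up for illustration (the S3 locations mirror the ones in the job command).

```shell
#!/bin/sh
# Sketch only: cpanm and s3cmd are assumed tools, not named in the post.
# BUCKET_PREFIX mirrors the S3 locations used in the job command.
BUCKET_PREFIX="s3://lopnor/hadoop"

# Hypothetical helper: install App::Hachero into ~/perl5 and pack it as a jar.
package_perl5() {
    # Install the module and its dependencies in a local::lib-compatible layout.
    cpanm --local-lib="$HOME/perl5" App::Hachero || return 1
    # Pack the contents of ~/perl5 (not the directory itself) into perl5.jar.
    (cd "$HOME" && jar cvf perl5.jar -C perl5 .) || return 1
}

# Hypothetical helper: upload the jar and both config files.
# Expects mapper.yml and reducer.yml in the current directory.
upload_job_files() {
    s3cmd put "$HOME/perl5.jar" "$BUCKET_PREFIX/lib/perl5.jar" || return 1
    s3cmd put mapper.yml "$BUCKET_PREFIX/config/mapper.yml" || return 1
    s3cmd put reducer.yml "$BUCKET_PREFIX/config/reducer.yml" || return 1
}
```

Calling `package_perl5 && upload_job_files` on the build machine would run the whole sequence. Modules with compiled parts end up under an architecture-specific directory such as lib/perl5/i486-linux-gnu-thread-multi, so the build machine should match the machines that run the job.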
Then I ran the job with the paths set up appropriately, as follows.

[danjou@sylvia] $ ./elastic-mapreduce --create \
    --stream \
    --input s3n://lopnor/hadoop/input \
    --output s3n://lopnor/hadoop/output \
    --mapper '/usr/bin/perl -Iperl5/lib/perl5 -Iperl5/lib/perl5/i486-linux-gnu-thread-multi perl5/bin/hachero.pl -c mapper.yml' \
    --reducer '/usr/bin/perl -Iperl5/lib/perl5 -Iperl5/lib/perl5/i486-linux-gnu-thread-multi perl5/bin/hachero.pl -c reducer.yml' \
    --num-instances 1 \
    --cache-archive s3n://lopnor/hadoop/lib/perl5.jar#perl5 \
    --cache s3n://lopnor/hadoop/config/mapper.yml#mapper.yml \
    --cache s3n://lopnor/hadoop/config/reducer.yml#reducer.yml
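Since a streaming job just pipes lines through the mapper, sorts them, and pipes them through the reducer, the same mapper and reducer commands can be tried locally before paying for a real run. The helper below is a rough sketch of that idea, not anything EMR provides: `simulate_streaming` is a hypothetical name, and a plain `sort` only approximates Hadoop's key-based shuffle.

```shell
#!/bin/sh
# Hypothetical helper: approximate a streaming job on one machine.
# $1 is the mapper command line, $2 the reducer command line; input comes from stdin.
simulate_streaming() {
    mapper=$1
    reducer=$2
    # The real shuffle groups by key; sorting whole lines is a stand-in for it.
    sh -c "$mapper" | LC_ALL=C sort | sh -c "$reducer"
}
```

Run from a directory that contains the unpacked perl5 tree plus mapper.yml and reducer.yml, something like `simulate_streaming '/usr/bin/perl -Iperl5/lib/perl5 perl5/bin/hachero.pl -c mapper.yml' '/usr/bin/perl -Iperl5/lib/perl5 perl5/bin/hachero.pl -c reducer.yml' < sample.log` should surface missing-module errors right away (sample.log is a placeholder name).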

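It worked!!
- If the output directory already exists, the job fails with an error, and ka-ching, you still get charged.
- If the paths are not set up correctly, a reasonably helpful message shows up in stderr under the log directory.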
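To avoid paying for a job that dies because the output location is already taken, a small check before launching might help. This is a sketch under assumptions: the helper names are hypothetical, and the listing command is whatever S3 tool is at hand (for example `s3cmd ls`, which takes s3:// rather than s3n:// URLs).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given listing command prints anything,
# i.e. something already exists at the output location.
# Caveat: a listing command that fails also prints nothing to stdout, so this
# is a convenience check, not a guarantee.
prefix_in_use() {
    [ -n "$("$@" 2>/dev/null)" ]
}

# Hypothetical helper: refuse to continue when the output location is in use.
require_free_output() {
    if prefix_in_use "$@"; then
        echo "output location already has objects; refusing to launch" >&2
        return 1
    fi
}
```

For example, `require_free_output s3cmd ls s3://lopnor/hadoop/output/ && ./elastic-mapreduce --create ...` would only launch the job when the listing comes back empty.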
Next up: taking on s3fs.
See also the previous and next posts in this series.
- Part 1
- Part 3