mixiの日記閲覧スクリプト

作曲がとんと進まないので、
今日は WWW::MechanizeとWeb::Scraperの使い方練習をしてました。
どちらかというとこのソース書くよりも、

レンタルサーバ上でcpanちゃんと使える環境整えるほうがめんどくさかったです

getarticle.pl

#!/usr/bin/perl
use strict;
use utf8;
binmode(STDOUT,":utf8");
use WWW::Mechanize;
use Web::Scraper;

##setting
my $LOGINID = 'ログイン用のメアド';
my $PASSWORD = 'ログイン用のパスワード';
my $URL_MIXITOP = 'http://www.mixi.jp/';
my $FRIENDID = '日記みたい人のID(数字)';
my $URL_MIXIDIARY = "http://mixi.jp/list_diary.pl?id=$FRIENDID";


my $mech = WWW::Mechanize -> new();
$mech -> get($URL_MIXITOP);
##login
$mech -> submit_form(
  fields => {
    email => $LOGINID,
    password => $PASSWORD
  }
);

##friend diary
$mech -> get($URL_MIXIDIARY);
my $link_num = 0;
my $first_link;
my $scraper = scraper {
        process '#diary_body' , 'diary' => 'HTML';
        result 'diary';
};
foreach ($mech -> find_all_links ('url_regex' => qr/view_diary\.pl[^#]*$/)){
        if($link_num == 0){
                $first_link = $_->url_abs;
        }

        if($link_num != 0 && $first_link eq $_->url_abs){
                $link_num ++;
                next;
        }

        if($_ -> text ne '続きを読む'){
                print "title: " . $_->text . "\n";
                $mech -> get($_ -> url_abs);
                my $diary_text = $scraper->scrape($mech->content);
                print "content: $diary_text\n";

        }
        $link_num ++;
}

実行イメージ

./getarticle.pl
title:日記タイトル
content:記事～～
title:日記タイトル
:
:

これ作るために普段mixiあまり見てない自分が

特定の人にめっちゃあしあと付けまくってますけどあんま気にしないでくださいね　＾。＾

以下参考にしたサイト

-cpan
http://d.hatena.ne.jp/mzt/20080218/p1
http://www.tafworks.com/2009/12/cpanaesrijndael.html
http://d.hatena.ne.jp/ragtarou/20080121
http://www.otsune.com/bsd/tips/usercpaninstall.html

-mechanize,scraper
http://e8y.net/mag/007-www-mechanize/
http://e8y.net/mag/013-web-scraper/