multithreading - Perl, starting more processes at the same time
I am using Strawberry Perl on Windows XP to download multiple HTML pages, and I want each one in a variable.

Right now I am doing this, but as I see it, it only gets one page at a time:

    my $page  = `curl -s http://mysite.com/page -m 2`;
    my $page2 = `curl -s http://myothersite.com/page -m 2`;

I looked into Parallel::ForkManager, but couldn't get it to work. I also tried to use the Windows command start before each curl call, but that doesn't capture the page.

Is there a simpler way to do this?
The Parallel::ForkManager module should work for you, but because it uses fork instead of threads, the variables in the parent and in each of the child processes are separate, and they must communicate a different way.

This program uses the -o option of curl to save the pages in files. The file for, say, http://mysite.com/page is saved in the file http\mysite.com\page and can be retrieved from there by the parent process (a sketch of that step follows the program).
    use strict;
    use warnings;

    use Parallel::ForkManager;
    use URI;
    use File::Spec;
    use File::Path 'make_path';

    my $pm = Parallel::ForkManager->new(10);

    foreach my $site (qw( http://mysite.com/page http://myothersite.com/page )) {
        my $pid = $pm->start;
        next if $pid;          # parent: move on to the next site
        fetch($site);          # child: download this site
        $pm->finish;
    }

    $pm->wait_all_children;

    sub fetch {
        my ($url) = @_;
        my $uri      = URI->new($url);
        my $filename = File::Spec->catfile($uri->scheme, $uri->host, $uri->path);
        my ($vol, $dir, $file) = File::Spec->splitpath($filename);
        make_path $dir;
        print `curl $url -m 2 -o $filename`;   # save the page under scheme/host/path
    }
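Because the children cannot write back to the parent's variables, the parent has to read the saved files in itself once wait_all_children has returned. Here is a minimal sketch of that step; it assumes the same scheme/host/path file layout that fetch builds above and is not part of the original program:

    use strict;
    use warnings;

    use URI;
    use File::Spec;

    my %pages;

    foreach my $site (qw( http://mysite.com/page http://myothersite.com/page )) {
        my $uri      = URI->new($site);
        my $filename = File::Spec->catfile($uri->scheme, $uri->host, $uri->path);
        open my $fh, '<', $filename or next;   # skip pages that failed to download
        local $/;                              # slurp mode: read the whole file at once
        $pages{$site} = <$fh>;
        close $fh;
    }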
Update

Here is a version that uses threads with threads::shared to return each page in a hash shared between the threads. The hash must be marked as shared, and locked before it is modified, to prevent concurrent access.
    use strict;
    use warnings;

    use threads;
    use threads::shared;

    my %pages;
    my @threads;

    share %pages;

    foreach my $site (qw( http://mysite.com/page http://myothersite.com/page )) {
        my $thread = threads->new('fetch', $site);
        push @threads, $thread;
    }

    $_->join for @threads;

    for (scalar keys %pages) {
        printf "%d %s fetched\n", $_, $_ == 1 ? 'page' : 'pages';
    }

    sub fetch {
        my ($url) = @_;
        my $page = `curl -s $url -m 2`;
        lock %pages;            # take the lock before modifying the shared hash
        $pages{$url} = $page;
    }
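Once every thread has joined, the shared hash can be read like any ordinary hash in the main thread. This loop is only an illustration and not part of the original answer:

    # Each fetched page is now available in %pages, keyed by URL.
    while (my ($url, $page) = each %pages) {
        printf "%s: %d bytes\n", $url, length $page;
    }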