Ruby scraper: single-threaded vs. multi-threaded
A simple multi-threaded scraper built with Ruby and MySQL/SQLite3
Single-threaded
Suitable for small data sets and test runs. The database can be either SQLite3 or MySQL.
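namespace :spider do
  desc "fetch fasta content"
  # In a Rails app, declare this as `task fasta: :environment` so the Post model is loaded.
  task :fasta do
    records = Post.where(grabbed: false)
    records.each do |record|
      url = record.fasta
      # Only fetch when the fasta field still holds a URL (to_s guards against nil).
      if url.to_s.start_with?("http")
        response = RestClient.get(url)
        record.fasta = response.body
        record.grabbed = true # assumption: flag the record so later runs skip it
        record.save
      end
    end
  end
end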
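The loop above stops at the first request that raises, and RestClient raises on error responses as well as on network failures. A minimal sketch of one way to keep the run going, assuming a retry-then-skip policy is acceptable; `fetch_body`, its `fetcher` argument and the `retries` parameter are hypothetical names, not part of the original task. The fetcher is passed in as a callable so the sketch runs without network access.

```ruby
# Hypothetical helper: returns the fetched body, or nil when the value is not
# a URL or every attempt fails. `fetcher` is any callable that takes a URL.
def fetch_body(value, fetcher, retries: 2)
  return nil unless value.is_a?(String) && value.start_with?("http")

  attempts = 0
  begin
    attempts += 1
    fetcher.call(value)
  rescue StandardError => e
    # Try again until the retry budget is used up, then give up on this URL.
    retry if attempts <= retries
    warn "giving up on #{value}: #{e.message}"
    nil
  end
end

# Possible wiring inside the rake task (assumes the rest-client gem is loaded):
#   body = fetch_body(record.fasta, ->(url) { RestClient.get(url).body })
#   record.update(fasta: body, grabbed: true) if body
```

With this shape a failed record keeps `grabbed: false`, so a later run picks it up again.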
Multi-threaded
Suitable for large data sets. Use MySQL here, since SQLite allows only one writer at a time.
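namespace :spider do
  desc "fetch fasta content with thread"
  task :fasta_thread do
    records = Post.where(grabbed: false)
    # Process 50 records at a time, one thread per record.
    # If Post is an ActiveRecord model, the connection pool size may need to be raised to match.
    records.each_slice(50) do |batch|
      threads = batch.map do |record|
        Thread.new(record) do |r|
          url = r.fasta
          if url.to_s.start_with?("http")
            response = RestClient.get(url)
            r.fasta = response.body
            r.grabbed = true # assumption: flag the record so later runs skip it
            r.save
          end
        end
      end
      # Wait for the whole batch before starting the next, so at most 50 threads run at once.
      threads.each(&:join)
    end
  end
end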
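Starting one thread per record is simple, but for very large tables a fixed-size pool of worker threads pulling from a queue may be easier on memory and on the database. The following is a minimal sketch using only Ruby's standard library; `process_concurrently` and its `workers` parameter are hypothetical names, and the block passed in stands for the fetch-and-save step of the task above.

```ruby
# Hypothetical helper: runs the given block on every item using at most
# `workers` threads. Returns a hash mapping each item to the block's result,
# or to the exception it raised. Items must not be nil or false, because a
# falsy value from the queue is treated as the end-of-work signal.
def process_concurrently(items, workers: 10, &job)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once drained, a closed queue makes pop return nil

  results = {}
  lock = Mutex.new

  pool = Array.new(workers) do
    Thread.new do
      while (item = queue.pop)
        outcome = begin
          job.call(item)
        rescue StandardError => e
          e # record the failure instead of killing the worker
        end
        lock.synchronize { results[item] = outcome }
      end
    end
  end

  pool.each(&:join)
  results
end

# Possible wiring (assumes the rest-client gem is loaded):
#   bodies = process_concurrently(urls, workers: 20) { |url| RestClient.get(url).body }
```

Capturing exceptions per item means one failing URL does not stop the other workers; the caller can inspect the returned hash to decide which records to retry.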