Ruby scraper: single-threaded / multi-threaded

A simple multi-threaded scraper built with Ruby and MySQL/SQLite3
Updated: 2021-12-19 12:57:29

Single-threaded

Suitable for small data sets and test runs. The database can be either sqlite3 or mysql.

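require "rest-client"

namespace :spider do
  desc "fetch fasta content"
  # In a Rails app, depending on :environment loads the app and the Post model.
  task fasta: :environment do
    records = Post.where(grabbed: false)
    records.each do |record|
      url = record.fasta
      # Skip rows whose fasta column is empty or no longer holds a URL.
      if url&.start_with?("http")
        response = RestClient.get(url)
        # Store the body and mark the row as grabbed so it is not fetched again
        # (assumes `grabbed` is the flag this task is meant to set).
        record.update(fasta: response.body, grabbed: true)
      end
    end
  end
end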
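
RestClient.get raises an exception on a 4xx/5xx response or a network failure, so a single bad URL would abort the loop above. The following is a minimal sketch of one way to keep going and collect the failures, not part of the original task. It runs without Rails: `Record` is a hypothetical stand-in for a Post row, and `fetcher` is a hypothetical callable that takes a URL and returns the body (in the real task it would wrap RestClient.get).

```ruby
# Hypothetical stand-in for a Post row; the real task uses an ActiveRecord model.
Record = Struct.new(:fasta, :grabbed)

# Fetch every record whose `fasta` field still holds a URL.
# `fetcher` is any callable that takes a URL and returns the body as a String.
# Returns the records that failed, so they can be retried later.
def grab_all(records, fetcher)
  failed = []
  records.each do |record|
    url = record.fasta
    # Skip records that are empty or already hold content instead of a URL.
    next unless url.is_a?(String) && url.start_with?("http")

    begin
      record.fasta = fetcher.call(url)
      record.grabbed = true
    rescue StandardError => e
      # Keep going: one bad URL should not stop the whole run.
      warn "failed to fetch #{url}: #{e.message}"
      failed << record
    end
  end
  failed
end
```

A failed record keeps its URL and stays ungrabbed, so running the task again picks it up.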

Multi-threaded

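For large data sets. Use MySQL here: SQLite allows only one writer at a time, so saves from many threads would contend for its lock.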

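require "rest-client"

namespace :spider do
  desc "fetch fasta content with thread"
  # In a Rails app, depending on :environment loads the app and the Post model.
  task fasta_thread: :environment do
    records = Post.where(grabbed: false)
    # Work through 50 records at a time so that at most 50 threads run at once.
    records.each_slice(50) do |batch|
      threads = batch.map do |record|
        Thread.new do
          url = record.fasta
          if url&.start_with?("http")
            response = RestClient.get(url)
            # Borrow a database connection only for the write and return it afterwards.
            # The `pool` size in database.yml limits how many threads can write at once.
            ActiveRecord::Base.connection_pool.with_connection do
              # Assumes `grabbed` is the flag this task is meant to set.
              record.update(fasta: response.body, grabbed: true)
            end
          end
        end
      end
      # Wait for the whole batch to finish before starting the next one.
      threads.each(&:join)
    end
  end
end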
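
Slicing into batches means each batch waits for its slowest request. An alternative is a fixed pool of worker threads pulling from a shared queue. The following is a minimal sketch of that pattern using only Ruby's built-in Thread and Queue. The name `process_with_pool` and the `pool_size` parameter are made up for illustration. In the rake task, the block would contain the RestClient.get call and the save.

```ruby
# Run the given block over every item using a fixed number of worker threads.
# Returns the block's results in the same order as `items`.
def process_with_pool(items, pool_size: 10, &worker)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  # Closing the queue makes `pop` return nil once it is empty,
  # which tells each worker to stop.
  queue.close

  results = Array.new(items.size)
  threads = Array.new(pool_size) do
    Thread.new do
      while (pair = queue.pop)
        item, index = pair
        # Each job writes only to its own slot in the results array.
        results[index] = worker.call(item)
      end
    end
  end

  # `join` re-raises any exception that killed a worker thread.
  threads.each(&:join)
  results
end

doubled = process_with_pool([1, 2, 3, 4, 5], pool_size: 2) { |n| n * 2 }
```

With this approach the number of live threads stays at `pool_size` no matter how many records there are, and a slow request delays only the worker that picked it up.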

References

https://www.sitepoint.com/threads-ruby/