Learning Scrapy: Scraping JSON Data with JsonRequest

A common pattern for crawling JSON data

Method 1: Using scrapy.Request

import scrapy
import json

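# Where the request is built; xxx stands for the payload to send (e.g. a dict)
scrapy.Request(url, method='POST', body=json.dumps(xxx),
               headers={'Content-Type': 'application/json'})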

Method 2: Using JsonRequest

I personally recommend this approach.

import scrapy
from scrapy.http import JsonRequest


class ExampleSpider(scrapy.Spider):
    name = 'example'
    # Must cover the host requested below, or offsite filtering may drop the request
    allowed_domains = ['exercise.kingname.info']
    # start_urls = ['http://xxx.com/']

    def start_requests(self):
        body = {
            'name': 'kingname',
            'age': 28
        }
        url = 'http://exercise.kingname.info/ajax_1_postbackend'
        yield JsonRequest(url, data=body, callback=self.parse)

    def parse(self, response, *args, **kwargs):
        # Print the raw response body as text
        print(response.body.decode())

About JsonRequest

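JsonRequest is a subclass of scrapy.Request, so every parameter accepted by scrapy.Request can also be passed to JsonRequest.
It additionally supports two parameters: data and dumps_kwargs.
data is any object that json.dumps can serialize, such as a dict or a list. When data is given, body is not, and no method is specified, the request method is set to POST automatically, which is why the example above does not pass method.
dumps_kwargs is a dict of keyword arguments forwarded to json.dumps, for example ensure_ascii=False or sort_keys=True.
Using JsonRequest sets the Content-Type header to application/json and the Accept header to application/json, text/javascript, */*; q=0.01.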
