Learning Scrapy: Handling URLs

Handling the various kinds of URLs encountered while crawling
Updated: 2024-07-14 09:52:55

There are three main ways to turn a relative URL into an absolute one:

Using the urljoin function from urllib.parse:

from urllib.parse import urljoin
# Same as: from w3lib.url import urljoin

url = urljoin(base_url, relative_url)
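To make the behaviour of urljoin concrete, here is a minimal sketch using only the standard library. The example.com addresses are hypothetical and chosen purely for illustration; the point is how different kinds of relative links resolve against the same base.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration.
base_url = "https://example.com/catalog/page1.html"

# A path-relative link replaces the last path segment of the base.
sibling = urljoin(base_url, "page2.html")

# A root-relative link replaces the whole path of the base.
rooted = urljoin(base_url, "/about")

# A parent-relative link climbs one directory before resolving.
parent = urljoin(base_url, "../index.html")

# A link that is already absolute is returned unchanged.
absolute = urljoin(base_url, "https://other.example/x")

for result in (sibling, rooted, parent, absolute):
    print(result)
```

Because the result depends on the trailing path segment of the base, it is usually safest to pass the full URL of the page the link was found on rather than just the site root.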

Using the response's urljoin wrapper method, which uses the response's own URL as the base:

url = response.urljoin(relative_url)

If you also want to yield a request for that link, you can use the response's handy follow method:

# It will create a new request using the above "urljoin" method
yield response.follow(relative_url, callback=self.parse)