Troubleshooting a case of high CPU usage

Solving the problem with an nginx configuration change
Updated: 2025-02-12 22:46:05

Background

  • Google's console reported a fairly high number of crawl failures over the past week
  • Checking the server showed heavy CPU usage, peaking at over 80%
Finding the problem
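The post does not record how the offending clients were identified. One common approach, sketched here under assumptions, is to tally requests per User-Agent from the access log. The function name `top_user_agents` and the log path are hypothetical, and the field position assumes nginx's default "combined" log format.

```shell
# Tally requests per User-Agent from an nginx access log.
# Assumes the "combined" log format, where the User-Agent is the
# sixth field when each line is split on double quotes.
top_user_agents() {
  local logfile="$1"
  local limit="${2:-20}"
  awk -F'"' '{ print $6 }' "$logfile" | sort | uniq -c | sort -rn | head -n "$limit"
}

# Hypothetical log path; adjust to the actual access_log location.
# top_user_agents /var/log/nginx/access.log 20
```

User agents that dominate the top of this list without being real search engines are candidates for blocking.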
Blocking the problematic ones
Roughly what things look like when normal

Create the ban-spider.conf file

map $http_user_agent $blocked_ua {
    default 0;
    # ~* matches case-insensitively and anywhere in the string, so one spelling
    # per name is enough and shorter names cover longer ones (python covers python-requests).
    ~*(MegaIndex|BLEXBot|Qwantify|semrush|serpstatbot|hubspot|python|Go-http-client|Java|PhantomJS|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail\.RU|perl|Wget|Xenu|ZmEu|SeznamBot|Curl|HttpClient|Crawler|Nimbostratus-Bot|MRA58N|LMY47V|ChatGLM-Spider|Amazonbot|GPTBot) 1;
}
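Before reloading nginx, it can help to check locally which User-Agent strings a pattern like the one above would catch. The sketch below is only an approximation: it uses `grep -E -i` in place of nginx's case-insensitive regex match, the helper name `is_blocked_ua` is hypothetical, and the pattern is a shortened, illustrative subset of the full list.

```shell
# Return success (exit 0) if the User-Agent matches the block pattern.
# grep -E -i approximates nginx's case-insensitive (~*) regex match;
# the pattern is a shortened subset of the list, for illustration only.
is_blocked_ua() {
  local ua="$1"
  local pattern='(MegaIndex|semrush|python|AhrefsBot|MJ12bot|Curl|Crawler|GPTBot)'
  printf '%s\n' "$ua" | grep -E -i -q -- "$pattern"
}

is_blocked_ua 'Mozilla/5.0 (compatible; AhrefsBot/7.0)' && echo "blocked"
is_blocked_ua 'Mozilla/5.0 (compatible; Googlebot/2.1)' || echo "allowed"
```

Because the match is an unanchored substring, short entries such as Java or perl will also catch any longer User-Agent that merely contains them, which is worth keeping in mind when extending the list.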

Use the variable defined in ban-spider.conf

upstream docify-rails {
  server 127.0.0.1:3002;
}

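# NGINX Server Instance
# Note: the map in ban-spider.conf is only valid in the http {} context,
# so that file has to be included there, not inside this server block.
server {
  listen 0.0.0.0:80;
  listen 443 ssl;
  # ....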

  if ($blocked_ua) {
    return 403;
  }

  if ($request_uri ~* \.php) {
    return 410;
  }
}

Testing the crawl

curl -I -A 'Baiduspider' www.test.com
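Baiduspider is not on the block list, so the command above should not return 403. To confirm that the block itself works, it is worth also sending a User-Agent that is on the list. The following is a minimal sketch; `ua_status` is a hypothetical helper name, and www.test.com is the placeholder host from the command above.

```shell
# Print only the HTTP status code returned for a given User-Agent.
ua_status() {
  local ua="$1"
  local url="$2"
  # -s: silent, -o /dev/null: discard the body, -w: print the status code
  curl -s -o /dev/null -w '%{http_code}' -A "$ua" "$url"
}

# Compare an allowed crawler with two entries from the block list.
# for ua in 'Baiduspider' 'AhrefsBot' 'python-requests/2.31'; do
#   printf '%s -> %s\n' "$ua" "$(ua_status "$ua" www.test.com)"
# done
```

If the map is active, the listed agents should come back as 403. Note that the list contains Curl and the match is case-insensitive, so a plain curl request without -A would be rejected as well.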