Troubleshooting a high-CPU incident

Solving the problem with one nginx configuration change

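Background
- Google's backend dashboard reported a fairly high number of crawl failures over the past week.
- Checking the server showed heavy CPU usage, peaking above 80%.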
Finding the problem
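One way to get this kind of view is to tally requests per User-Agent in the access log. The sketch below assumes nginx's default "combined" log format. The sample log lines, the addresses in them, and the helper name top_agents are made up for illustration; on a real server, point it at the actual access log (often /var/log/nginx/access.log, but that path is an assumption).

```shell
# Hypothetical sample in nginx "combined" format, written to a temp file
# so the sketch runs anywhere. Replace with the real access log path.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
192.0.2.10 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0)"
192.0.2.10 - - [01/Jan/2024:00:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0)"
192.0.2.10 - - [01/Jan/2024:00:00:03 +0000] "GET /c HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0)"
198.51.100.23 - - [01/Jan/2024:00:00:04 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; SemrushBot/7)"
198.51.100.23 - - [01/Jan/2024:00:00:05 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; SemrushBot/7)"
203.0.113.5 - - [01/Jan/2024:00:00:06 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0"
EOF

# Splitting each line on double quotes puts the User-Agent in field 6.
# Count identical agents and list the busiest ones first.
top_agents() {
  awk -F'"' '{ print $6 }' "$1" | sort | uniq -c | sort -rn | head -n 20
}

top_agents "$LOG"
```

Agents that dominate the top of this list without being a search engine you care about are candidates for blocking.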
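[Image: finding the problem]
Blocking the offending clients
[Image: blocking the offending clients]
What roughly normal looks like
[Image: roughly normal load]
Create the ban-spider.conf file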
# map is only valid in the http context, so include this file from the http {} block.
# ~* matches case-insensitively, so each name only needs to be listed once.
map $http_user_agent $blocked_ua {
    default 0;
    ~*(MegaIndex|BLEXBot|qwantify|semrush|serpstatbot|hubspot|python|Go-http-client|Java|PhantomJS|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail\.RU|perl|Wget|Xenu|ZmEu|SeznamBot|Curl|HttpClient|crawler|Nimbostratus-Bot|MRA58N|LMY47V|ChatGLM-Spider|Amazonbot|GPTBot) 1;
}
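Before reloading nginx, it can help to check offline which User-Agent strings a list like this would catch. The sketch below is an approximation, not nginx itself: it uses grep -E -i in place of nginx's PCRE matching, which behaves the same for a plain alternation like this one. PATTERN is only an excerpt of the list, and the helper name blocked_ua and the sample agent strings are assumptions for illustration.

```shell
# Excerpt of the alternation from ban-spider.conf; the full list works the same way.
PATTERN='AhrefsBot|semrush|MJ12bot|python|Curl|GPTBot'

# Print 1 if the given User-Agent matches the pattern (case-insensitive),
# 0 otherwise, mirroring the value nginx would store in $blocked_ua.
blocked_ua() {
  if printf '%s\n' "$1" | grep -Eiq "$PATTERN"; then
    echo 1
  else
    echo 0
  fi
}

blocked_ua 'Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)'    # 1
blocked_ua 'python-requests/2.31.0'                                                # 1
blocked_ua 'Mozilla/5.0 (compatible; Baiduspider/2.0)'                             # 0
blocked_ua 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'  # 0
```

Because the match is a substring match, short entries such as Java or perl also catch any agent string that merely contains them, so it is worth running real agent strings from the log through a check like this.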
Use the variable defined in ban-spider.conf
upstream docify-rails {
  server 127.0.0.1:3002;
}
# NGINX Server Instance
server {
  listen 0.0.0.0:80;
  listen 443 ssl;
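  # ....
  # Reject any request whose User-Agent matched the list in ban-spider.conf
  if ($blocked_ua) {
    return 403;
  }
  # Answer requests for .php paths with 410 Gone
  if ($request_uri ~* \.php) {
    return 410;
  }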
}
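Test crawling
Send a HEAD request with a spoofed User-Agent and check the status code. Baiduspider is not on the block list, so it should not get a 403; swapping in a listed agent such as AhrefsBot should return 403.
curl -I -A 'Baiduspider' www.test.com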
Block an IP directly
iptables -A INPUT -s <IP address> -j DROP
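To decide which addresses are worth a rule like this, the access log can be tallied by client IP. The sketch below is a dry run under stated assumptions: it only prints the iptables commands and never executes them, the sample log lines and addresses are invented, and the threshold of 3 requests and the helper name block_rules are arbitrary choices for illustration. Review the output by hand before running any of it as root.

```shell
# Hypothetical sample log; the client address is the first field in
# nginx's "combined" format. Replace with the real access log path.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "python-requests/2.31.0"
203.0.113.7 - - [01/Jan/2024:00:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "python-requests/2.31.0"
203.0.113.7 - - [01/Jan/2024:00:00:03 +0000] "GET /c HTTP/1.1" 200 512 "-" "python-requests/2.31.0"
192.0.2.10 - - [01/Jan/2024:00:00:04 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 Chrome/120.0"
198.51.100.23 - - [01/Jan/2024:00:00:05 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 Firefox/121.0"
EOF

# Print (do not run) one DROP rule per address that made at least $2 requests.
block_rules() {
  awk '{ print $1 }' "$1" | sort | uniq -c |
    awk -v min="$2" '$1 >= min { print "iptables -A INPUT -s " $2 " -j DROP" }'
}

block_rules "$LOG" 3
```

Printing the rules first keeps a legitimate crawler or a shared proxy address from being dropped by accident.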