How to block bad bots?

My web server is continuously scanned by internet bots, spiders and web crawlers. Most of them read the robots.txt file to find out which URLs they are allowed to crawl.

But some of them keep scanning your web site even when you have not given them permission to do so.

Some of these bad bots: AhrefsBot, 360Spider, ...

How can you block them?

First, I configure nginx to return a 403 HTTP code when the HTTP User-Agent header matches one of these bots.

As an example, here is an extract of my /etc/nginx/block.conf file:

    ## Block user agents
    set $block_user_agents 0;

    if ($http_user_agent ~ "GrabNet") {
        set $block_user_agents 1;
    }
    if ($http_user_agent ~ "MJ12bot") {
        set $block_user_agents 1;
    }
    if ($http_user_agent ~ "BUbiNG") {
        set $block_user_agents 1;
    }
    if ($http_user_agent ~ "AhrefsBot") {
        set $block_user_agents 1;
    }

    if ($http_user_agent ~ "360Spider") {
        set $block_user_agents 1;
    }

    if ($block_user_agents = 1) {
        return 403;
    }
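To see which User-Agent strings these rules catch, here is a minimal Python sketch that imitates the matching. It relies on the fact that the nginx `~` operator is a case-sensitive regular expression match that can hit anywhere in the header. The function name `is_blocked` and the sample User-Agent strings are only illustrations, they do not come from my configuration.

```python
import re

# Patterns copied from the block.conf extract above.
BLOCKED_PATTERNS = ["GrabNet", "MJ12bot", "BUbiNG", "AhrefsBot", "360Spider"]

# One alternation, searched anywhere in the string and case-sensitive,
# which mirrors how each `if ($http_user_agent ~ "...")` test behaves.
BLOCKED_RE = re.compile("|".join(BLOCKED_PATTERNS))


def is_blocked(user_agent: str) -> bool:
    """Return True if this User-Agent would set $block_user_agents to 1."""
    return BLOCKED_RE.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical sample headers, for illustration only.
    samples = [
        "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
        "Mozilla/5.0 (X11; Linux x86_64)",
        "ahrefsbot",
    ]
    for ua in samples:
        print(is_blocked(ua), ua)
```

Because the match is case-sensitive, a bot that sends its name in another case is not caught. nginx also offers the `~*` operator for case-insensitive matching.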

But they still keep trying to scan my web site.

So I also created a fail2ban jail rule.

Extract of my /etc/fail2ban/jail.local file:

[nginx-403]
enabled  = true
port     = http,https
filter   = nginx-403
action   = iptables-multiport[name=403, port="http,https"]
logpath  = /var/log/nginx*/*access*.log
maxretry = 10
bantime = 86400
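To make the two numbers concrete: an IP is banned once the filter has matched 10 of its requests, and the ban lasts 86400 seconds, that is 24 hours. The Python sketch below shows only this counting logic. It is a simplification: fail2ban also counts matches inside a `findtime` window, which this extract does not set, so the sketch ignores time. The function name `ips_to_ban` and the sample addresses are hypothetical.

```python
from collections import Counter

MAX_RETRY = 10    # maxretry from the jail extract
BAN_TIME = 86400  # bantime from the jail extract, in seconds


def ips_to_ban(hits, max_retry=MAX_RETRY):
    """Return the IPs that appear at least max_retry times in hits.

    hits is an iterable of client IPs, one entry per log line matched
    by the filter (that is, one entry per 403 response).
    """
    counts = Counter(hits)
    return sorted(ip for ip, n in counts.items() if n >= max_retry)


if __name__ == "__main__":
    # Hypothetical traffic: one noisy client and one below the limit.
    hits = ["192.0.2.1"] * 10 + ["198.51.100.7"] * 9
    print("to ban:", ips_to_ban(hits))
    print("ban length in hours:", BAN_TIME // 3600)
```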

And the corresponding filter, /etc/fail2ban/filter.d/nginx-403.conf:

[Definition]
failregex = ^<HOST>.*\"GET .*HTTP/1.1\" 403 .*
	    ^<HOST>.*\"POST .*HTTP/1.1\" 403 .*
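To check what this filter matches, here is a small Python sketch that applies the GET and POST rules to a few log lines. It rests on several assumptions: the log lines are made-up examples in the default nginx "combined" format, the rules are cut right after the 403 status code, and `<HOST>` is replaced by a simple stand-in pattern, whereas fail2ban expands it to a stricter address pattern. The function name `banned_host` is hypothetical.

```python
import re

# Simplified stand-in for fail2ban's <HOST> tag (assumption: the real tag
# expands to a stricter IPv4/IPv6/hostname pattern).
HOST = r"(?P<host>\S+)"

# GET and POST rules modelled on the filter above, ending at the 403 status.
FAILREGEX = [
    r'^<HOST>.*"GET .*HTTP/1.1" 403 ',
    r'^<HOST>.*"POST .*HTTP/1.1" 403 ',
]
COMPILED = [re.compile(p.replace("<HOST>", HOST)) for p in FAILREGEX]


def banned_host(line):
    """Return the client address if the line matches a rule, else None."""
    for rx in COMPILED:
        m = rx.search(line)
        if m:
            return m.group("host")
    return None


if __name__ == "__main__":
    # Made-up access log lines, assuming the "combined" log format.
    lines = [
        '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /admin HTTP/1.1" 403 162 "-" "AhrefsBot"',
        '192.0.2.1 - - [10/Oct/2023:13:55:37 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"',
        '192.0.2.1 - - [10/Oct/2023:13:55:38 +0000] "GET / HTTP/2.0" 403 162 "-" "AhrefsBot"',
    ]
    for line in lines:
        print(banned_host(line), "<-", line)
```

Note that the rules name HTTP/1.1 explicitly, so a 403 logged for an HTTP/1.0 or HTTP/2.0 request is not counted. On the server itself, the `fail2ban-regex` command shipped with fail2ban can run the real filter against the real log file.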

I hope this was useful for securing your web site and blocking these bad bots.

Et Voila,

Nicolas Portais
Author Photographer
http://www.mystockphoto.fr/
http://photos-art.pro/
