
Bots love to waste your server resources – here's how to fight back

If you manage a public web server, you will notice bots trying their luck against countless pages. Use HAProxy to stop them at the door.

Note: This article contains Amazon affiliate links. Clicking them costs you nothing, but if you choose to buy something through one of these links, you'll be supporting me. Thanks!

Your logs are filled with 404 hits for /.DS_Store, /backup.sql, /.vscode/sftp.json, and a long list of other URLs. Although these requests are mostly harmless – unless your server actually serves something at those paths – you should block these bots.

Why?

Serving these requests is needlessly resource-intensive, and since the bots cycle through a long list of different URLs, no caching mechanism can help you. Besides, blocking bots is always a win for security.

We have previously used HAProxy to mitigate attacks on the WordPress login page; the idea is to extend that approach to cover 404 errors.

Bots will make every effort to create chaos on your server

I was inspired by Sasa Tekovic's approach: don't ban legitimate search engine crawlers, and allow 404s on static resources, so that genuinely missing assets – a mistake on your part – don't get legitimate users blocked.

Before deploying anything, it is always good to spin up a local test environment. Let's start HAProxy and Apache using Docker – we need a real backend server to produce those 404s.

version: '3'

services:
    haproxy:
        image: haproxy:3.1.3-alpine
        ports:
            - "8100:80"
        volumes:
            - "./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg"
        networks:
            - webnet
    apache:
        image: httpd:latest
        container_name: apache1
        ports:
            - "8080:80"
        volumes:
            - ./html:/usr/local/apache2/htdocs/
        networks:
            - webnet
            
networks:
    webnet:

Then, simply run docker-compose up and you can reach localhost:8100 in your browser.
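Before wiring up any blocking rules, it's worth a quick smoke test from the command line. A minimal sketch, assuming the docker-compose stack above is running and exposing port 8100:

```shell
#!/bin/sh
# Base URL of the local HAProxy frontend from the docker-compose setup above
HAPROXY_URL="http://localhost:8100"

# The Apache index page should come back through HAProxy with a 200...
curl -s -o /dev/null -w "index:   %{http_code}\n" "$HAPROXY_URL/" || true

# ...and a path that does not exist should yield a 404 from Apache
curl -s -o /dev/null -w "missing: %{http_code}\n" "$HAPROXY_URL/no-such-page" || true
```

If both requests answer, the proxy-to-backend wiring works and we can start counting those 404s.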

The haproxy.cfg file is fairly short:

global
    log stdout format raw daemon debug

defaults
    log     global
    mode    http

frontend main
    bind *:80

    acl static_file path_end .css .js .jpg .jpeg .gif .ico .png .bmp .webp .csv .ttf .woff .svg .svgz
    acl excluded_user_agent hdr_reg(user-agent) -i (yahoo|yandex|kagi|(google|bing)bot)

    # track IPs, but exclude hits on static files and search engine crawlers
    http-request track-sc0 src table mock_404_tracking if !static_file !excluded_user_agent
    # increment gpc0 if the response code was 404
    http-response sc-inc-gpc0(0) if { status 404 }
    # deny if the 404 error rate limit was exceeded
    http-request deny deny_status 403 content-type text/html lf-string "404 abuse" if { sc0_gpc0_rate(mock_404_tracking) ge 5 }

    # whatever backend you're using
    use_backend apache_servers

backend apache_servers
    server apache1 apache1:80 maxconn 32

# mock backend to hold a stick table
backend mock_404_tracking
    stick-table type ip size 100k expire 10m store gpc0,gpc0_rate(1m)

If a client generates 5 or more 404 responses within one minute, the bot gets blocked; its entry stays in the stick table for up to 10 minutes.
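You can watch the limiter kick in from the command line. A sketch, assuming the stack is up on port 8100 and the table is empty (the exact request where the 403 starts depends on how the gpc0 rate has accumulated):

```shell
#!/bin/sh
# Hammer nonexistent paths to trigger the 404 limiter. The first few
# requests return Apache's 404; once the 1-minute gpc0 rate reaches 5,
# HAProxy starts answering 403 ("404 abuse") at the door instead.
LIMIT=7
for i in $(seq 1 "$LIMIT"); do
  curl -s -o /dev/null -w "request $i: %{http_code}\n" \
    "http://localhost:8100/bogus-path-$i" || true
done
```

Requests for static extensions like /bogus.css would not be counted, thanks to the static_file ACL.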


As is, this setup effectively blocks bots that generate excessive 404s. However, we also want to integrate it with our previous example, where we used HAProxy to fend off attacks on WordPress.

global
    log stdout format raw daemon debug

defaults
    log     global
    mode    http

frontend main
    bind *:80

    # We may, or may not, be running this with Cloudflare acting as a CDN.
    # If Cloudflare is in front of our servers, the user/bot IP will be in
    # 'CF-Connecting-IP'; otherwise the user IP will be in 'src'. So we make
    # sure to set a variable 'txn.actual_ip' that has the IP, no matter what
    http-request set-var(txn.actual_ip) hdr_ip(CF-Connecting-IP) if { hdr(CF-Connecting-IP) -m found }
    http-request set-var(txn.actual_ip) src if !{ hdr(CF-Connecting-IP) -m found }

    # gets the actual IP into the logs
    log-format "%ci\ %hr\ %ft\ %b/%s\ %Tw/%Tc/%Tt\ %B\ %ts\ %r\ %ST\ %Tr IP:%{+Q}[var(txn.actual_ip)]"

    # common static files where we may get 404 errors, and common search engine
    # crawlers that we don't want blocked
    acl static_file path_end .css .js .jpg .jpeg .gif .ico .png .bmp .webp .csv .ttf .woff .svg .svgz
    acl excluded_user_agent hdr_reg(user-agent) -i (yahoo|yandex|kagi|google|bing)

    # paths where we will rate limit users to prevent WordPress abuse
    acl is_wp_login path_end -i /wp-login.php /xmlrpc.php /xmrlpc.php
    acl is_post method POST

    # 404 abuse blocker
    # track IPs but exclude hits on static files and search engine crawlers
    # increment gpc0 counter if response status was 404 and deny if rate exceeded
    http-request track-sc0 var(txn.actual_ip) table mock_404_track if !static_file !excluded_user_agent
    http-response sc-inc-gpc0(0) if { status 404 }
    http-request deny deny_status 403 content-type text/html lf-string "404 abuse" if { sc0_gpc0_rate(mock_404_track) ge 5 }

    # WordPress abuse blocker
    # track IPs if the request hits one of the monitored paths with a POST request
    # increment gpc1 counter if path was hit and deny if rate exceeded
    http-request track-sc1 var(txn.actual_ip) table mock_wplogin_track if is_wp_login is_post
    http-request sc-inc-gpc1(1) if is_wp_login is_post
    http-request deny deny_status 403 content-type text/html lf-string "login abuse" if { sc1_gpc1_rate(mock_wplogin_track) ge 5 }

    # your backend, here using apache for demonstration purposes
    use_backend apache_servers

backend apache_servers
    server apache1 apache1:80 maxconn 32

# mock backends for storing stick tables
backend mock_404_track
    stick-table type ip size 100k expire 10m store gpc0,gpc0_rate(1m)
backend mock_wplogin_track
    stick-table type ip size 100k expire 10m store gpc1,gpc1_rate(1m)

Run it with the two stick tables, and both threats are stopped.
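The login limiter can be exercised the same way as the 404 one. A sketch, assuming the combined config is loaded on port 8100 (the backend doesn't even need WordPress installed – the ACLs match on path and method alone):

```shell
#!/bin/sh
# POST repeatedly to /wp-login.php from the same IP; once the 1-minute
# gpc1 rate reaches 5, HAProxy answers 403 ("login abuse").
ATTEMPTS=7
for i in $(seq 1 "$ATTEMPTS"); do
  curl -s -o /dev/null -w "POST $i: %{http_code}\n" \
    -X POST "http://localhost:8100/wp-login.php" || true
done

# To test the Cloudflare code path locally, set the header HAProxy reads
# the client IP from (203.0.113.7 is a documentation-range address):
curl -s -o /dev/null -w "CF test: %{http_code}\n" \
  -X POST -H "CF-Connecting-IP: 203.0.113.7" \
  "http://localhost:8100/wp-login.php" || true
```

Requests carrying different CF-Connecting-IP values land in separate stick-table entries, which is exactly what you want when real clients sit behind Cloudflare.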

And there you have it. Once again, HAProxy proves useful for much more than simple reverse proxying. It is a little Swiss Army knife!


This headlamp was a game changer when working on repairs.

I had one, but when it broke, I hesitated to replace it and made do with a flashlight. Sure, that works – but once you experience the comfort of having both hands free again, there is no going back. If you need reliable, hands-free lighting, this is a must!
