Diffstat (limited to 'docs/admin/filtron.rst')
-rw-r--r-- | docs/admin/filtron.rst | 148 |
1 files changed, 148 insertions, 0 deletions
diff --git a/docs/admin/filtron.rst b/docs/admin/filtron.rst
new file mode 100644
index 000000000..07dcb9bc5
--- /dev/null
+++ b/docs/admin/filtron.rst
@@ -0,0 +1,148 @@
+==========================
+How to protect an instance
+==========================
+
+Searx depends on external search services. To avoid abuse of these services,
+it is advisable to limit the number of requests processed by searx.
+
+An application firewall, ``filtron``, solves exactly this problem. Information
+on how to install it can be found at the `project page of filtron
+<https://github.com/asciimoo/filtron>`__.
+
+
+Sample configuration of filtron
+===============================
+
+An example configuration can be found below. This configuration limits or blocks
+the following:
+
+- scripts or applications (roboagent limit)
+- web crawlers (botlimit)
+- IPs which send too many requests (IP limit)
+- requests for the csv, json, or rss output formats (rss/json limit)
+- User-Agents which send too many requests (useragent limit)
+
+.. code:: json
+
+    [{
+        "name":"search request",
+        "filters":[
+            "Param:q",
+            "Path=^(/|/search)$"
+        ],
+        "interval":"<time-interval-in-sec (int)>",
+        "limit":"<max-request-number-in-interval (int)>",
+        "subrules":[
+            {
+                "name":"roboagent limit",
+                "interval":"<time-interval-in-sec (int)>",
+                "limit":"<max-request-number-in-interval (int)>",
+                "filters":[
+                    "Header:User-Agent=(curl|cURL|Wget|python-requests|Scrapy|FeedFetcher|Go-http-client)"
+                ],
+                "actions":[
+                    {
+                        "name":"block",
+                        "params":{
+                            "message":"Rate limit exceeded"
+                        }
+                    }
+                ]
+            },
+            {
+                "name":"botlimit",
+                "limit":0,
+                "stop":true,
+                "filters":[
+                    "Header:User-Agent=(Googlebot|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot|MJ12bot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT)"
+                ],
+                "actions":[
+                    {
+                        "name":"block",
+                        "params":{
+                            "message":"Rate limit exceeded"
+                        }
+                    }
+                ]
+            },
+            {
+                "name":"IP limit",
+                "interval":"<time-interval-in-sec (int)>",
+                "limit":"<max-request-number-in-interval (int)>",
+                "stop":true,
+                "aggregations":[
+                    "Header:X-Forwarded-For"
+                ],
+                "actions":[
+                    {
+                        "name":"block",
+                        "params":{
+                            "message":"Rate limit exceeded"
+                        }
+                    }
+                ]
+            },
+            {
+                "name":"rss/json limit",
+                "interval":"<time-interval-in-sec (int)>",
+                "limit":"<max-request-number-in-interval (int)>",
+                "stop":true,
+                "filters":[
+                    "Param:format=(csv|json|rss)"
+                ],
+                "actions":[
+                    {
+                        "name":"block",
+                        "params":{
+                            "message":"Rate limit exceeded"
+                        }
+                    }
+                ]
+            },
+            {
+                "name":"useragent limit",
+                "interval":"<time-interval-in-sec (int)>",
+                "limit":"<max-request-number-in-interval (int)>",
+                "aggregations":[
+                    "Header:User-Agent"
+                ],
+                "actions":[
+                    {
+                        "name":"block",
+                        "params":{
+                            "message":"Rate limit exceeded"
+                        }
+                    }
+                ]
+            }
+        ]
+    }]
+
+
+
+Route requests through filtron
+==============================
+
+Filtron can be started using the following command:
+
+.. code:: sh
+
+    $ filtron -rules rules.json
+
+It listens on ``127.0.0.1:4004`` and forwards filtered requests to
+``127.0.0.1:8888`` by default.
+
+Use it together with ``nginx``, for example with the following configuration:
+
+.. code:: nginx
+
+    location / {
+        proxy_set_header Host $http_host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Scheme $scheme;
+        proxy_pass http://127.0.0.1:4004/;
+    }
+
+nginx passes requests to port 4004, where filtron filters them and forwards them
+to port 8888, where searx is running.
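
The ``<... (int)>`` placeholders in the sample rules are meant to be replaced
with integers before the file is used. As a minimal sketch (not part of
filtron; the helper name ``find_unset_values`` and the values ``300`` and ``60``
are made up for illustration), the following Python snippet lists every
``interval`` or ``limit`` in a rules structure that is not yet an integer,
including those in nested ``subrules``:

.. code:: python

    import json


    def find_unset_values(rules, path="rules"):
        """Return descriptions of "interval"/"limit" values that are not integers."""
        problems = []
        for index, rule in enumerate(rules):
            where = f"{path}[{index}] ({rule.get('name', 'unnamed')})"
            for key in ("interval", "limit"):
                if key not in rule:
                    continue  # some rules, such as "botlimit", omit "interval"
                value = rule[key]
                # bool is a subclass of int in Python, so reject it explicitly
                if isinstance(value, bool) or not isinstance(value, int):
                    problems.append(f"{where}.{key} = {value!r}")
            # check nested subrules the same way
            problems.extend(
                find_unset_values(rule.get("subrules", []), where + ".subrules")
            )
        return problems


    # Hypothetical, shortened rules: two placeholders are still unreplaced.
    sample = json.loads("""
    [{
        "name": "search request",
        "interval": "<time-interval-in-sec (int)>",
        "limit": 300,
        "subrules": [
            {"name": "botlimit", "limit": 0, "stop": true},
            {"name": "IP limit", "interval": 60,
             "limit": "<max-request-number-in-interval (int)>"}
        ]
    }]
    """)

    for problem in find_unset_values(sample):
        print(problem)

To check a real file, the same function can be applied to the result of
``json.load()`` on ``rules.json`` before starting filtron.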