summaryrefslogtreecommitdiff
path: root/docs/admin/filtron.rst
blob: 009ca1c29f5551895bb4887f3186c93a4a61da5d (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
==========================
How to protect an instance
==========================

.. _filtron: https://github.com/asciimoo/filtron

Searx depens on external search services.  To avoid the abuse of these services
it is advised to limit the number of requests processed by searx.

An application firewall, filtron_ solves exactly this problem.  Filtron is just
a middleware between your web server (nginx, apache, ...) and searx.


filtron & go
============

.. _Go: https://golang.org/
.. _filtron README: https://github.com/asciimoo/filtron/blob/master/README.md

Filtron needs Go_ installed.  If Go_ is preinstalled, filtron_ is simply
installed by ``go get`` package management (see `filtron README`_).  If you use
filtron as middleware, a more isolated setup is recommended.

#. Create a separated user account (``filtron``).
#. Download and install Go_ binary in users $HOME (``~filtron``).
#. Install filtron with the package management of Go_ (``go get -v -u
   github.com/asciimoo/filtron``)
#. Setup a proper rule configuration :origin:`[ref]
   <utils/templates/etc/filtron/rules.json>` (``/etc/filtron/rules.json``).
#. Setup a systemd service unit :origin:`[ref]
   <utils/templates/lib/systemd/system/filtron.service>`
   (``/lib/systemd/system/filtron.service``).

To simplify such a installation and the maintenance of; use our script
``utils/filtron.sh``:

.. program-output:: ../utils/filtron.sh --help
   :ellipsis: 0,5


Sample configuration of filtron
===============================

An example configuration can be find below. This configuration limits the access
of:

- scripts or applications (roboagent limit)
- webcrawlers (botlimit)
- IPs which send too many requests (IP limit)
- too many json, csv, etc. requests (rss/json limit)
- the same UserAgent of if too many requests (useragent limit)

.. code:: json

   [{
      "name":"search request",
      "filters":[
         "Param:q",
         "Path=^(/|/search)$"
      ],
      "interval":"<time-interval-in-sec (int)>",
      "limit":"<max-request-number-in-interval (int)>",
      "subrules":[
         {
            "name":"roboagent limit",
            "interval":"<time-interval-in-sec (int)>",
            "limit":"<max-request-number-in-interval (int)>",
            "filters":[
               "Header:User-Agent=(curl|cURL|Wget|python-requests|Scrapy|FeedFetcher|Go-http-client)"
            ],
            "actions":[
               {
                  "name":"block",
                  "params":{
                     "message":"Rate limit exceeded"
                  }
               }
            ]
         },
         {
            "name":"botlimit",
            "limit":0,
            "stop":true,
            "filters":[
               "Header:User-Agent=(Googlebot|bingbot|Baiduspider|yacybot|YandexMobileBot|YandexBot|Yahoo! Slurp|MJ12bot|AhrefsBot|archive.org_bot|msnbot|MJ12bot|SeznamBot|linkdexbot|Netvibes|SMTBot|zgrab|James BOT)"
            ],
            "actions":[
               {
                  "name":"block",
                  "params":{
                     "message":"Rate limit exceeded"
                  }
               }
            ]
         },
         {
            "name":"IP limit",
            "interval":"<time-interval-in-sec (int)>",
            "limit":"<max-request-number-in-interval (int)>",
            "stop":true,
            "aggregations":[
               "Header:X-Forwarded-For"
            ],
            "actions":[
               {
                  "name":"block",
                  "params":{
                     "message":"Rate limit exceeded"
                  }
               }
            ]
         },
         {
            "name":"rss/json limit",
            "interval":"<time-interval-in-sec (int)>",
            "limit":"<max-request-number-in-interval (int)>",
            "stop":true,
            "filters":[
               "Param:format=(csv|json|rss)"
            ],
            "actions":[
               {
                  "name":"block",
                  "params":{
                     "message":"Rate limit exceeded"
                  }
               }
            ]
         },
         {
            "name":"useragent limit",
            "interval":"<time-interval-in-sec (int)>",
            "limit":"<max-request-number-in-interval (int)>",
            "aggregations":[
               "Header:User-Agent"
            ],
            "actions":[
               {
                  "name":"block",
                  "params":{
                     "message":"Rate limit exceeded"
                  }
               }
            ]
         }
      ]
   }]



Route request through filtron
=============================

Filtron can be started using the following command:

.. code:: sh

   $ filtron -rules rules.json

It listens on ``127.0.0.1:4004`` and forwards filtered requests to
``127.0.0.1:8888`` by default.

Use it along with ``nginx`` with the following example configuration.

.. code:: nginx

   location / {
        proxy_set_header   Host    $http_host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Scheme $scheme;
        proxy_pass         http://127.0.0.1:4004/;
   }

Requests are coming from port 4004 going through filtron and then forwarded to
port 8888 where a searx is being run.