Age | Commit message | Author |
|
A SearXNG maintainer on Matrix reported a traceback::
    File "searxng-src/searx/engines/xpath.py", line 272, in response
      dom = html.fromstring(resp.text)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "searx-pyenv/lib/python3.11/site-packages/lxml/html/__init__.py", line 850, in fromstring
      doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "searx-pyenv/lib/python3.11/site-packages/lxml/html/__init__.py", line 738, in document_fromstring
      raise etree.ParserError(
    lxml.etree.ParserError: Document is empty
I don't have an example to reproduce the issue, but the issue and this patch are
clearly recognizable even without an example.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
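A minimal sketch of the kind of guard the traceback above suggests, assuming the fix is to skip parsing when the response body is empty. The helper name `parse_results` and the injected `parse` callable are hypothetical; in the engine the parser would be `lxml.html.fromstring`, which is left out here so the sketch runs with the standard library only.

```python
def parse_results(text, parse):
    """Return parse(text), or None when there is nothing to parse.

    `parse` stands in for lxml.html.fromstring, which raises
    "ParserError: Document is empty" on an empty body.
    """
    if not text or not text.strip():
        # Empty body: report "no document" instead of raising.
        return None
    return parse(text)


def fake_parse(text):
    # Stand-in parser for demonstration only (hypothetical).
    return {"length": len(text)}


print(parse_results("", fake_parse))           # None
print(parse_results("<p>hi</p>", fake_parse))  # {'length': 9}
```

The caller can then treat `None` as an empty result list.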
|
|
|
|
The use of img_src AND thumbnail in the default results makes no sense (only a
thumbnail is needed). In the current state this is rather confusing, because
img_src is displayed like a thumbnail (small) and thumbnail is displayed like an
image (large).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
We have built up detailed documentation of the *settings* and the *engines* over
the past few years. However, this documentation was still spread over various
chapters and was difficult to navigate in its entirety.
This patch rearranges the Settings & Engines documentation for better
readability.
To review the reordered docs::
    make docs.clean docs.live
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
[PR-3366] https://github.com/searx/searx/pull/3366
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
no_result_for_http_status contains a list of HTTP status codes.
These HTTP status codes are treated as an empty result list.
In all other cases an exception is raised as usual.
Previously, raise_for_httperror ignored all HTTP errors,
which made defective engines invisible in the stats.
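The behaviour described above can be sketched roughly as follows. This is an illustration, not the engine's code: `EngineHTTPError` and `is_empty_result` are hypothetical names, and only `no_result_for_http_status` comes from the commit message.

```python
class EngineHTTPError(Exception):
    """Hypothetical exception type standing in for the real HTTP error."""


def is_empty_result(status_code, no_result_for_http_status=()):
    """Return True when the status means "no results", False when OK.

    Any other error status raises, so a broken engine stays visible.
    """
    if status_code in no_result_for_http_status:
        # Listed status codes count as an empty result list.
        return True
    if status_code >= 400:
        # Everything else is still reported as an error.
        raise EngineHTTPError(f"HTTP status {status_code}")
    return False


print(is_empty_result(404, no_result_for_http_status=[404]))  # True
print(is_empty_result(200, no_result_for_http_status=[404]))  # False
```

A 500 with the same setting would raise, so the failure still shows up in the stats.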
|
|
Woxikon expects a word in German, so with the query "foo" the site finds nothing
and responds with a 404:
    httpx.HTTPStatusError: Client error '404 Not Found' \
      for url 'https://synonyme.woxikon.de/synonyme/foo.php'
[1] https://github.com/searxng/searxng/issues/1543#issuecomment-1193317054
Closes: https://github.com/searxng/searxng/issues/1543
Suggested-by: @allendema [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
From GH GUI
|
|
Example:
- engine: xpath
- search_url: example.org
- headers: {'example_header': 'example_header'}
- cookies: {'safesearch': 'off'}
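A rough sketch of how such settings could be applied to an outgoing request, assuming the engine copies them into the request parameters. The `request` function, the shape of `params`, and the full `search_url` template are assumptions for illustration; only the `headers` and `cookies` values come from the example above.

```python
from urllib.parse import quote_plus

# Values as they might be loaded from the example settings above.
search_url = "https://example.org/?q={query}"  # assumed URL template
headers = {"example_header": "example_header"}
cookies = {"safesearch": "off"}


def request(query, params):
    """Fill in URL, headers and cookies for one outgoing request."""
    params["url"] = search_url.format(query=quote_plus(query))
    # Merge the configured values into whatever defaults already exist.
    params.setdefault("headers", {}).update(headers)
    params.setdefault("cookies", {}).update(cookies)
    return params


params = request("foo bar", {"headers": {"User-Agent": "demo"}})
print(params["url"])      # https://example.org/?q=foo+bar
print(params["cookies"])  # {'safesearch': 'off'}
```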
|
|
This patch was generated by black [1]::
make format.python
[1] https://github.com/psf/black
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914168470
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
and drop no longer needed (see line 591 in 7b235a1)::
    # pylint: disable=undefined-variable
Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914068609
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Remove the no longer needed `logger = logger.getChild(...)` from engines.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Instead of raising an exception and therefore hiding all results of the engine.
It makes sense to remove that requirement in order to allow the implementation of
search engines that do not always have a description. In fact, some search
engines that have a description in 99% of cases, like Brave Search or Mojeek,
crash completely if they for some reason include a result with no description.
To test this patch try Mojeek:
!mjk xyz
before and after the patch.
Suggested-by: 0xhtml in https://github.com/searx/searx/discussions/2933
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
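A minimal sketch of the idea in the message above, assuming a result is a plain dict with `url`, `title` and `content` keys. The helper `build_result` and the sample data are hypothetical and only illustrate that a missing description no longer discards the result.

```python
def build_result(url, title, content=None):
    """Build a result dict; `content` (the description) is optional."""
    return {
        "url": url,
        "title": title,
        # A missing description becomes an empty string instead of an error.
        "content": content or "",
    }


scraped = [
    ("https://example.org/a", "First hit", "Has a description"),
    ("https://example.org/b", "Second hit", None),  # no description
]
results = [build_result(*item) for item in scraped]
print(len(results))  # 2
```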
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
BTW: remove obsolete params['query'] and unneeded paging condition.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
- pylint searx/engines/xpath.py
- fix indentation of some long lines
- add logging
- add doc-strings
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Make 'soft_max_redirects' configurable per Xpath engine::
    - name : <engine-name>
      engine : xpath
      soft_max_redirects: 1
      ...
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
move meta information from the comment to the about variable
so the preferences and the documentation can show this information
|
|
before commit 58d72f2, category was not set in xpath.py,
so searx/engines/__init__.py was setting the category to ['general'].
commit 58d72f2 set the category to [], which is not replaced by searx/engines/__init__.py.
consequence: the mojeek engine is hidden in the preferences.
this commit reverts the xpath.py change.
close #2368
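The effect described above can be illustrated with a small sketch, assuming the loader applies the default only when the attribute is missing entirely. `DEFAULT_CATEGORIES` and `resolve_categories` are hypothetical names, not the actual loader code.

```python
DEFAULT_CATEGORIES = ["general"]


def resolve_categories(engine_attrs):
    """Return the categories an engine ends up with after loading."""
    # Only a *missing* attribute gets the default; an explicit [] is kept.
    if "categories" not in engine_attrs:
        return list(DEFAULT_CATEGORIES)
    return engine_attrs["categories"]


print(resolve_categories({}))                  # ['general']
print(resolve_categories({"categories": []}))  # []
```

An engine left with `[]` belongs to no category, which matches the symptom of it being hidden in the preferences.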
|
|
functions
|
|
This commit is only a step, it doesn't fix all the issues reported by pylint
|
|
Xpath engine and results template changed to account for the fact that
archive.org doesn't cache .onions, though some onion engines might have
their own cache.
Disabled by default. Can be enabled by setting the SOCKS proxies to
wherever Tor is listening and setting using_tor_proxy to True.
Requires Tor and updating packages.
To avoid manually adding the timeout on each engine, you can set
extra_proxy_timeout to account for Tor's (or whatever proxy used) extra
time.
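A sketch of how the extra timeout might be applied, assuming it is simply added to each engine's own timeout when the Tor proxy is in use. The function `effective_timeout` is hypothetical; `using_tor_proxy` and `extra_proxy_timeout` are the setting names from the message above.

```python
def effective_timeout(engine_timeout, using_tor_proxy=False, extra_proxy_timeout=0.0):
    """Return the timeout to use for one engine, in seconds."""
    if using_tor_proxy:
        # Give every engine the same extra allowance for the proxy hop.
        return engine_timeout + extra_proxy_timeout
    return engine_timeout


print(effective_timeout(3.0, using_tor_proxy=True, extra_proxy_timeout=10.0))   # 13.0
print(effective_timeout(3.0, using_tor_proxy=False, extra_proxy_timeout=10.0))  # 3.0
```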
|
|
|
|
compile XPath only once
avoid redundant call to urlparse
get_locale(webapp.py): avoid useless call to request.accept_languages.best_match
|
|
update commit 87baa74a863ac74ae4c86bbfcb04148ba7f70696
|
|
fix google play apps, google play apps, google play music engines
xpath engine: thumbnail_xpath can define an optional thumbnail
|
|
This solves a bug with Yahoo where some results don't specify
a protocol.
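One way to handle result URLs without a protocol is sketched below, using only `urllib.parse`. The helper name `normalize_url`, the fallback to `http://`, and the sample URLs are assumptions for illustration, not necessarily what the commit does.

```python
from urllib.parse import urljoin, urlparse


def normalize_url(url, base_url):
    """Return an absolute URL, borrowing missing parts from base_url."""
    if url.startswith("//"):
        # Protocol-relative URL: reuse the scheme of the search URL.
        return urlparse(base_url).scheme + ":" + url
    if url.startswith("/"):
        # Host-relative URL: resolve against the search URL.
        return urljoin(base_url, url)
    if not urlparse(url).scheme:
        # Bare host/path: assume plain http (an assumption of this sketch).
        return "http://" + url
    return url


base = "https://search.example/search?q=x"
print(normalize_url("//example.org/a", base))  # https://example.org/a
print(normalize_url("/path", base))            # https://search.example/path
```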
|
|
e.g. pageno=1,21,41,... instead of 20,40,60,...
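The offset arithmetic behind that example can be sketched as follows, assuming a page size of 20 and a configurable number for the first page. The function and parameter names (`page_parameter`, `page_size`, `first_page_num`) are chosen for illustration.

```python
def page_parameter(pageno, page_size=20, first_page_num=1):
    """Map a 1-based page number to the value sent to the site."""
    return (pageno - 1) * page_size + first_page_num


print([page_parameter(n) for n in (1, 2, 3)])  # [1, 21, 41]
```

Without the first-page offset, `pageno * page_size` would give 20, 40, 60 for the same pages.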
|
|
|
instead of _ElementStringResult
|