path: root/searx/engines/xpath.py
Age | Commit message | Author
2024-11-29 | [mod] hardening xpath engine: ignore empty results | Markus Heiser
A SearXNG maintainer on Matrix reported a traceback::

    File "searxng-src/searx/engines/xpath.py", line 272, in response
      dom = html.fromstring(resp.text)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "searx-pyenv/lib/python3.11/site-packages/lxml/html/__init__.py", line 850, in fromstring
      doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "searx-pyenv/lib/python3.11/site-packages/lxml/html/__init__.py", line 738, in document_fromstring
      raise etree.ParserError(
    lxml.etree.ParserError: Document is empty

I don't have an example to reproduce the issue, but the issue and this patch are clearly recognizable even without an example.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
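The hardening described above amounts to returning no results for an empty body instead of handing it to the HTML parser. A minimal sketch of that guard follows; the function name `safe_parse` and the injected `parse_html` callable are hypothetical and only stand in for the engine's real parsing step, and treating whitespace-only bodies as empty is an assumption of this sketch, not something the commit message states.

```python
from typing import Callable, Dict, List

Result = Dict[str, str]


def safe_parse(text: str, parse_html: Callable[[str], List[Result]]) -> List[Result]:
    """Return an empty result list for an empty body, otherwise parse it.

    `parse_html` is a hypothetical stand-in for the step that would call
    the HTML parser and evaluate the XPath expressions.
    """
    # Assumption: a body with only whitespace is handled like an empty one.
    if not text or not text.strip():
        return []
    return parse_html(text)
```

With a guard like this, an engine that occasionally receives an empty response yields zero results rather than raising a parser error.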
2024-11-28 | [feat] json/xpath engine: config option for method and body | Bnyro
2024-05-16 | [mod] simple theme: drop img_src from default results | Markus Heiser
The use of img_src AND thumbnail in the default results makes no sense (only a thumbnail is needed). In the current state this is rather confusing, because img_src is displayed like a thumbnail (small) and thumbnail is displayed like an image (large).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2024-03-11 | [mod] pylint all engines without PYLINT_SEARXNG_DISABLE_OPTION | Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2023-09-18 | [fix] spelling | jazzzooo
2023-07-01 | [doc] rearranges Settings & Engines docs for better readability | Markus Heiser
We have built up detailed documentation of the *settings* and the *engines* over the past few years. However, this documentation was still spread over various chapters and was difficult to navigate in its entirety. This patch rearranges the Settings & Engines documentation for better readability.

To review new ordered docs::

    make docs.clean docs.live

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-27 | [fix] typos / reported by @kianmeng in searx PR-3366 | Markus Heiser
[PR-3366] https://github.com/searx/searx/pull/3366

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-09-04 | xpath engine: change raise_for_httperror to no_result_for_http_status | Alexandre FLAMENT
no_result_for_http_status contains a list of HTTP status codes. Responses with these status codes are treated as an empty result list. In all other cases an exception is raised as usual. Previously, raise_for_httperror ignored all HTTP errors, which made defective engines invisible in the stats.
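The behaviour described above can be sketched as a small status check. This is an illustration only: the function name `is_empty_result` and the exception class `EngineHTTPError` are hypothetical, and treating every status of 400 or above as an error is an assumption of the sketch.

```python
from typing import Iterable


class EngineHTTPError(Exception):
    """Hypothetical stand-in for the HTTP client's status error."""


def is_empty_result(status_code: int, no_result_for_http_status: Iterable[int]) -> bool:
    """Return True if the response should be treated as an empty result list.

    Status codes listed in `no_result_for_http_status` mean "no results";
    any other error status still raises, so a defective engine stays
    visible instead of silently returning nothing.
    """
    if status_code in set(no_result_for_http_status):
        return True
    # Assumption: any status >= 400 that is not listed counts as an error.
    if status_code >= 400:
        raise EngineHTTPError(f"HTTP error {status_code}")
    return False
```

For example, an engine configured with `[404]` would report no results on a 404 but still fail loudly on a 500.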
2022-09-04 | [fix] engine woxikon.de - don't raise exception on empty result list | Markus Heiser
Woxikon expects a word in German, so with the query "foo" the site finds nothing and responds with a 404:

    httpx.HTTPStatusError: Client error '404 Not Found' \
    for url 'https://synonyme.woxikon.de/synonyme/foo.php'

[1] https://github.com/searxng/searxng/issues/1543#issuecomment-1193317054

Closes: https://github.com/searxng/searxng/issues/1543
Suggested-by: @allendema [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2022-04-17 | [fix] Update only cookies/headers | Allen
2022-04-17 | [lint] Remove whitespace | Allen
From GH GUI
2022-04-16 | [enh] Allow passing headers/cookies from settings.yml | Allen
Example:

    - engine: xpath
    - search_url: example.org
    - headers: {'example_header': 'example_header'}
    - cookies: {'safesearch': 'off'}
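One way to read the two commits above ("allow passing headers/cookies" and "update only cookies/headers") is that the configured values are merged into the outgoing request parameters rather than replacing them. A minimal sketch under that assumption follows; the function name `apply_request_settings` and the shape of `params` are hypothetical.

```python
from typing import Dict, Optional


def apply_request_settings(
    params: Dict[str, Dict[str, str]],
    headers: Optional[Dict[str, str]] = None,
    cookies: Optional[Dict[str, str]] = None,
) -> Dict[str, Dict[str, str]]:
    """Merge configured headers and cookies into the request parameters.

    Existing entries (for example a User-Agent header set elsewhere) are
    kept; only the configured keys are added or overwritten.
    """
    params.setdefault("headers", {}).update(headers or {})
    params.setdefault("cookies", {}).update(cookies or {})
    return params
```

Updating in place, instead of assigning a new dictionary, is what keeps headers that were set before the engine's request step.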
2021-12-27 | [format.python] initial formatting of the python code | Markus Heiser
This patch was generated by black [1]::

    make format.python

[1] https://github.com/psf/black

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-11 | [mod] xpath engine: remove logging of the requested URL | Alexandre Flament
2021-09-07 | [pylint] engines: drop no longer needed 'missing-function-docstring' | Markus Heiser
Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914168470
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-07 | [fix] add 'categories' to PYLINT_ADDITIONAL_BUILTINS_FOR_ENGINES | Markus Heiser
and drop no longer needed (see line 591 in 7b235a1)::

    # pylint: disable=undefined-variable

Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914068609
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-06 | [mod] one logger per engine - drop obsolete logger.getChild | Markus Heiser
Remove the no longer needed `logger = logger.getChild(...)` from engines.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-09-04 | [fix] remove minimum length of content for XPath engine | Markus Heiser
Instead of raising an exception and thereby hiding all results of the engine, it makes sense to remove that requirement in order to allow the implementation of search engines that do not always have a description. In fact, some search engines that have a description in 99% of cases, like Brave Search or Mojeek, crash completely if for some reason they include a result with no description.

To test this patch, try Mojeek: !mjk xyz before and after the patch.

Suggested-by: 0xhtml in https://github.com/searx/searx/discussions/2933
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 | [enh] XPath engine - add time safe-search support | Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 | [enh] XPath engine - add time range support | Markus Heiser
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-23 | [enh] XPath engine - add ISO 639-1 {lang} replacement to search-URL | Markus Heiser
BTW: remove obsolete params['query'] and the unneeded paging condition.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
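The `{lang}` replacement named in the commit title above can be illustrated with plain string formatting. This is a sketch, not the engine's code: the function name `build_search_url`, the example URL, and the way the two-letter code is derived from a locale tag such as `de-DE` are all assumptions.

```python
from urllib.parse import quote_plus


def build_search_url(search_url: str, query: str, language: str) -> str:
    """Fill the {query} and {lang} placeholders of a search URL template.

    `language` is assumed to be a locale tag like "de-DE"; only its
    primary subtag is used as the ISO 639-1 code.
    """
    lang = language.split("-")[0].lower()
    return search_url.format(query=quote_plus(query), lang=lang)


# Hypothetical template, for illustration only.
url = build_search_url(
    "https://example.org/search?q={query}&lang={lang}", "hello world", "de-DE"
)
print(url)  # https://example.org/search?q=hello+world&lang=de
```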
2021-05-23 | [doc] add documentation about the XPath engine | Markus Heiser
- pylint searx/engines/xpath.py
- fix indentation of some long lines
- add logging
- add doc-strings

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-05-17 | [enh] xpath engine - add request parameter 'soft_max_redirects' | Markus Heiser
Make 'soft_max_redirects' configurable per XPath engine::

    - name : <engine-name>
      engine : xpath
      soft_max_redirects: 1
      ...

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-14 | [enh] engines: add about variable | Alexandre Flament
Move the meta information from the comment to the about variable so that the preferences and the documentation can show this information.
2020-12-10 | [fix] xpath, mojeek: fix commit 58d72f26925d56e22330c54be03c3dcbee0c4135 | Alexandre Flament
Before commit 58d72f2, the category was not set in xpath.py, so searx/engines/__init__.py set the category to ['general']. Commit 58d72f2 set the category to [], which is not replaced by searx/engines/__init__.py. Consequence: the mojeek engine is hidden in the preferences. This commit reverts the xpath.py change.

close #2368
2020-12-03 | [mod] xpath, 1337x, acgsou, apkmirror, archlinux, arxiv: use eval_xpath_* functions | Alexandre Flament
2020-11-03 | [mod] pylint: minor code change to allow pylint globally | Alexandre Flament
This commit is only a step, it doesn't fix all the issues reported by pylint
2020-10-25 | [enh] Add onions category with Ahmia, Not Evil and Torch | a01200356
Xpath engine and results template changed to account for the fact that archive.org doesn't cache .onions, though some onion engines might have their own cache.

Disabled by default. Can be enabled by setting the SOCKS proxies to wherever Tor is listening and setting using_tor_proxy as True. Requires Tor and updating packages.

To avoid manually adding the timeout on each engine, you can set extra_proxy_timeout to account for Tor's (or whatever proxy used) extra time.
2020-10-02 | [mod] move extract_text, extract_url to searx.utils | Alexandre Flament
2020-09-10 | Drop Python 2 (1/n): remove unicode string and url_utils | Dalf
2020-07-23 | Fix relative urls that do not start with '/' | xywei
2019-11-15 | [mod] speed optimization | Dalf
compile XPath only once
avoid redundant call to urlparse
get_locale(webapp.py): avoid useless call to request.accept_languages.best_match
2019-07-25 | [fix] fixes google play engines (#1651) | Alexandre Flament
update commit 87baa74a863ac74ae4c86bbfcb04148ba7f70696
2019-07-25 | [fix] fixes google play engines and adds thumbnails to their results (#1612) | Venca24
fix google play apps, google play apps, google play music engines
xpath engine: thumbnail_xpath can define an optional thumbnail
2018-04-08 | [fix] append http if no scheme is provided in xpath's extract_url | Marc Abonce Seguin
This solves a bug with Yahoo where some results don't specify a protocol.
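Several entries in this log deal with result URLs that lack a scheme or are relative to the search page. A minimal sketch of such a normalisation, using only the standard library, is shown here; the function name `absolutize` is hypothetical and the exact rules of the engine's own URL extractor may differ.

```python
from urllib.parse import urljoin, urlparse


def absolutize(url: str, search_url: str) -> str:
    """Turn a scheme-less or relative result URL into an absolute one."""
    if url.startswith("//"):
        # Protocol-relative URL: reuse the search URL's scheme,
        # falling back to http when it has none (an assumption).
        scheme = urlparse(search_url).scheme or "http"
        return f"{scheme}:{url}"
    if "://" in url:
        # Already absolute, leave untouched.
        return url
    # Relative URL, with or without a leading '/': resolve against the search URL.
    return urljoin(search_url, url)
```

Resolving with `urljoin` covers both `/item/1` and `item/1`, which is the case the later "relative urls that do not start with '/'" fix addresses.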
2017-05-22 | [fix] produce valid urls if scheme is missing | Adam Tauber
2017-05-15 | [enh] py3 compatibility | Adam Tauber
2017-01-17 | [fix] allow empty content | David A Roberts
2016-12-31 | [fix] extract_text: use html.tostring instead of html_to_text. Fix #711 | Alexandre Flament
2016-08-14 | [fix] behaviour for page_size>1 and first_page_num>0 | David A Roberts
e.g. pageno=1,21,41,... instead of 20,40,60,...
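The sequence in the message above is consistent with a simple offset formula. A small sketch, assuming `page_size` is the number of results per page and `first_page_num` is the number the site uses for its first page (the function name `site_pageno` is hypothetical):

```python
def site_pageno(pageno: int, page_size: int = 1, first_page_num: int = 1) -> int:
    """Map a 1-based page number to the value placed in the search URL."""
    return (pageno - 1) * page_size + first_page_num


# With page_size=20 and first_page_num=1 the first three pages map to:
print([site_pageno(p, page_size=20, first_page_num=1) for p in (1, 2, 3)])  # [1, 21, 41]
```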
2016-03-28 | Add paging support to XPath & Erowid engines | Kirill Isakov
2016-01-18 | [fix] pep8 compatibility | Adam Tauber
2015-01-25 | Sanitize extract_text | Cqoicebordel
2014-03-04 | [fix] error when xpath_results in extract_text is _ElementUnicodeResult instead of _ElementStringResult | potato
2014-02-11 | [mod] len() removed from conditions | asciimoo
2014-01-30 | [fix] function parameters | asciimoo
2014-01-30 | [fix] function parameters | asciimoo
2014-01-30 | [enh] importable url extractor | asciimoo
2014-01-23 | [fix] html tag removal | asciimoo
2014-01-20 | [fix] pep/flake8 compatibility | asciimoo