Age | Commit message | Author |
|
Closes: https://github.com/searxng/searxng/issues/4127
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
A SearXNG maintainer on Matrix reported a traceback::
File "searxng-src/searx/engines/xpath.py", line 272, in response
dom = html.fromstring(resp.text)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "searx-pyenv/lib/python3.11/site-packages/lxml/html/__init__.py", line 850, in fromstring
doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "searx-pyenv/lib/python3.11/site-packages/lxml/html/__init__.py", line 738, in document_fromstring
raise etree.ParserError(
lxml.etree.ParserError: Document is empty
I don't have an example to reproduce the issue, but the issue and this patch are
clearly recognizable even without an example.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
BTW: humanize filesize (Bytes) to KB, MB, GB ..
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
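
A minimal sketch of what humanizing a file size could look like. The function name `humanize_bytes`, the 1024 divisor and the two-decimal formatting are assumptions for illustration and are not taken from the patch:

```python
def humanize_bytes(size: float, precision: int = 2) -> str:
    """Convert a byte count into a human-readable string.

    Assumption: 1 KB = 1024 Bytes; the patch itself may use a different base.
    """
    units = ["Bytes", "KB", "MB", "GB", "TB"]
    index = 0
    # Divide by 1024 until the value fits the current unit or units run out.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    if index == 0:
        return f"{int(size)} {units[index]}"
    return f"{size:.{precision}f} {units[index]}"
```

For example, `humanize_bytes(1536)` yields `1.50 KB` under these assumptions.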
|
|
In some results, Google returns a <script> tag that must be removed before
extracting the content.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
The web page (PL only) has changed and now also presents a kind of CAPTCHA.
There is currently no way to restore the function of this engine.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
The engines do not have and do not need a `base_url` property; let's remove it
from the settings.yml
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
The properties `item.service_medium` and `item.thumb_gallery` are not given for
every result item. It is more reliable to use the first (thumb) and
last (image) URL in the list of URLs in `image_url`.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
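
The first/last selection described above can be sketched as follows. The helper name `pick_thumb_and_image` and the sample URLs are hypothetical; only the `image_url` list comes from the commit message:

```python
from typing import List, Optional, Tuple


def pick_thumb_and_image(image_url: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (thumbnail, full image) from a list of URLs.

    The first entry is taken as the thumbnail and the last one as the
    image; an empty list yields (None, None) instead of an IndexError.
    """
    if not image_url:
        return None, None
    return image_url[0], image_url[-1]


# Hypothetical sample data for illustration only.
thumb, image = pick_thumb_and_image([
    "https://example.org/small.jpg",
    "https://example.org/medium.jpg",
    "https://example.org/large.jpg",
])
```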
|
|
|
|
|
|
The engine has been revised; there is now the option ``adobe_content_types``
with which it is possible to configure engines for video and audio from
Adobe Stock. BTW this patch adds documentation to the engine.
To test all three engines at once, use a search term like::
!asi !asv !asa sound
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
Engine was added in #2733 but the API no longer exists. Related:
- https://github.com/searxng/searxng/issues/4038
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Avoid HTTP 404 and redirects. Requests to the JSON/YAML API use the base url [1]
https://www.loc.gov/{endpoint}/?fo=json
[1] https://www.loc.gov/apis/json-and-yaml/requests/
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
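
A small sketch of building request URLs in the form given above. The helper `build_loc_url`, the `photos` endpoint and the `q` parameter in the usage line are assumptions for illustration; only the URL pattern is taken from the commit message:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.loc.gov"


def build_loc_url(endpoint: str, **params: str) -> str:
    """Build https://www.loc.gov/{endpoint}/?fo=json plus optional arguments.

    The slash before the query string follows the URL pattern quoted in
    the commit message.
    """
    query = urlencode({"fo": "json", **params})
    return f"{BASE_URL}/{endpoint.strip('/')}/?{query}"


# Hypothetical endpoint and search argument.
url = build_loc_url("photos", q="cat")
```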
|
|
The query string sent to DDG must not be quoted.
The query string was URL-quoted in #4011, but the URL-quoted query string results
in unexpected *URL decoded* and other garbage results, as reported in #4019
and #4020. To test, compare the results of queries like::
!ddg Häuser und Straßen :de
!ddg Häuser und Straßen :all
!ddg 房屋和街道 :all
!ddg 房屋和街道 :zh
Closed:
- [#4019] https://github.com/searxng/searxng/issues/4019
- [#4020] https://github.com/searxng/searxng/issues/4020
Related:
- [#4011] https://github.com/searxng/searxng/pull/4011
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
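
One plausible way a pre-quoted query ends up garbled is double encoding: the query is quoted by hand and then quoted again when the request is built. The sketch below only demonstrates that effect with the standard library; it is an assumption that this mirrors what happened in the engine:

```python
from urllib.parse import quote_plus, unquote_plus, urlencode

query = "Häuser und Straßen"

# Encoded once: the receiver decodes it back to the original query.
once = urlencode({"q": query})

# Quoted by hand and then encoded again: '%' becomes '%25' and '+' becomes
# '%2B', so the receiver sees the quoted text instead of the query itself.
twice = urlencode({"q": quote_plus(query)})

decoded_once = unquote_plus(once.split("=", 1)[1])
decoded_twice = unquote_plus(twice.split("=", 1)[1])
```

Here `decoded_once` equals the original query, while `decoded_twice` still contains percent escapes.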
|
|
During the revision in PR #3955 the query string was accidentally converted into
a list of words. Furthermore, the query must be quoted before it is POSTed in the
``data`` field, see ``urllib.parse.quote_plus`` [1]
[1] https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote_plus
Closed: #4009
Co-Authored-by: @return42
|
|
|
|
|
|
The entire source code of the duckduckgo engine has been reengineered and
cleaned up.
1. DDG uses the URL https://html.duckduckgo.com/html for no-JS requests, whose
response is also easier to parse than that of the previous
https://lite.duckduckgo.com/lite/ URL
2. the bot detection of DDG has so far caused problems and often led to a
CAPTCHA; this can be circumvented by setting the `Sec-Fetch-Mode` header to
`navigate`
Closes: https://github.com/searxng/searxng/issues/3927
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
The previous implementation could not distinguish a CAPTCHA response from an
ordinary result list: a CAPTCHA was treated as an empty result list.
DDG does not block IPs. Instead, a CAPTCHA wall is placed in front of
dubious requests.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Bumps [pylint](https://github.com/pylint-dev/pylint) from 3.2.7 to 3.3.1.
- [Release notes](https://github.com/pylint-dev/pylint/releases)
- [Commits](https://github.com/pylint-dev/pylint/compare/v3.2.7...v3.3.1)
---
updated-dependencies:
- dependency-name: pylint
dependency-type: direct:development
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
|
|
|
|
Improve region and language detection / all locale
Testing has shown the following behaviour for the different
default and empty values of Mojeek's parameters:
| param | idx | value | behaviour |
| -------- | --- | ------ | ------------------------- |
| region | 0 | '' | detect region based on IP |
| region | 1 | 'none' | all regions |
| language | 0 | '' | all languages |
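
Read as code, the table above suggests the following mapping for an "all" locale. This is only a sketch: the function name is hypothetical, and the keys `region` and `language` are the labels from the table, not necessarily the names of Mojeek's request parameters:

```python
from typing import Dict, Optional


def locale_params(region: Optional[str], language: Optional[str]) -> Dict[str, str]:
    """Map an optional region/language to the values from the table.

    region=None   -> 'none' (all regions; '' would mean IP-based detection)
    language=None -> ''     (all languages)
    """
    return {
        "region": region if region else "none",
        "language": language if language else "",
    }


all_locale = locale_params(None, None)
german = locale_params("de", "de")
```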
|
|
Without this patch the Gitea search engine is only partially compatible with
modern Gitea or Forgejo:
- Fixing some JSON fields
- Using the repository avatar when available
To verify my results, you can look at the modern API doc and results; it is
available on all Gitea and Forgejo instances by default. Here is a search API
result of mine:
- https://git.euph.dev/api/v1/repos/search?q=ccna
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
|
|
|
|
|
|
|
|
add Cloudflare AI Gateway engine
add settings for Cloudflare AI Gateway engine
set utf8 encoding for data, fix 500 error caused by non-English chars
format json data
fixed indentation and config format error
fix line-length limitation in CI
reformatted code for CI
reformatted code for CI
limit system prompts to less than 120 chars
cleanup unused variable & format code
|
|
Removes ``/>`` ending tags for void elements [1] and replaces them with ``>``.
Part of the larger cleanup of invalid HTML throughout the codebase [2].
[1] https://html.spec.whatwg.org/multipage/syntax.html#void-elements
[2] https://github.com/searxng/searxng/issues/3793
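
A rough sketch of how such a replacement could be automated with a regular expression. The helper name is hypothetical and the pattern is an assumption; it is not how the patch was produced, and it does not handle template expressions or attribute values that contain `<` or `>`:

```python
import re

# Void elements as listed in the HTML standard [1].
VOID_ELEMENTS = (
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
)

# Matches e.g. <br/>, <br /> or <img src="a.png" />, but not <div/>.
_SELF_CLOSED_VOID = re.compile(
    r"<(" + "|".join(VOID_ELEMENTS) + r")\b([^<>]*?)\s*/>",
    re.IGNORECASE,
)


def strip_void_slashes(markup: str) -> str:
    """Replace the '/>' ending of void elements with a plain '>'."""
    return _SELF_CLOSED_VOID.sub(r"<\1\2>", markup)
```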
|
|
So far a CAPTCHA was not recognized in the response of the qwant engine and a
SearxEngineAPIException was raised by mistake. With this patch a CAPTCHA
redirect is recognized and the correct SearxEngineCaptchaException is raised.
Closes: https://github.com/searxng/searxng/issues/3806
Signed-off-by: Markus <markus@venom.fritz.box>
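
The distinction described above can be sketched like this. The two exception classes are local stand-ins for ``SearxEngineCaptchaException`` and ``SearxEngineAPIException``, and the check for ``captcha`` in the redirect target is a hypothetical marker, not the condition used in the patch:

```python
class EngineAPIError(Exception):
    """Stand-in for SearxEngineAPIException."""


class EngineCaptchaError(Exception):
    """Stand-in for SearxEngineCaptchaException."""


REDIRECT_CODES = (301, 302, 303, 307, 308)


def check_response(status_code: int, location: str = "") -> None:
    """Raise the matching exception for a CAPTCHA redirect or an API error."""
    # Assumption: a CAPTCHA shows up as a redirect whose target mentions it.
    if status_code in REDIRECT_CODES and "captcha" in location.lower():
        raise EngineCaptchaError(f"CAPTCHA redirect to {location}")
    if status_code != 200:
        raise EngineAPIError(f"unexpected HTTP status {status_code}")
```

Checking for the CAPTCHA case first keeps it from being reported as a generic API error.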
|
|
This patch fixes a bug reported by CI "Fetch traits" [1] (brave) and improves
other fetch traits functions (google, annas_archive & radio_browser).
brave:
File "/home/runner/work/searxng/searxng/searx/engines/brave.py", line 434, in fetch_traits
sxng_tag = region_tag(babel.Locale.parse(ui_lang, sep='-'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/searxng/searxng/searx/locales.py", line 155, in region_tag
Error: raise ValueError('%s missed a territory')
google:
change ERROR message about unknown UI language to INFO message
radio_browser:
country_list contains duplicates that differ only in upper/lower case
annas_archive:
sort the persisted traits for a better diff
[1] https://github.com/searxng/searxng/actions/runs/10606312371/job/29433352518#step:6:41
Signed-off-by: Markus <markus@venom.fritz.box>
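
For the radio_browser case, removing duplicates that differ only in case could look like the sketch below. It assumes the entries are plain strings and normalizes them to upper case; the helper name and the sample values are hypothetical:

```python
from typing import Iterable, List


def unique_country_codes(country_list: Iterable[str]) -> List[str]:
    """Drop entries that differ only in upper/lower case, keeping order."""
    seen = set()
    unique = []
    for code in country_list:
        key = code.upper()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


codes = unique_country_codes(["DE", "de", "us", "US", "fr"])
```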
|
|
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Fixes #3810.
|
|
Yep includes links to search for the same query on Google and other
search engines as items in the result list. This fix skips these
results.
|
|
- ValueError in duration: issue reported in #3799
- HTML in title: related to #3770
[#3799] https://github.com/searxng/searxng/issues/3799
[#3770] https://github.com/searxng/searxng/pull/3770
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
- https://github.com/searxng/searxng/issues/3790
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
We do not want to show the user-agent information from the duckduckgo
zero click info. This is the user-agent used by searxng and not the
user-agent used by the user.
This was already done for the IP address in:
0fb3f0e4aeecf62612cb6568910cf0f97c98cab9
|
|
It's set to inactive in settings.yml because of CAPTCHA. You need to remove
that setting from the settings.yml to use it.
Closes: https://github.com/searxng/searxng/issues/961
|
|
Changes made to tineye engine:
1. Importing logging if TYPE_CHECKING is enabled
2. Remove unnecessary try-catch around JSON parsing of the response, as this
masked the original error and had no immediate benefit
3. Improve error handling explicitly for status codes 422 and 400
upfront, deferring JSON parsing to only these status codes and
successful status codes
4. Unit test all new applicable changes to ensure compatibility
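
The control flow from points 2 and 3 can be sketched as follows. This is an illustration under assumptions, not the engine's code: the exception class and function name are hypothetical, and the shape of the error payload is not known here, so it is passed through unchanged:

```python
import json


class EngineResponseError(Exception):
    """Hypothetical error type carrying the HTTP status and decoded payload."""

    def __init__(self, status_code: int, payload=None):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code
        self.payload = payload


def parse_response(status_code: int, text: str) -> dict:
    """Decode JSON only for 400/422 and for successful responses."""
    if status_code in (400, 422):
        # These responses are expected to carry a JSON error description.
        raise EngineResponseError(status_code, json.loads(text))
    if not 200 <= status_code < 300:
        # Other failures are reported without attempting to parse the body.
        raise EngineResponseError(status_code)
    # No try/except here: a JSON decoding error surfaces unmasked.
    return json.loads(text)
```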
|
|
|
|
Google underlines words inside of answers that can be clicked to show
additional definitions. These definitions inside the answer were not
correctly handled and ended up in the middle of the answer text. With
this fix, the extra definitions are stripped from the answer shown by
the frontend.
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Fault pattern: if there are no offers, an exception is thrown:
IndexError: list index out of range
This patch makes the addition of “best price” dependent on whether one exists.
Closes: #3685
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
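
A minimal sketch of the guard described above. It assumes the offers arrive as a list whose first entry is the best price; the helper name, the result key and the sample values are hypothetical:

```python
from typing import Dict, List


def add_best_price(result: Dict[str, str], offers: List[str]) -> Dict[str, str]:
    """Add a 'best_price' entry only when at least one offer exists."""
    if offers:
        # Indexing is safe here because the list is known to be non-empty.
        result["best_price"] = offers[0]
    return result


with_offer = add_best_price({"title": "item"}, ["9.99 EUR", "12.49 EUR"])
without_offer = add_best_price({"title": "item"}, [])
```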
|
|
|
|
|
|
|
|
https://search.lomig.me
Poor results / tested `!yacy :en hello` and got zero results
https://yacy.ecosys.eu
Slow response (> 6sec for trivial search terms)
https://search.webproject.link
Dead instance / URL offline
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|