Age | Commit message (Collapse) | Author |
|
The 'scrap_img_by_id' function no longer returned anything useful. This
fix allows the google images engine to present the full source image instead of
only the thumbnail.
The function scrap_img_by_id() is replaced by a complete rewrite that parses image
URLs with a regular expression. The new function parse_urls_img_from_js(dom)
returns a mapping of data-id to image URL.
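A minimal sketch of the regular-expression approach described above. The commit message does not show the markup Google embeds, so the sample layout, the pattern and the name parse_urls_img_from_js_sketch are assumptions for illustration only. The sketch takes the script text as a string rather than a DOM object.

```python
import re

# Hypothetical layout: "<data-id>", ["<image url>", <height>, <width>]
# The real pattern used by the engine is not part of the commit message.
RE_DATA_IMAGE = re.compile(r'"([^"]+)",\s*\["(https?://[^"]+)",\s*\d+,\s*\d+\]')


def parse_urls_img_from_js_sketch(js_text):
    """Return a dict that maps each data-id found in js_text to its image URL."""
    return dict(RE_DATA_IMAGE.findall(js_text))


# Made-up sample of script text with two image entries.
sample = (
    'var data = [["a1B2", ["https://example.org/full-1.jpg", 600, 800]],'
    ' ["c3D4", ["https://example.org/full-2.jpg", 480, 640]]];'
)
urls_by_id = parse_urls_img_from_js_sketch(sample)
```

A result item can then look up its full-size image with urls_by_id.get(data_id) and fall back to the thumbnail when the id is missing.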
Closes: https://github.com/searxng/searxng/issues/909
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
This patch was generated by black [1]::
make format.python
[1] https://github.com/psf/black
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Sometimes there is no href in the `<a ..>` tag of a *link_node* [1].
[1] https://github.com/searxng/searxng/issues/532
Reported-by: @TheEssem
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Since 7b235a1 (see line 591) it is no longer necessary to disable
'undefined-variable' for names defined in::
PYLINT_ADDITIONAL_BUILTINS_FOR_ENGINES
Suggested-by: @dalf https://github.com/searxng/searxng/issues/102#issuecomment-914068609
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Remove the no longer needed `logger = logger.getChild(...)` from engines.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Pylint 2.10 added new default checks [1]:
use-list-literal
Emitted when list() is called with no arguments instead of using []
use-dict-literal
Emitted when dict() is called with no arguments instead of using {}
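For illustration, the two checks flag the call form and accept the literal form. The variable names below are placeholders, not taken from the engines.

```python
# Flagged by pylint >= 2.10 (use-list-literal / use-dict-literal):
flagged_results = list()
flagged_params = dict()

# Equivalent literal form that passes both checks:
results = []
params = {}
```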
[1] https://pylint.pycqa.org/en/latest/whatsnew/2.10.html
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Proposed fix in https://github.com/searx/searx/pull/2115#issuecomment-876716010
|
|
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
- HTTP header Accept-Language --> lang_info['headers']['Accept-Language']
- remove obsolete query_url log messages, which are already logged by
httpx._client:HTTP request
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Same behaviour as Whoogle [1]. Only the google engine with the
"Default language" choice "(all)" is changed by this patch.
When searching for a local place, the results are in the expected language,
without missing results [2]:
> When a language is not specified, the language interpretation is left up to
> Google to decide how the search results should be delivered.
The query parameters are copied from Whoogle. With the ``all`` language:
- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add an ``Accept-Language`` HTTP header.
The new signature of function ``get_lang_info()`` is:
lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)
Argument ``supported_any_language`` is True for google.py and False for the other
google engines. With this patch the function now returns:
- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
- ``lang_info['subdomain']``
- ``lang_info['country']``
- ``lang_info['language']``
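A simplified sketch of the ``all`` handling described above, not the actual get_lang_info() implementation. The function name, the reduced argument list, the ``en-US`` fallback and the plain ``Accept-Language`` value are assumptions. The subdomain, country and language keys are left out.

```python
def build_lang_info_sketch(language, supported_any_language):
    """Return only the 'params' and 'headers' parts of a lang_info dict."""
    lang_info = {"params": {}, "headers": {}}

    if language == "all" and supported_any_language:
        # Let Google decide: no lr parameter, no Accept-Language header.
        lang_info["params"]["source"] = "lnt"
        return lang_info

    if language == "all":
        # Assumption: engines without "any language" support use a default.
        language = "en-US"

    lang_info["params"]["lr"] = "lang_" + language.split("-")[0]
    lang_info["headers"]["Accept-Language"] = language
    return lang_info


any_language = build_lang_info_sketch("all", supported_any_language=True)
french = build_lang_info_sketch("fr-FR", supported_any_language=True)
```

Here any_language carries only ``source=lnt``, while french carries an ``lr`` parameter and an ``Accept-Language`` header.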
[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
|
|
These py files are linted by `test.pylint`, all other files are linted by
`test.pep8`.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
The language_support variable is set to True by default,
and set to False in only 5 engines.
Except for the documentation and the /config URL, this variable is not used.
This commit removes the variable definition in the engines and
sets the value according to the length of supported_languages: False when the
length is 0, True otherwise.
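The rule above can be sketched as follows. The helper name derive_language_support and the stand-in engine objects are hypothetical. Real engines are modules, which getattr() handles the same way.

```python
from types import SimpleNamespace


def derive_language_support(engine):
    """Return True when the engine declares at least one supported language."""
    return len(getattr(engine, "supported_languages", [])) > 0


# Stand-ins for engine modules (made-up names and values).
multilingual = SimpleNamespace(name="example_multilingual", supported_languages=["en", "fr"])
monolingual = SimpleNamespace(name="example_monolingual", supported_languages=[])

for engine in (multilingual, monolingual):
    engine.language_support = derive_language_support(engine)
```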
Close #2485
|
|
BTW: make the engines ready for search.checker:
- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
This revision is based on the methods developed in the revision of the google engine
(see commit 410c2f9).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
|
|
Move meta information from the comment to the about variable
so that the preferences and the documentation can show this information.
|
|
Engine list: ahmia, duckduckgo_images, elasticsearch, google, google_images, google_videos, youtube_api
|
|
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
|
|
Closes #2103
|
|
this commit is picked from #1985
|
|
google_images : use JSON embedded in HTML (engine expected pure JSON)
|
|
- Because there is no full image URL in the DOM, we replace "image_url" with the same URL as "url" (the URL of the source).
See example HTML https://gist.github.com/Nachtalb/2dea8a4d2c723c49226ad9645838121f
- Remove unused import
- Fix google image search title
- Keep google image safe value up to date
|
|
Engines:
* Bing images
* Flickr (noapi)
* Google
* Google Images
* Google News
|
|
update versions.cfg to use the current up-to-date packages
|
|
Engines that still use HTTP: gigablast, bing image for thumbnails, 1x and dbpedia autocompleter
|
|
- Modify engines to create/fetch a URL for the thumbnails
- Modify themes to show thumbnails instead of full images.
In Courgette, the result is not very beautiful. Should we change it?
|
|
It seems that Google Images double URL-encodes the image URLs, so we need to unquote them once before sending them to the browser.
This fixes the 404 errors seen for some images with special characters in their URL.
Example: https://searx.laquadrature.net/?q=etes&pageno=1&category_images (there are two of those in the list)
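A small sketch of the single unquote step, using Python 3's urllib.parse. The URL is a made-up example of a double-encoded value.

```python
from urllib.parse import unquote

# Made-up example: "é" is encoded as %C3%A9, then each "%" is encoded again as %25.
double_encoded = "https://example.org/img/%25C3%25A9t%25C3%25A9s.jpg"

# Unquote exactly once: the result is still a valid, singly encoded URL.
img_src = unquote(double_encoded)
```

Unquoting a second time would turn the escapes into raw non-ASCII characters, which is why the fix stops after one pass.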
|
|
|