I built an MCP server that gives any local LLM real Google search and now vision capabilities - no API keys needed.
The latest feature: google_lens_detect uses OpenCV to find objects in an image, crops each one, and sends them to Google Lens for identification. GPT-OSS-120B, a text-only model with
zero vision support, correctly identified an NVIDIA DGX Spark and a SanDisk USB drive from a desk photo.
Also includes Google Search, News, Shopping, Scholar, Maps, Finance, Weather, Flights, Hotels, Translate, Images, Trends, and more. 17 tools total.
Two commands: pip install noapi-google-search-mcp && playwright install chromium
GitHub: https://github.com/VincentKaufmann/noapi-google-search-mcp
PyPI: https://pypi.org/project/noapi-google-search-mcp/
Booyah!

Scraping Google directly like that looks like a ToS violation to me. While the concept of giving a text-only model "pseudo-vision" is clever, I think the solution in its current form is a bit fragile. SerpAPI, the Google Custom Search API, etc. exist for a reason. For anything beyond personal tinkering, this is a liability.
You eventually get hit with a CAPTCHA with the Playwright approach.
Have you tried Llama? In my experience it has been strictly better than GPT-OSS, but it might depend on exactly how it's used.
> GPT-OSS-120B, a text-only model with zero vision support, correctly identified an NVIDIA DGX Spark and a SanDisk USB drive from a desk photo.
But wasn't it Google Lens that actually identified them?