
Show HN: I taught GPT-OSS-120B to see using Google Lens and OpenCV

19 points by vkaufmann, today at 5:40 AM, 11 comments

I built an MCP server that gives any local LLM real Google search and now vision capabilities - no API keys needed.
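To give a feel for the approach, here is a minimal sketch of what a browser-backed search tool exposed over MCP might look like. It assumes the official mcp Python SDK (FastMCP) and Playwright's sync API; the tool name, results selector, and overall structure are illustrative and not the actual noapi-google-search-mcp code.

    # Hypothetical sketch: expose a browser-backed Google search as an MCP tool.
    # Assumes the official `mcp` Python SDK (FastMCP) and Playwright's sync API;
    # the real noapi-google-search-mcp implementation may differ.
    from urllib.parse import quote_plus

    from mcp.server.fastmcp import FastMCP
    from playwright.sync_api import sync_playwright

    mcp = FastMCP("google-search")

    @mcp.tool()
    def google_search(query: str) -> str:
        """Run a Google search in a headless browser and return the results text."""
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            page.goto(f"https://www.google.com/search?q={quote_plus(query)}")
            # "#search" as the results container is an assumption, not the repo's selector.
            text = page.inner_text("#search")
            browser.close()
        return text

    if __name__ == "__main__":
        mcp.run()  # serve over stdio so a local LLM client can call the tool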

The latest feature: google_lens_detect uses OpenCV to find objects in an image, crops each one, and sends them to Google Lens for identification. GPT-OSS-120B, a text-only model with zero vision support, correctly identified an NVIDIA DGX Spark and a SanDisk USB drive from a desk photo.
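Roughly, the detection step works like the sketch below. It assumes a simple contour-based detector for illustration; the actual google_lens_detect tool may use a different OpenCV strategy, and the Google Lens upload via the headless browser is omitted.

    # Sketch of the detect-and-crop step with OpenCV (contour-based, illustrative only).
    import cv2

    def detect_and_crop(image_path: str, min_area: int = 5000) -> list:
        """Find candidate objects in an image and return the cropped regions."""
        img = cv2.imread(image_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)
        edges = cv2.Canny(blurred, 50, 150)  # edge map of the scene
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        crops = []
        for contour in contours:
            x, y, w, h = cv2.boundingRect(contour)
            if w * h >= min_area:  # drop tiny detections / noise
                crops.append(img[y:y + h, x:x + w])
        return crops

    # Each crop is then uploaded to Google Lens in the headless browser, and the
    # identification text is returned to the text-only model as a tool result.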

Also includes Google Search, News, Shopping, Scholar, Maps, Finance, Weather, Flights, Hotels, Translate, Images, Trends, and more. 17 tools total.

Two commands: pip install noapi-google-search-mcp && playwright install chromium

GitHub: https://github.com/VincentKaufmann/noapi-google-search-mcp
PyPI: https://pypi.org/project/noapi-google-search-mcp/

Booyah!

Comments

magic_hamster, today at 9:00 AM

> GPT-OSS-120B, a text-only model with zero vision support, correctly identified an NVIDIA DGX Spark and a SanDisk USB drive from a desk photo.

But wasn't it Google Lens that actually identified them?

N_Lens, today at 7:15 AM

Looks like a ToS violation to me to scrape Google directly like that. While the concept of giving a text-only model 'pseudo vision' is clever, I think the solution in its current form is a bit fragile. SerpAPI, the Google Custom Search API, etc. exist for a reason; for anything beyond personal tinkering, this is a liability.

tanduv, today at 7:47 AM

You eventually get hit with a CAPTCHA with the Playwright approach.

TZubiri, today at 7:10 AM

Have you tried Llama? In my experience it has been strictly better than GPT-OSS, but it might depend on specifically how it is used.
