andrew_zhong, today at 8:45 AM

HTML -> markdown -> LLM is standard practice. We strip elements like aside, embed, head, iframe, etc. The criteria are set conservatively to avoid removing too many elements (especially in extractMain mode).

https://github.com/lightfeed/extractor/blob/main/src/convert...
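To make the stripping step concrete, here is a minimal sketch in Python using only the standard library. It is not the code from the linked repo: the tag list just mirrors the elements named above, and `MiniMarkdown`, `html_to_markdown`, `SKIP_CONTAINERS`, and the tiny set of handled block tags are all hypothetical names and choices made for illustration.

```python
from html.parser import HTMLParser

# Assumption: strip list taken from the elements named in the comment.
# "embed" is a void element (no closing tag, no text), so it is simply ignored.
STRIP_TAGS = {"aside", "embed", "head", "iframe"}
SKIP_CONTAINERS = STRIP_TAGS - {"embed"}

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADINGS | {"p", "div"}


class MiniMarkdown(HTMLParser):
    """Hypothetical converter: drops stripped subtrees, emits headings and paragraphs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # > 0 while inside a stripped container
        self.blocks = []      # finished markdown blocks
        self.current = []     # text pieces of the block being built
        self.prefix = ""      # e.g. "# " for an h1

    def flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTAINERS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.flush()
            if tag in HEADINGS:
                self.prefix = "#" * int(tag[1]) + " "

    def handle_endtag(self, tag):
        if tag in SKIP_CONTAINERS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        # Text inside a stripped container never reaches the output.
        if self.skip_depth == 0:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.blocks)
```

For example, `html_to_markdown("<head><title>T</title></head><aside>Ads</aside><h1>Title</h1><p>Hello <b>world</b>.</p>")` returns `"# Title\n\nHello world."`. Keeping the strip list short is what makes this conservative: anything not explicitly listed passes through to the LLM.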

I have used Gemma 3 and had good results.

Once Gemini 3 Flash drops the preview suffix, I will update the examples. Thank you for the pointer.