Hacker News

choilive 12/09/2024

Perhaps LLMs can solve this somewhat? Not for email summarization - but to intelligently strip away all the HTML fluff and return a plain text version of the contents.


Replies

ninjin 12/10/2024

It is a solved problem. Here is a solution that requires something on the order of a millionth of the resources of your proposed idea, needs no subscription, and runs so fast that you would not even notice it on a machine from 20 years ago:

    > grep text/html ~/.mailcap
    text/html; lynx -width 72 -assume_charset=%{charset} -display_charset=utf-8 -dump %s | sed 's|^   ||'; nametemplate=%s.html; copiousoutput
If you want something more modern:

    text/html; webdump -dli < %s | sed 's/^  //g'; needsterminal; copiousoutput
abound 12/09/2024

FWIW, it's pretty straightforward to extract text from an HTML snippet without LLMs. I'm not actually sure there's anything they'd do better than a simple HTML parser.
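
To make the "simple HTML parser" point concrete, here is a minimal sketch using only Python's standard-library `html.parser`. It is not the commenter's code: the names `TextExtractor` and `html_to_text` are hypothetical, and the choice of which tags count as block-level or get skipped is an assumption that real-world email HTML would probably need tuned.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}  # contents are never shown to the reader
    BLOCK = {"p", "div", "br", "li", "tr",
             "h1", "h2", "h3", "h4", "h5", "h6"}  # assumed line-breaking tags

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the text content of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split())
             for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Calling `html_to_text(open("message.html").read())` on a saved message body (the filename is only an illustration) would give a plain-text rendering. It does no link handling or line wrapping, which is part of what tools like lynx add on top.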

myflash13 12/09/2024

Apple Intelligence already does this in the line summary.