
simlevesque • today at 7:18 PM • 5 replies

I've been making skills from arXiv papers for a while. I have one for multi-object tracking, for example. It has a SKILL.md describing all the important papers (over 30) on the subject and a folder with each paper's full content as reStructuredText.

To feed arXiv papers to LLMs I found that RST gives the best token count/fidelity ratio. Markdown lacks precision. LaTeX is too verbose. I have a script with the papers' URLs, names and dates that downloads the LaTeX zips from arXiv, extracts them, transforms them to RST and then adds them to the right folder. Then I ask an LLM to make a summary from the full text, then I give other LLMs the full paper again with the summary and ask them to improve on it and proofread it. While this goes on I read the papers myself, and at the end I read the summaries and if I approve them I add them to the skill. For each paper I also add info on how well the algorithms described do in common benchmarks.
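For illustration, the download-and-convert step could look roughly like the sketch below. This is not the script described above, just one possible shape for it. It assumes pandoc is installed and on the PATH, that the source bundle served under arXiv's /e-print/ path is a gzipped tarball (single-file submissions would need separate handling), and that the main file is the first .tex file containing \documentclass. The names PAPERS, fetch_and_convert and the "papers" subfolder are hypothetical.

```python
import io
import subprocess
import tarfile
import urllib.request
from pathlib import Path

# Hypothetical paper list: (arXiv id, short name, date). Fill in your own entries.
PAPERS: list[tuple[str, str, str]] = []


def source_url(arxiv_id: str) -> str:
    """Build the URL of a paper's source bundle on arXiv."""
    return f"https://arxiv.org/e-print/{arxiv_id}"


def find_main_tex(src_dir: Path) -> Path:
    """Return the first .tex file (sorted by path) that declares a \\documentclass."""
    for tex in sorted(src_dir.rglob("*.tex")):
        if "\\documentclass" in tex.read_text(errors="ignore"):
            return tex
    raise FileNotFoundError(f"no main .tex file under {src_dir}")


def pandoc_command(main_tex: Path, out_rst: Path) -> list[str]:
    """Command line that converts LaTeX to RST; run it from main_tex's folder."""
    return [
        "pandoc",
        "--from=latex",
        "--to=rst",
        "--wrap=none",  # keep paragraphs on one line instead of hard-wrapping
        f"--output={out_rst.resolve()}",
        main_tex.name,
    ]


def fetch_and_convert(arxiv_id: str, name: str, skill_dir: Path) -> Path:
    """Download one paper's source, extract it, and write <skill_dir>/papers/<name>.rst."""
    work = skill_dir / "_src" / name
    work.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(source_url(arxiv_id)) as resp:
        data = resp.read()
    # Assumption: the bundle is a (possibly compressed) tar archive.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        # filter="data" needs Python 3.12+ or a patched older release.
        tar.extractall(work, filter="data")
    main_tex = find_main_tex(work)
    out_rst = skill_dir / "papers" / f"{name}.rst"
    out_rst.parent.mkdir(parents=True, exist_ok=True)
    # Run inside the source folder so relative \input paths resolve.
    subprocess.run(pandoc_command(main_tex, out_rst), cwd=main_tex.parent, check=True)
    return out_rst


if __name__ == "__main__":
    for arxiv_id, name, _date in PAPERS:
        print(fetch_and_convert(arxiv_id, name, Path("skill")))
```

Pandoc does not understand every LaTeX macro or package, so the RST it produces would probably still need a manual or LLM-assisted cleanup pass before it goes into the skill folder.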

I highly recommend doing something similar if you're working in a cutting-edge domain. Also I'd like to know if anyone has recommendations to improve what I do.

Replies

paulluuk • today at 7:34 PM

This sounds like it would work, but honestly if you've already read all 30 papers fully, what do you still need the LLM to do for you? Just the boilerplate?


ctoth • today at 7:43 PM

I've been working on ctoth/research-papers-plugin, the pipeline to actually get LLMs to extract the notes. I really like your insight re RST over Markdown! It sounds like we're working on similar stuff and I'll absolutely reach out :)


satvikpendem • today at 8:06 PM

Does that even fit in the context? It seems like 30 papers' worth of content would just overflow it.


alex000kim • today at 7:26 PM

sounds similar to "LLM Knowledge Bases" https://xcancel.com/karpathy/status/2039805659525644595

MrLeap • today at 7:34 PM

What is RST?

