Hacker News

arppacket today at 6:39 AM (7 replies)

While it's noisy and complicated for humans to read through, this session info is primarily for future AI to read and use as additional input for their tasks.

We could have LLMs ingest all these historical sessions, and use them as context for the current session. Basically treat the current session as an extension of a much, much longer previous session.

Plus, future models might be able to "understand" the limitations of current models, and use the historical session info to identify where the generated code could have deviated from the user's intention. That might be useful for generating code, or just for more efficient analysis by focusing on possible "hotspots", etc.

Basically, it's high time we start capturing any and all human input for future models, especially open source model development, because I'm sure the companies already have a bunch of this kind of data.


Replies

JeremyNT today at 1:05 PM

But AI can just read the diff. The natural language isn't important.

staticassertion today at 9:56 AM

TBH I don't think it's worth the context space to do this. I'm skeptical that this would have any meaningful benefits vs just investing in targeted docs, skills, etc.

I already keep a "benchmarks.md" file to track commits and benchmark results, plus what did and did not work. I think that's far more concise and helpful than the massive context that was used to get there. And it's useful for a human to read, which I think is good. I prefer that things remain maximally beneficial to both humans and AI; disconnects seem to be problematic.
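A minimal sketch of how such a file could be maintained, assuming a Markdown table with one row per attempt. The column layout and the helper names `format_row` and `append_entry` are illustrative assumptions, not something described above.

```python
from pathlib import Path

# Hypothetical layout: one table row per attempt, newest at the bottom.
HEADER = (
    "| commit | benchmark | result | worked? | notes |\n"
    "|---|---|---|---|---|\n"
)


def format_row(commit: str, benchmark: str, result: str,
               worked: bool, notes: str) -> str:
    """Render one attempt as a Markdown table row."""
    status = "yes" if worked else "no"
    cells = [commit, benchmark, result, status, notes]
    # Escape pipes so free-form notes cannot break the table layout.
    return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |\n"


def append_entry(path: Path, commit: str, benchmark: str, result: str,
                 worked: bool, notes: str) -> None:
    """Append a row, writing the title and table header if the file is new."""
    is_new = not path.exists()
    with path.open("a", encoding="utf-8") as f:
        if is_new:
            f.write("# Benchmarks\n\n" + HEADER)
        f.write(format_row(commit, benchmark, result, worked, notes))
```

Recording failed attempts as rows too (with `worked=False`) is what keeps the "what did not work" part without carrying the full session transcript.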

woctordho today at 8:30 AM

That's exactly one of the reasons I've been archiving the sessions using DataClaw [0]. The sessions can contain more useful information than the comments written for humans.

[0] https://github.com/peteromallet/dataclaw

serial_dev today at 10:48 AM

Or just "write a good commit message based on our session, pls", then both humans and LLMs can use it.

JustFinishedBSG today at 11:26 AM

> While it's noisy and complicated for humans to read through, this session info is primarily for future AI to read and use as additional input for their tasks.

Context rot is very much a thing, and may still be for future agents. Dumping tens or hundreds of thousands of trash tokens into context very much worsens the performance of the agent.

ZeroGravitas today at 8:15 AM

Similarly, git logs of existing human code seem to be a good source of info that LLMs don't look at unless explicitly prompted to do so.
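One way to make that history available up front is to collect it before prompting. A rough sketch, assuming `git` is on the PATH and the goal is a short plain-text summary of one file's history; the helper names `parse_git_log`, `render_history`, and `file_history` are made up for illustration.

```python
import subprocess

FIELD_SEP = "\x1f"   # ASCII unit separator between fields
RECORD_SEP = "\x1e"  # ASCII record separator between commits
# %h short hash, %ad author date, %s subject, %b body
GIT_FORMAT = "%h%x1f%ad%x1f%s%x1f%b%x1e"


def parse_git_log(raw: str) -> list[dict]:
    """Split raw `git log` output (in GIT_FORMAT) into commit dicts."""
    commits = []
    for record in raw.split(RECORD_SEP):
        # Strip newlines only: a bare strip() would also eat "\x1f",
        # which Python treats as whitespace.
        record = record.strip("\n")
        if not record:
            continue
        short_hash, date, subject, body = record.split(FIELD_SEP, 3)
        commits.append(
            {"hash": short_hash, "date": date, "subject": subject, "body": body}
        )
    return commits


def render_history(commits: list[dict]) -> str:
    """Format commits as compact text, with bodies indented under subjects."""
    lines = []
    for c in commits:
        lines.append(f"{c['date']} {c['hash']} {c['subject']}")
        if c["body"]:
            lines.append("    " + c["body"].replace("\n", "\n    "))
    return "\n".join(lines)


def file_history(path: str, limit: int = 20, repo: str = ".") -> str:
    """Return the last `limit` commits touching `path` as prompt-ready text."""
    raw = subprocess.run(
        ["git", "log", "--follow", f"--max-count={limit}", "--date=short",
         f"--format={GIT_FORMAT}", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return render_history(parse_git_log(raw))
```

Capping the number of commits keeps the added context small, which matters given the context-rot concern raised elsewhere in the thread.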

jfoster today at 8:15 AM

Future AIs can probably infer the requirements better than humans can write them.