I started reading this and right away hit something that doesn't really make any sense to me:

teraflop • today at 1:12 AM

> the extractor. the thing that reads conversation transcripts and decides what to keep.

> the most consequential choice an extractor makes is timing. extract eagerly, after every message, and you spend tokens on small talk that goes nowhere. extract lazily, at the end of a session, and the context you needed to resolve a pronoun is already gone.

If the input is coming from a transcript, then either that transcript contains enough context to understand what a particular pronoun refers to, or it doesn't.

If it does, why would waiting until the end of a session be a problem? What am I missing?

Replies

brgsk • today at 1:21 AM

good catch - the example is sloppy. the real issue is lost-in-the-middle on long transcripts: the extracting model attends worse to material in the middle of a long input than to material near the start or end, so "the transcript is still there" doesn't mean the extraction sees all of it equally well.
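
To make the trade-off in this thread concrete, here is a minimal sketch comparing the two timings. Everything in it is an assumption for illustration, not the article's implementation: the helper names, the whitespace word count standing in for a real tokenizer, and the hypothetical `window` parameter (how many recent messages an eager call re-reads). Eager extraction pays more tokens overall, while lazy extraction hands the model one long input, which is where lost-in-the-middle would apply.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def eager_inputs(messages: List[str], window: int) -> List[List[str]]:
    # One extraction call per message; each call sees only the last
    # `window` messages (hypothetical parameter, not from the article).
    return [messages[max(0, i - window + 1): i + 1] for i in range(len(messages))]


def lazy_inputs(messages: List[str]) -> List[List[str]]:
    # A single extraction call over the whole transcript at session end.
    return [list(messages)]


def total_tokens(calls: List[List[str]]) -> int:
    # Tokens read across all extraction calls.
    return sum(count_tokens(m) for call in calls for m in call)


def longest_input(calls: List[List[str]]) -> int:
    # Size of the largest single input the extracting model must attend over.
    return max(sum(count_tokens(m) for m in call) for call in calls)


messages = ["hi there friend", "how are", "please book the flight", "thanks"]

eager = eager_inputs(messages, window=2)
lazy = lazy_inputs(messages)

print("eager:", total_tokens(eager), "tokens read, longest input", longest_input(eager))
print("lazy: ", total_tokens(lazy), "tokens read, longest input", longest_input(lazy))
```

On this toy transcript the eager path reads more tokens in total but never sees a long input; the lazy path reads each token once but in a single input that grows with the session.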
