Hacker News

swid | today at 4:15 AM | 1 reply

If you know you will be pruning or otherwise reusing the context across multiple threads, the best place for the context that will be retained is at the beginning. Prompt caching only reuses work for an exact shared prefix, so keeping the retained context there reduces cost and improves speed.
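
A minimal sketch of that ordering follows. It assumes a cache that matches on an exact shared prefix and uses whitespace-split words as a stand-in for real tokens. `build_prompt` and `shared_prefix_len` are hypothetical helpers for illustration, not part of any provider's API.

```python
# Toy model: "tokens" are whitespace-split words (assumption; real tokenizers differ).
def tokenize(text: str) -> list[str]:
    return text.split()


def build_prompt(stable_context: str, thread_messages: list[str]) -> list[str]:
    """Hypothetical helper: context reused by every thread first, per-thread messages after."""
    tokens = tokenize(stable_context)
    for msg in thread_messages:
        tokens += tokenize(msg)
    return tokens


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count leading tokens two prompts have in common (the part a prefix cache can reuse)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


stable = "SYSTEM: you are a code reviewer. REPO NOTES: uses tabs, targets Python 3.12."
thread_a = build_prompt(stable, ["review diff A"])
thread_b = build_prompt(stable, ["summarize issue B"])

# Stable context first: both threads share all 13 tokens of it.
print(shared_prefix_len(thread_a, thread_b), len(tokenize(stable)))  # 13 13

# Per-thread content first: the prompts diverge at token 0, so nothing is shareable.
bad_a = tokenize("review diff A") + tokenize(stable)
bad_b = tokenize("summarize issue B") + tokenize(stable)
print(shared_prefix_len(bad_a, bad_b))  # 0
```

The only design choice here is ordering. The same tokens placed in a different order decide whether a prefix cache can help at all.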

If not, inserting new context anywhere other than at the end will cause cache misses for everything after the insertion point, which slows down the response and increases cost.
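
The cost difference can be sketched with a toy cache. It assumes exact prefix matching and word-level "tokens", and it ignores details real caches add, such as minimum cacheable lengths and expiry. `PrefixCache` is a hypothetical class for illustration only.

```python
class PrefixCache:
    """Toy cache: remembers the previous prompt and reuses its longest matching prefix."""

    def __init__(self) -> None:
        self.cached: list[str] = []

    def request(self, prompt: list[str]) -> tuple[int, int]:
        """Return (cached_tokens, uncached_tokens) for this prompt, then cache it."""
        hit = 0
        for x, y in zip(self.cached, prompt):
            if x != y:
                break
            hit += 1
        self.cached = list(prompt)
        return hit, len(prompt) - hit


history = "sys doc1 doc2 doc3 q1 a1".split()

# Case 1: append the new turn at the end; the whole old history is still a prefix.
c = PrefixCache()
c.request(history)
print(c.request(history + ["q2"]))  # (6, 1)

# Case 2: insert new context near the start; every token after it misses.
c = PrefixCache()
c.request(history)
print(c.request(history[:1] + ["new_doc"] + history[1:] + ["q2"]))  # (1, 7)
```

Both requests send almost the same tokens, but the mid-context insert turns six cached tokens into misses.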

Models also have some bias toward tokens at the start and end of the context window (often called the "lost in the middle" effect), so there may be a reason to put important instructions in one of those places.
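
One hedged way to act on this is to state the instructions first and repeat a short reminder after the newest content. This assumes the start/end bias holds for the model in use. Putting the instructions first also keeps them in the cacheable prefix. `assemble_prompt` is a hypothetical helper, and whether the reminder actually helps is something to measure rather than assume.

```python
def assemble_prompt(instructions: str, documents: list[str], question: str,
                    remind_at_end: bool = True) -> str:
    """Hypothetical layout: instructions at the start, optionally repeated at the end."""
    parts = [instructions, *documents, question]
    if remind_at_end:
        # Repeating the instructions costs extra tokens on every request that includes it.
        parts.append(f"Reminder: {instructions}")
    return "\n\n".join(parts)


prompt = assemble_prompt(
    "Answer only from the documents; say 'unknown' otherwise.",
    ["DOC 1: ...", "DOC 2: ..."],
    "QUESTION: When was the service launched?",
)
print(prompt)
```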


Replies

catlifeonmars | today at 4:22 AM

I wonder how far you can take that. Basically, can you jam a bunch of garbage in the middle and still get useful results?