This is impressive. How do people handle the limited 64k-token context window?
Same as back in the "old days" when GPT-4 was 8k and LLaMA was 2k: chunking, RAG, etc., then cross your fingers and hope it all works reasonably well.
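Roughly, the chunk-then-retrieve idea looks like the sketch below. This is just an illustration, not anyone's actual pipeline: TF-IDF stands in for a real embedding model so the example stays self-contained, and the input file name is made up.

```python
# Minimal sketch of chunk-then-retrieve: split a long document into
# fixed-size chunks, score each chunk against the query, and keep only
# the top few so the prompt fits in the model's context window.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def chunk(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character-based chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def top_k_chunks(document: str, query: str, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query."""
    chunks = chunk(document)
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(chunks + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    best = scores.argsort()[::-1][:k]
    return [chunks[i] for i in best]

# The selected chunks get pasted into the prompt, and you hope the
# relevant passage actually made it in.
document = open("long_report.txt").read()  # hypothetical input file
question = "What were Q3 revenues?"
context = "\n---\n".join(top_k_chunks(document, question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"
```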
By using o1