I've found that, for the most part, the articles I want summarized are the ones that only fit the largest-context models such as Claude; otherwise I can just skim-read the article, possibly in reader mode for legibility.
Is llama 2 a good fit considering its small context window?
I don't think this is intended for Llama 2? The Llama 3.1 and 3.2 series have very long context windows (128k tokens).
What about using a Modelfile for ollama that tweaks the context window size? I seem to remember parameters for that in the ollama GitHub docs.
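If I remember the format right, it's the num_ctx parameter, so something like this (the model name and context size here are just examples; check the current Modelfile docs for your ollama version):

    # Modelfile
    FROM llama3.1:8b
    PARAMETER num_ctx 32768

    # Build and run the larger-context variant:
    #   ollama create llama3.1-32k -f Modelfile
    #   ollama run llama3.1-32k

Worth noting that raising num_ctx also raises memory use, so how far you can push it depends on your hardware.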
Personally I use llama3.1:8b or mistral-nemo:latest, which have a decent context window (even if it is less than the commercial ones usually). I am working on a token calculator / content-splitting method too, but it's still very early.
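Nothing polished to share yet, but the splitting idea is roughly this (a sketch only, assuming a ~4 characters per token estimate; the helper names and ratio are placeholders, not my actual code):

    def estimate_tokens(text: str) -> int:
        # Rough heuristic: ~4 characters per token for English prose.
        # A real tokenizer (ideally the model's own) will give different counts.
        return len(text) // 4

    def chunk_text(text: str, max_tokens: int = 8000) -> list[str]:
        """Split text into paragraph-aligned chunks that fit a model's context window."""
        chunks, current, current_tokens = [], [], 0
        for para in text.split("\n\n"):
            para_tokens = estimate_tokens(para)
            # Start a new chunk when adding this paragraph would exceed the budget.
            # (A single paragraph larger than the budget still becomes its own chunk.)
            if current and current_tokens + para_tokens > max_tokens:
                chunks.append("\n\n".join(current))
                current, current_tokens = [], 0
            current.append(para)
            current_tokens += para_tokens
        if current:
            chunks.append("\n\n".join(current))
        return chunks

Each chunk then gets summarized separately and the partial summaries get combined in a final pass.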