Really looking forward to testing and benchmarking this on my spam filtering benchmark. gemma-3-27b was a really strong model, surpassed later by gpt-oss:20b (which was also much faster). qwen models always had more variance.
Does spam filtering really need a better model? My impression is that the whole game is based on having the best and freshest user-contributed labels.
If you wouldn't mind chatting about your usage, my email is in my profile, and I'd love to share experiences with other HNers using self-hosted models.