Hacker News

storus, today at 2:36 AM

Based on the current DeepSeek website, I suspect it's not going to be great: their current model (V3.4? V4-mini?) often forgets or changes facts explicitly mentioned in the conversation, which R1 never did. It's better than R1 at math or coding, but nearly unusable for deep conversation. I suspect they pushed MLA or linear attention too far, or are quantizing a lot more than before.