It's funny that 128B is now considered Medium. I remember back in the day when 355M parameters was considered medium with GPT-2.
And GPT-2 1.5B was considered too dangerous to release.
They were perhaps right.