What OpenAI did was train increasingly large transformer model instances, which was sensible because transformers allowed training to scale up better than earlier architectures. The resulting models (GPT) showed a good grasp of natural-language syntax and generated mostly sensible text (unprecedented at the time), so they built ChatGPT by adding new stages of supervised fine-tuning and RLHF on top of their pretrained text-prediction models.
I miss having the completion models like davinci-003: what they lacked in the simplicity of getting what you want out, they made up for in performance.
It was fun to come up with creative ways to get it to answer your question or generate data by setting up a completion scenario.
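To illustrate the kind of thing I mean, here's a minimal sketch of a completion "scenario" (the function name and framing are my own invention, not any real API): instead of asking a question directly, you write a document fragment whose most likely continuation *is* the answer you want.

```python
# Hypothetical completion-scenario builder: frame the prompt as an FAQ
# document so a pure text-predictor's natural continuation is an answer,
# rather than, say, more questions.
def build_completion_prompt(question: str) -> str:
    return (
        "Frequently Asked Questions\n"
        "--------------------------\n"
        f"Q: {question}\n"
        "A:"
    )

print(build_completion_prompt("What causes tides?"))
```

You'd then send that string to a completions endpoint and let the model continue from the trailing "A:".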
I guess "chat" became the universal completion scenario. But I still feel like it could be "smarter" without the RLHF layer of distortion.
There were plenty of models the size of GPT-3 in industry.
The core insight necessary for ChatGPT was not scaling (that was already widely accepted). The insight was that instead of fine-tuning for each individual task, you can fine-tune once for the meta-task of instruction following, which brings the problem specification directly into the data stream.
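The contrast can be sketched with toy training examples (the data shapes here are illustrative, not any actual training format): per-task fine-tuning bakes the task into the weights, while instruction tuning puts the task specification into the input text itself.

```python
# Per-task fine-tuning: one dataset (and one fine-tuned model) per task.
# The example below belongs to a dedicated summarization dataset.
summarization_example = {
    "input": "Long article text ...",
    "output": "Short summary ...",
}

# Instruction following: one mixed dataset where each example carries its
# own natural-language problem specification in the input.
instruction_examples = [
    {"input": "Summarize: Long article text ...", "output": "Short summary ..."},
    {"input": "Translate to French: Hello", "output": "Bonjour"},
    {"input": "Answer: What causes tides?", "output": "The Moon's gravity ..."},
]

# A single model fine-tuned on the mixed set can be pointed at any task at
# inference time just by writing the instruction into the prompt.
for ex in instruction_examples:
    print(ex["input"], "->", ex["output"])
```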