The S-curve won’t inflect until economic limits make it difficult to allocate additional resources. There is no sign that training a model on 10x the compute will fail to deliver at least as much improvement as the last order-of-magnitude increase did.
Suppose we define the Pareto frontier’s input in terms of a magic “compute equivalent unit”. We get a free order of magnitude from Nvidia hardware improvements every 2-3 years, and another order of magnitude from capital expenditure every 6-12 months. Kernel improvements to the models themselves likely yield a further order of magnitude at some periodicity.
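To make the compounding concrete, here is a rough back-of-the-envelope sketch. It assumes the hardware and capital expenditure gains are independent and multiply together (so their orders of magnitude add), and it leaves out kernel improvements because no period is given for them. The function name `combined_ooms` and the three-year window are illustrative choices, not anything from a real forecast.

```python
def combined_ooms(years, hardware_period, capex_period):
    """Orders of magnitude (OOMs) of 'compute equivalent units' gained over `years`.

    Assumption: each source delivers one OOM per period (in years), and the
    sources multiply, so their OOMs add. Kernel improvements are omitted
    because their period is unspecified.
    """
    return years * (1.0 / hardware_period + 1.0 / capex_period)


window = 3  # years; an arbitrary illustrative horizon

# Slow end of both ranges: hardware OOM every 3 years, capex OOM every 12 months.
slow = combined_ooms(window, hardware_period=3.0, capex_period=1.0)

# Fast end of both ranges: hardware OOM every 2 years, capex OOM every 6 months.
fast = combined_ooms(window, hardware_period=2.0, capex_period=0.5)

print(f"{window} years: between {slow:.1f} and {fast:.1f} OOMs")
```

Under these assumptions a three-year window comes out to roughly 4 to 7.5 orders of magnitude, with capital expenditure contributing most of it. That is why the economic limit on spending, rather than hardware, is the thing to watch for the inflection.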