I would put that under the umbrella of algo/math, i.e. the structure of the LLM is part of the algo, which is itself governed by math.
For example, DeepSeek has done some interesting things with attention, via changes to the structures / algos, but all of this is still optimized by gradient descent. Each update only nudges the weights a small step, which is why models do not learn facts and such from a single pass. It takes many passes to refine the weights that go into the math formulas.
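To make the "many passes" point concrete, here is a toy sketch, not how any real LLM is trained: a single weight `w` fit by plain gradient descent to one "fact" (input 2.0 should map to 6.0, so the ideal `w` is 3.0). The function name `train`, the learning rate, and the numbers are all made up for illustration.

```python
def train(steps: int, lr: float = 0.01, x: float = 2.0, y: float = 6.0) -> float:
    """Fit w so that w * x is close to y, using plain gradient descent.

    Toy setup (an assumption for illustration): one weight, one training
    example, squared-error loss.
    """
    w = 0.0  # start with an uninformed weight
    for _ in range(steps):
        error = w * x - y        # how far the prediction is from the target
        grad = 2.0 * error * x   # derivative of (w*x - y)**2 with respect to w
        w -= lr * grad           # small step against the gradient
    return w


if __name__ == "__main__":
    print("after 1 pass:   ", train(1))    # still far from 3.0
    print("after 500 passes:", train(500))  # very close to 3.0
```

With these settings, a single update moves `w` from 0.0 to only about 0.24, while a few hundred updates bring it close to 3.0. A larger learning rate would get there faster on this toy problem, but in a real model with shared weights, big steps tend to wreck what was already learned, so small repeated steps are the norm.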