Not training. Transposing rows/columns of matrices to group 128 parameters with similar (shared...

syntaxpr • today at 4:50 AM • 0 replies • view on HN

Not training. Transposing rows/columns of matrices to group 128 parameters with similar (shared) scale factor. Qwen-3 model.

alt Hacker News