I still can't believe that LLM encoders aren't trained with unsupervised learning.
So much left on the table
They are using Qwen, so this is decoder-only.