Hacker News

thesz · today at 6:48 PM · 0 replies

  > The model cares about what you're saying, not what language you're saying it in.
How many languages is the model trained on? And how many sentences are in the training set? I believe these numbers are vastly different, and the cosine similarity is overwhelmingly biased by the number of sentences.
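For reference, the metric being discussed. A minimal sketch (the helper name is mine, embeddings here are just plain vectors):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors:
    dot product divided by the product of their norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Parallel vectors score 1.0, orthogonal vectors score 0.0,
# regardless of magnitude -- only direction matters.
print(cosine_similarity(np.array([1.0, 0.0]), np.array([2.0, 0.0])))
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))
```

The worry above is that when one language contributes orders of magnitude more sentences, the learned directions (and hence these scores) reflect that imbalance rather than language-neutral meaning.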

What if we equalized the number of languages and the number of sentences per language in the training set? A galaxy-wide LLM, so to speak.
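The equalization being proposed could be as simple as downsampling every language to the size of the smallest one. A hypothetical sketch (the function and corpus format are mine, not from any real pipeline):

```python
import random
from collections import defaultdict

def equalize(corpus, seed=0):
    """Downsample so every language contributes the same number of
    sentences. `corpus` is a list of (language, sentence) pairs."""
    by_lang = defaultdict(list)
    for lang, sentence in corpus:
        by_lang[lang].append(sentence)
    # Every language is capped at the smallest language's count.
    n = min(len(sentences) for sentences in by_lang.values())
    rng = random.Random(seed)
    return {lang: rng.sample(sentences, n) for lang, sentences in by_lang.items()}

corpus = [("en", "a"), ("en", "b"), ("en", "c"), ("fr", "x")]
print(equalize(corpus))  # each language reduced to one sentence
```

The obvious cost is throwing away most of the high-resource data, which is presumably why real training mixes are imbalanced in the first place.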

Also, the model can't help but care about language, because your work shows that the cosine similarity diverges at the decoding (output) stages.