How do I get that loss, though, without the softmax inputs?
Do they have logits for all of the Wikipedia data, etc., that they've scraped?