This looks awesome!!! I’m curious on the ensemble: does it mean “train 8 different models and pick the best one”? That’s what my mind jumps to, but that also seems wrong, because I assume we could just keep increasing the number of different models you train to get a win.
no ensembling means train 8 models and during inference avg logits of all 8 models to make a prediction.