No, ensembling means training 8 models and, during inference, averaging the logits of all 8 models to make a prediction.
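
To make the averaging step concrete, here is a minimal sketch in plain Python. It assumes each trained model can be called on an input and returns a list of per-class logits. The names `ensemble_logits`, `ensemble_predict`, and `make_toy_model` are made up for illustration, and the toy models are stand-ins for the 8 trained networks, not anyone's actual setup.

```python
from typing import Callable, List, Sequence

Logits = List[float]
# Assumption: a "model" is any callable mapping input features to per-class logits.
Model = Callable[[Sequence[float]], Logits]


def ensemble_logits(models: Sequence[Model], x: Sequence[float]) -> Logits:
    """Average the per-class logits that every model produces for input x."""
    if not models:
        raise ValueError("need at least one model")
    all_logits = [model(x) for model in models]
    n_classes = len(all_logits[0])
    return [
        sum(logits[c] for logits in all_logits) / len(all_logits)
        for c in range(n_classes)
    ]


def ensemble_predict(models: Sequence[Model], x: Sequence[float]) -> int:
    """Predict the class whose averaged logit is largest."""
    avg = ensemble_logits(models, x)
    return max(range(len(avg)), key=avg.__getitem__)


def make_toy_model(bias: float) -> Model:
    """Hypothetical stand-in for one trained model (two classes)."""
    def model(x: Sequence[float]) -> Logits:
        s = sum(x)
        return [s + bias, -s - bias]
    return model


# Eight toy "trained" models that differ only in a bias term.
models = [make_toy_model(b) for b in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0)]

if __name__ == "__main__":
    print(ensemble_logits(models, [1.0, 2.0]))
    print(ensemble_predict(models, [1.0, 2.0]))
```

Note that this averages raw logits, as described above. Averaging softmax probabilities instead is a different (also common) choice and can give slightly different predictions.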