Maybe another idea, no idea if this is a thing: you could pick a block-of-layers size (say, 6) and then during training randomly permute those blocks every now and then. Maybe that would force a common API between blocks and specialization of the blocks, and then post-training you could analyze what each block is doing (maybe by deleting it while running benchmarks). Rough sketch of what I mean below.
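
Something like this in PyTorch (a toy sketch, not a real implementation; ShuffledStack, BLOCK_SIZE, the layer sizes, etc. are all just illustrative names I made up):

    # Toy sketch: blocks of layers that can be permuted during training
    # and ablated one at a time after training. All names are made up.
    import random
    import torch.nn as nn

    BLOCK_SIZE = 6   # layers per swappable block (the "6" above)
    NUM_BLOCKS = 4   # 24 layers total

    class ShuffledStack(nn.Module):
        def __init__(self, d_model=512, nhead=8):
            super().__init__()
            make = lambda: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.blocks = nn.ModuleList(
                nn.Sequential(*[make() for _ in range(BLOCK_SIZE)])
                for _ in range(NUM_BLOCKS)
            )

        def forward(self, x, shuffle=False, drop=None):
            order = list(range(NUM_BLOCKS))
            if shuffle:           # during training: random block order
                random.shuffle(order)
            for i in order:
                if i == drop:     # post-training: skip (delete) one block
                    continue
                x = self.blocks[i](x)
            return x

Then at eval time you'd rerun your benchmark once per block with drop=i and compare the scores to get a rough picture of what each block is responsible for.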