No? First of all, you can limit how much of the unified RAM is made available as VRAM, and second, many applications don't need that much RAM anyway. Even if you give 108 GB to VRAM and 16 GB to applications, you'll be fine.
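For context, on an Apple Silicon Mac the VRAM share of unified memory can be adjusted with a sysctl. This is a sketch assuming macOS 14 or later; the sysctl key and the default cap have changed across macOS versions, so check your system before relying on it:

```shell
# Raise (or cap) the amount of unified memory the GPU may wire as VRAM.
# 110592 MB is roughly 108 GB. Requires root; the setting does not
# persist across reboots.
sudo sysctl iogpu.wired_limit_mb=110592

# Check the current limit:
sysctl iogpu.wired_limit_mb
```

On older macOS releases the key was reportedly `debug.iogpu.wired_limit_mb`, and by default the OS caps GPU-wired memory at a fraction of total RAM, so leaving some headroom for applications happens automatically unless you override it.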
How about the rest of the resources? CPU/GPU? Would your work not be affected by inference running?