It depends. This particular model has larger experts with more active parameters, so 16GB is likely not enough (at least not without further tricks), but there are much sparser models where the active experts can sit in RAM while the weights for all other experts stay on disk. This becomes more and more of a necessity as models get sparser and RAM stays tight relative to model size. It costs performance, but the end result can still be "useful".
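A minimal sketch of the idea, assuming all sizes and names are hypothetical: memory-map the expert weights so the file stays on disk, and let the OS page in only the experts the router actually selects. Recently used ("hot") experts stay in the page cache, so they effectively live in RAM while cold experts never get loaded.

```python
import os
import tempfile
import numpy as np

# Hypothetical dimensions for illustration: a sparse MoE layer with
# many experts, of which only a few are active per token.
NUM_EXPERTS, D_IN, D_OUT, TOP_K = 64, 256, 256, 2

# Write all expert weights to one file on disk (a stand-in for a
# real checkpoint shard).
path = os.path.join(tempfile.mkdtemp(), "experts.bin")
np.random.default_rng(0).standard_normal(
    (NUM_EXPERTS, D_IN, D_OUT), dtype=np.float32
).tofile(path)

# Memory-map the file: nothing is read into RAM up front. Indexing an
# expert triggers page faults that pull just that expert's weights off
# disk, and the OS page cache keeps recently used experts resident.
experts = np.memmap(path, dtype=np.float32, mode="r",
                    shape=(NUM_EXPERTS, D_IN, D_OUT))

def moe_forward(x, router_logits):
    # Pick the top-k experts for this token; only their weights are touched.
    top = np.argsort(router_logits)[-TOP_K:]
    gates = np.exp(router_logits[top])
    gates /= gates.sum()
    out = np.zeros(D_OUT, dtype=np.float32)
    for gate, e in zip(gates, top):
        out += gate * (x @ experts[e])  # reads expert e from disk if cold
    return out

x = np.random.default_rng(1).standard_normal(D_IN).astype(np.float32)
logits = np.random.default_rng(2).standard_normal(NUM_EXPERTS).astype(np.float32)
y = moe_forward(x, logits)
```

Each forward pass touches only TOP_K of the NUM_EXPERTS weight matrices, which is why sparser routing makes disk offloading more viable: the slow path (a disk read) is hit only when the router picks a cold expert.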