The IOMMU gives a PCIe device access to whatever range of memory it's assigned. That doesn't prevent it from being assigned memory inside a process's address space, which can even be the common case, because that's what allows zero-copy I/O. Both network cards and GPUs do it.
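To make that concrete, here's a rough sketch of what handing one of a process's own pages to a device looks like through Linux VFIO with the type1 IOMMU backend. The IOMMU group number and the IOVA are made up for illustration and error handling is mostly stripped; this is a sketch of the idea, not a complete driver:

    /* Minimal sketch: map a page of this process's memory into a device's
     * IOMMU address space with Linux VFIO (type1 IOMMU). The group number
     * and the chosen IOVA are placeholders; error handling is trimmed. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/vfio.h>

    int main(void)
    {
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group = open("/dev/vfio/42", O_RDWR);   /* hypothetical IOMMU group */

        /* Attach the group to the container and pick the type1 IOMMU backend. */
        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

        /* An ordinary anonymous page in this process's address space. */
        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        strcpy(buf, "visible to the device");

        /* Tell the IOMMU: device accesses to IOVA 0x100000 hit this page. */
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (unsigned long)buf,
            .iova  = 0x100000,
            .size  = 4096,
        };
        if (ioctl(container, VFIO_IOMMU_MAP_DMA, &map))
            perror("VFIO_IOMMU_MAP_DMA");
        return 0;
    }

Once that ioctl succeeds, the device can DMA to that page through the IOMMU mapping exactly as if it were any other buffer it owns, which is the whole point of zero-copy setups.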
An even better example might be virtual memory. When some page gets swapped out or back in, the storage controller does DMA to that page, and that could be basically any page of memory on the machine. And that's just the most common case.
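You can even watch that happen from userspace. A minimal Linux-only sketch, assuming swap is enabled and a kernel new enough for MADV_PAGEOUT (5.4+): push one of your own pages out, then touch it to fault it back in, and both directions end with the storage controller DMAing against an ordinary process page:

    /* Minimal sketch (Linux-specific, needs swap enabled and kernel >= 5.4):
     * force one of our own pages out to the swap device, then fault it back
     * in. Both directions involve the storage controller doing DMA to/from
     * an arbitrary page of this process. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;
        unsigned char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        page[0] = 42;                          /* make the page dirty */

        madvise(page, len, MADV_PAGEOUT);      /* ask the kernel to swap it out */

        unsigned char resident;
        mincore(page, len, &resident);
        printf("resident after pageout: %d\n", resident & 1);

        printf("value after fault-in: %d\n", page[0]);  /* swap-in via DMA */
        return 0;
    }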
We already have enterprise GPUs with CPU cores attached to them. These currently use custom interconnects, but as that comes down to consumer systems it's plausibly going to look like a PCIe GPU with a medium-core-count CPU on it that has unified access to the GPU's VRAM. Meanwhile the system still has the normal CPU with its normal memory, so you now have a NUMA system where one of the nodes sits across the PCIe bus, and both need full access to each other's memory because any given process could be scheduled on either processor.
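From software's point of view that's just another NUMA node, and the existing APIs already express it. A small sketch with libnuma (link with -lnuma); the node numbers are assumptions about the topology, with node 1 standing in for the hypothetical GPU-side CPU and its VRAM:

    /* Minimal sketch with libnuma (link with -lnuma). If a GPU-attached CPU
     * showed up as NUMA node 1 behind the PCIe bus, nothing here would
     * change: the allocation lands on one node, the thread runs on the
     * other, and every load/store crosses the interconnect. The node
     * numbers are assumptions about the machine's topology. */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0 || numa_max_node() < 1) {
            fprintf(stderr, "need at least two NUMA nodes\n");
            return 1;
        }

        /* Memory physically on node 1 (imagine: the GPU card's local RAM). */
        char *buf = numa_alloc_onnode(1 << 20, 1);

        /* Run this thread on node 0 (the host CPU) and touch remote memory. */
        numa_run_on_node(0);
        memset(buf, 0, 1 << 20);               /* every store goes cross-node */

        numa_free(buf, 1 << 20);
        return 0;
    }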
We haven't even gotten into exotic hardware that wants to do some kind of shared-memory clustering between machines, or cache cards (something like Optane) that are PCIe cards usable as system memory via DMA, or dedicated security processors meant to scan memory for malware, etc.
There are lots of reasons for PCIe devices to have arbitrary physical memory access.