I can't remember the exact number off the top of my head, and I'm clearly too lazy to google it, but there is a specific length of time after which, if no new sound comes through, the human brain processes the gap as a pause/silence.
I want to say 300ms, which would fit with your 500ms example.
This definitely varies between individuals. It’s one reason why, in some conversations, people can never seem to get a word in edgewise, even if the person speaking thinks they’re providing opportunities to do so. A mismatch in “pause length” can make for frustrating communication.
I am also too lazy to google or AI it, but it’s something I remember from when I taught ESL long ago.