> Does anyone know if Forth suffers measurably in inner loops from have to call words that perform basic operations?
Yes, the slowdown is of the order of 8× for DTC, ITC, and bytecode ("token threading"). Eliminating the jump table reduces the overhead a bit, but it's still of the order of 8×.
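To make it concrete where that overhead comes from, here's a rough sketch of a token-threaded inner loop. It's a toy model in Python, not a measurement and not any real Forth's implementation; the opcode numbering and the `run` function are made up for illustration. The point is only that every token, however trivial, pays for a fetch, a table lookup, and an indirect call before any useful work happens.

```python
# Hypothetical opcode numbering, chosen only for this sketch.
LIT, DUP, ADD, MUL, EXIT = range(5)

def run(code):
    """Interpret a list of tokens; return (final stack, number of dispatches)."""
    stack = []
    pc = 0
    dispatches = 0

    def lit():
        # The literal's value is stored inline, right after the LIT token.
        nonlocal pc
        stack.append(code[pc])
        pc += 1

    def dup():
        stack.append(stack[-1])

    def add():
        b = stack.pop()
        a = stack.pop()
        stack.append(a + b)

    def mul():
        b = stack.pop()
        a = stack.pop()
        stack.append(a * b)

    table = [lit, dup, add, mul]  # the jump table: index = token

    while True:
        op = code[pc]      # fetch the token
        pc += 1
        dispatches += 1    # every token costs one dispatch, useful or not
        if op == EXIT:
            return stack, dispatches
        table[op]()        # table lookup plus indirect call

# Equivalent to the Forth phrase: 3 dup * 4 +
program = [LIT, 3, DUP, MUL, LIT, 4, ADD, EXIT]
stack, dispatches = run(program)
print(stack, dispatches)  # [13] 6
```

That's six trips around the dispatch loop to do one multiply and one add.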
The B compiler bundled a copy of the bytecode interpreter into each executable; that might have made it less appealing as a size optimization. For a big enough program it would still have won.
Subroutine threading is really just compact native code, but it still suffers from typically about 4× overhead for basic operations like dup, @, +, or exit (the traditional name for the runtime effect of ;). The primitive operations these execute are typically one or two cycles on a RISC such as a Cortex-M4, while a subroutine call and return are two more cycles, often plus two to four cycles of pipeline bubble (if the processor doesn't have good enough branch prediction). Presumably on the PDP-7 a subroutine call would have needed an additional memory cycle to store the return address into memory and another one to fetch it, plus two more memory cycles to fetch the call and return instructions. (I'm not familiar with the -7's instruction set, so correct me if I'm wrong.)
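As a back-of-the-envelope check on the Cortex-M4 numbers above, here's the arithmetic as a tiny function. It's just a model built from the cycle counts in this comment, not from measurements; `call_overhead` and its parameter names are made up for the sketch.

```python
def call_overhead(primitive_cycles, call_return_cycles=2, bubble_cycles=0):
    """Ratio of (primitive + call/return + pipeline bubble) to the primitive alone."""
    total = primitive_cycles + call_return_cycles + bubble_cycles
    return total / primitive_cycles

# One-cycle primitive, no pipeline bubble: 3 cycles instead of 1.
print(call_overhead(1))                    # 3.0
# One-cycle primitive with a two-cycle bubble: 5 cycles instead of 1.
print(call_overhead(1, bubble_cycles=2))   # 5.0
# Two-cycle primitive with a four-cycle bubble: 8 cycles instead of 2.
print(call_overhead(2, bubble_cycles=4))   # 4.0
```

Under these assumptions the ratio ranges from 2× (two-cycle primitive, no bubble) to 7× (one-cycle primitive, four-cycle bubble), which is where the "about 4×" figure sits.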
With respect to dup, though: dup, drop, swap, and over commonly represent operations that don't appear in optimized native code at all. They just tell the following operations which data to operate on, a purpose normally served by operand fields in native code. So the runtime overhead of stack-bytecode interpretation is worse than it appears at first: each bytecode instruction takes of the order of 4× or 8× the time of a native instruction doing the same thing, but you have to run about twice as many bytecode instructions because about half of them are stack manipulation. So your total slowdown is maybe 8× or 16×.
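Here's a small worked example of that estimate. The word sequence `dup * swap dup * +` (sum of squares of the top two stack items) is one I picked for illustration, and the helper names are hypothetical. Native code would need roughly two multiplies and an add for the same job, with the operand selection folded into register fields.

```python
STACK_OPS = {"dup", "drop", "swap", "over"}

def stack_op_fraction(words):
    """Fraction of the words that only shuffle the stack."""
    return sum(w in STACK_OPS for w in words) / len(words)

def total_slowdown(per_instruction, stack_fraction):
    """Per-instruction slowdown scaled up by the extra stack-shuffling words."""
    return per_instruction / (1 - stack_fraction)

# ( a b -- a*a+b*b ): three stack words, three arithmetic words.
sumsq = "dup * swap dup * +".split()
frac = stack_op_fraction(sumsq)

print(frac)                     # 0.5
print(total_slowdown(4, frac))  # 8.0
print(total_slowdown(8, frac))  # 16.0
```

Real code won't be exactly half stack manipulation, of course; the fraction varies a lot with how the word is factored.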
You may also be interested in looking at the program dc, which IIRC was one of the programs Unix was originally written to run. It's a stack bytecode designed to be written by hand, like HP desk calculators of the time but with arbitrary precision.