yes, in typical architectures for models dealing with text, the output is a token from the same vocabulary as the input.