This doesn't do the same thing though, since it's not Unicode aware.
>>> 'x\u2009 a'.split()
['x', 'a']
# incorrect; in bytes mode, `\S` doesn't know about unicode whitespace
>>> list(re.finditer(br'\S+', 'x\u2009 a'.encode()))
[<re.Match object; span=(0, 4), match=b'x\xe2\x80\x89'>, <re.Match object; span=(5, 6), match=b'a'>]
# correct, in unicode mode
>>> list(re.finditer(r'\S+', 'x\u2009 a'))
[<re.Match object; span=(0, 1), match='x'>, <re.Match object; span=(3, 4), match='a'>]
OP's `.split_ascii()` doesn't handle U+2009 either.
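To see what an ASCII-only or bytes-mode check misses, you can list every non-ASCII code point that Python's `str.isspace()` accepts; `str.split()` with no argument splits on runs of these. A quick sketch (the variable names are just illustrative):

```python
import sys

# Every code point that str.isspace() treats as whitespace. str.split() with
# no separator argument splits on runs of exactly these characters.
whitespace = [cp for cp in range(sys.maxunicode + 1) if chr(cp).isspace()]

# The ones an ASCII-only (or bytes-mode) whitespace check will never see.
non_ascii = [f"U+{cp:04X}" for cp in whitespace if cp > 0x7F]
print(non_ascii)
```

The exact list depends on the Unicode database version bundled with the interpreter, so a native port would presumably need to pin its own table.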
edit: OP's fully native C++ version using Pystd
There's bound to be a way to turn a stream of bytes into a stream of Unicode code points (at least I think that's what Python is doing for strings), though I'm explicitly not volunteering to write the code for it.
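In Python at least, the standard library already has that piece: an incremental UTF-8 decoder (`codecs.getincrementaldecoder`) that turns a byte stream into code points even when a multi-byte sequence is cut across chunk boundaries. A rough sketch of a Unicode-aware streaming split on top of it; `split_stream` is a hypothetical helper, not OP's code and not anything from Pystd:

```python
import codecs
from typing import Iterable, Iterator


def split_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield whitespace-separated tokens from a stream of UTF-8 byte chunks.

    Hypothetical helper. A multi-byte sequence (e.g. U+2009 = e2 80 89) may be
    split across chunks; the incremental decoder buffers the partial bytes
    until the rest arrive.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    pending = ""  # possibly-unfinished token carried over from the last chunk
    for chunk in chunks:
        text = pending + decoder.decode(chunk)
        parts = text.split()  # Unicode-aware: splits on str.isspace() chars
        # If the text doesn't end in whitespace, the last token may continue
        # in the next chunk, so hold it back instead of yielding it.
        if parts and not text[-1].isspace():
            pending = parts.pop()
        else:
            pending = ""
        yield from parts
    # Flush; with the default "strict" errors this raises UnicodeDecodeError
    # if the stream ended in the middle of a multi-byte sequence.
    yield from (pending + decoder.decode(b"", final=True)).split()


data = "x\u2009 a".encode()
# Feed one byte at a time, so the thin space arrives in three pieces.
print(list(split_stream(data[i:i + 1] for i in range(len(data)))))
```

Holding back the trailing partial token is the part a native version would also have to handle; the decoding and the whitespace table are what Python gives you for free here.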