Clang's solution (presented at the end of the Wikipedia article you linked) seems much better: just use a single lexical token for both types and variables.
Then only the parser needs to be context-sensitive, for the A* B; construct, which is either a multiplication whose result is discarded (if A is a variable) or a declaration of B as a pointer to A (if A is a type).
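To make that concrete, here is a minimal sketch of the idea in Python. It is not Clang's actual code: `lex`, `parse_statement` and the `type_names` set are hypothetical names made up for illustration, and the parser only handles the one ambiguous statement shape.

```python
import re

def lex(src):
    """Tokenize without asking whether a name is a type or a variable."""
    tokens = []
    for text in re.findall(r"[A-Za-z_]\w*|\S", src):
        # Every identifier gets the same token kind; no symbol table needed here.
        kind = "IDENT" if re.match(r"[A-Za-z_]", text) else "PUNCT"
        tokens.append((kind, text))
    return tokens

def parse_statement(tokens, type_names):
    """Classify `A * B ;` using the parser's own symbol table (type_names)."""
    kinds = [kind for kind, _ in tokens]
    texts = [text for _, text in tokens]
    if (kinds == ["IDENT", "PUNCT", "IDENT", "PUNCT"]
            and texts[1] == "*" and texts[3] == ";"):
        a, b = texts[0], texts[2]
        if a in type_names:
            # A names a type: B is declared as a pointer to A.
            return ("declaration", {"type": a + "*", "name": b})
        # A names a variable: an expression statement that multiplies A by B.
        return ("expression", {"op": "*", "lhs": a, "rhs": b})
    raise SyntaxError("this sketch only handles `A * B;`")
```

The same token stream gives a declaration with `parse_statement(lex("A * B;"), {"A"})` and an expression with `parse_statement(lex("A * B;"), set())`, so the lexer never has to look at the symbol table.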
Well, as you can see, this is very much in the spirit of a GLL/GLR parser: defer the decision until all the information is available. The academic solution is not to do this at the token level but to introduce a parse structure that is "forkable". That requires a persistent data structure that "compresses" the alternatives by sharing their common parts when the parse takes different routes, and it is called a graph-structured stack (https://en.wikipedia.org/wiki/Graph-structured_stack).
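A rough sketch of what such a stack could look like, in Python. This is only an illustration of the sharing idea, not a GLR parser: the `GSSNode` and `GSS` classes and the state names are hypothetical, only "shift"-style pushes are modelled, and reductions are left out entirely.

```python
class GSSNode:
    """One stack cell: a parser state plus links to the cells beneath it."""
    def __init__(self, state):
        self.state = state
        self.parents = set()  # cells directly below this one

class GSS:
    def __init__(self, start_state):
        self.root = GSSNode(start_state)
        self.tops = {self.root}  # one top per distinct state still alive

    def step(self, pushes):
        """Advance one token.

        `pushes` maps a current top's state to the states to push onto it.
        Tops that push the same state share a single new node (the merge).
        """
        new_nodes = {}  # state -> node created at this level
        for top in self.tops:
            for state in pushes.get(top.state, []):
                if state not in new_nodes:
                    new_nodes[state] = GSSNode(state)
                new_nodes[state].parents.add(top)
        self.tops = set(new_nodes.values())

    def stacks(self):
        """Expand the graph back into the plain stacks it represents."""
        def walk(node):
            if not node.parents:
                return [[node.state]]
            return [path + [node.state]
                    for parent in node.parents for path in walk(parent)]
        return sorted(stack for top in self.tops for stack in walk(top))

gss = GSS("start")
gss.step({"start": ["A_is_type", "A_is_var"]})            # fork on the ambiguity
gss.step({"A_is_type": ["star"], "A_is_var": ["star"]})   # both routes merge again
```

After the second step there is a single top node with two parents, yet `gss.stacks()` still recovers both complete stacks, so the two interpretations are kept alive without copying the shared parts.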