Hacker News

__alexs today at 10:10 AM

Why are tokens not coloured? Would there just be too many params if we double the token count so the model could always tell input tokens from output tokens?


Replies

xg15 today at 10:24 AM

That's something I'm wondering as well. Not sure how it is with frontier models, but from what you can see on Huggingface, the "standard" method to distinguish tokens still seems to be special delimiter tokens or even just formatting.

Are there technical reasons why you can't make the "source" of the token (system prompt, user prompt, model thinking output, model response output, tool call, tool result, etc) a part of the feature vector - or even treat it as a different "modality"?

Or is this already being done in larger models?

easeout today at 2:08 PM

Because they're the main prompt injection vector, I think you'd at least want to distinguish tool results from user messages. And once you go that far, you need colors for those two, plus system messages, plus thinking/responses. I have to think it's been tried and it just cost too much capability, but it may be the best opportunity for improvement at some point.

oezi today at 10:48 AM

Instead of using just positional encodings, we absolutely should have speaker encodings added on top of tokens.
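To make the idea concrete: this is essentially what BERT-style segment embeddings do, extended to conversation roles. A minimal sketch (all names and dimensions here are illustrative, not from any particular model) of summing a learned speaker embedding with the usual token and positional embeddings:

```python
import numpy as np

# Illustrative sketch: each token's input vector is the sum of a token
# embedding, a positional embedding, and a "speaker" embedding.
# Hypothetical speaker ids: 0=system, 1=user, 2=assistant, 3=tool result.

vocab_size, max_len, n_speakers, d_model = 100, 16, 4, 8
rng = np.random.default_rng(0)

tok_emb = rng.normal(size=(vocab_size, d_model))  # learned in practice
pos_emb = rng.normal(size=(max_len, d_model))
spk_emb = rng.normal(size=(n_speakers, d_model))

def embed(token_ids, speaker_ids):
    """Token + positional + speaker embeddings, summed per position."""
    positions = np.arange(len(token_ids))
    return tok_emb[token_ids] + pos_emb[positions] + spk_emb[speaker_ids]

# A short "conversation": system prompt, then user message, then reply.
tokens   = np.array([5, 17, 42, 42, 7, 9])
speakers = np.array([0,  0,  1,  1, 2, 2])

x = embed(tokens, speakers)
# The same token id carried by a user turn vs. an assistant turn gets a
# different input vector via spk_emb, so the "colour" is visible to the
# model without spending any extra vocabulary on delimiter tokens.
```

The appeal over delimiter tokens is that the colour is attached to every position rather than only marked at turn boundaries, so it can't be spoofed by text that merely imitates the delimiters.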

jhrmnn today at 11:25 AM

Because then the training data would have to be coloured

layer8 today at 11:52 AM

This has the potential to improve things a lot, though there would still be a failure mode when the user quotes the model or the model (e.g. in thinking) quotes the user.

efromvt today at 10:46 AM

I’ve been curious about this too - there's an obvious performance overhead to having an internal/external channel, but it might make training away this class of problems easier.

cyanydeez today at 10:20 AM

you would have to train it three times for two colors.

each by itself, then with both interacting.

2!
