Hacker News

volodia today at 1:57 AM

Co-founder / Chief Scientist at Inception here. If helpful, I’m happy to answer technical questions about Mercury 2 or diffusion LMs more broadly.


Replies

nowittyusername today at 2:39 AM

How does the whole KV cache situation work for diffusion models? Are there latency and compute/monetary savings from caching? Is the curve similar to autoregressive caching options? Or do such things not apply at all, so you can just mess with the system prompt and dynamically change it every turn because there are no savings to be had? Or can you make dynamic changes to the head of the prompt but still get cache savings because of the diffusion-based architecture? ... so many ideas ...
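To make the question concrete, here is a minimal sketch (my own illustration, not Inception's implementation) of why the autoregressive KV cache is append-only, while a bidirectional denoising pass re-estimates every token's representation each step, so per-token K/V entries don't stay fixed the same way:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_step(q_new, k_cache, v_cache, k_new, v_new):
    """Autoregressive decoding: past tokens' K/V never change, so each
    step only appends one K/V pair and attends over the growing cache."""
    k_cache = np.concatenate([k_cache, k_new[None]], axis=0)
    v_cache = np.concatenate([v_cache, v_new[None]], axis=0)
    attn = softmax(q_new @ k_cache.T / np.sqrt(q_new.shape[-1]))
    return attn @ v_cache, k_cache, v_cache

def bidirectional_pass(x, Wq, Wk, Wv):
    """One denoising step over the whole sequence: every token attends to
    every other, and x (hence K and V) is re-estimated each step, so the
    append-only cache above does not carry over directly."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v
```

The sketch only shows the attention patterns; whether and how a diffusion LM can still reuse computation across denoising steps (or across turns for a fixed prompt prefix) is exactly what the question is asking.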

gok today at 5:38 AM

Do you use fully bidirectional attention or is it at all causal?
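For readers unfamiliar with the terms in this question, the difference is just the attention mask. A hypothetical illustration (not Mercury's actual masking scheme):

```python
import numpy as np

def causal_mask(n):
    """Autoregressive mask: token i may only attend to positions <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def bidirectional_mask(n):
    """Fully bidirectional mask: every token attends to every position,
    which is what lets a diffusion LM refine all tokens in parallel."""
    return np.ones((n, n), dtype=bool)
```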

nl today at 2:56 AM

I had a very odd interaction, somewhat similar to how weak transformer models get stuck in a loop:

https://gist.github.com/nlothian/cf9725e6ebc99219f480e0b72b3...

What causes this?

techbro92 today at 2:34 AM

Do you think you will be moving towards drifting models in the future for even more speed?

kristianp today at 2:09 AM

How big is Mercury 2? How many tokens is it trained on?

Is its agentic accuracy good enough to operate, say, coding agents without needing a larger model for more difficult tasks?

CamperBob2 today at 2:04 AM

Seems to work pretty well, and it's especially interesting to see answers pop up so quickly! It is easily fooled by the usual trick questions about car washes and such, but seems on par with the better open models when I ask it math/engineering questions, and is obviously much faster.
