Hacker News

volodia today at 1:57 AM

Co-founder / Chief Scientist at Inception here. If helpful, I’m happy to answer technical questions about Mercury 2 or diffusion LMs more broadly.


Replies

nowittyusername today at 2:39 AM

How does the whole KV cache situation work for diffusion models? Are there latency and compute/monetary savings from caching? Is the curve similar to autoregressive caching options? Or do such things not apply at all, so you can just mess with the system prompt and dynamically change it every turn because there are no savings to be had? Or can you make dynamic changes to the head of the prompt but still get cache savings because of the diffusion-based architecture? ... so many ideas ...
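To make the question concrete, here is a minimal sketch (my own illustration, not Inception's implementation) of why the autoregressive KV cache is append-only, while a bidirectional denoising pass re-estimates every token's representation each step, so per-token K/V entries don't stay fixed the same way:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_step(q_new, k_cache, v_cache, k_new, v_new):
    """Autoregressive decoding: past tokens' K/V never change, so each
    step only appends one K/V pair and attends over the growing cache."""
    k_cache = np.concatenate([k_cache, k_new[None]], axis=0)
    v_cache = np.concatenate([v_cache, v_new[None]], axis=0)
    attn = softmax(q_new @ k_cache.T / np.sqrt(q_new.shape[-1]))
    return attn @ v_cache, k_cache, v_cache

def bidirectional_pass(x, Wq, Wk, Wv):
    """One denoising step over the whole sequence: every token attends to
    every other, and x (hence K and V) is re-estimated each step, so the
    append-only cache above does not carry over directly."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v
```

The sketch only shows the attention patterns; whether and how a diffusion LM can still reuse computation across denoising steps (or across turns for a fixed prompt prefix) is exactly what the question is asking.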

gok today at 5:38 AM

Do you use fully bidirectional attention or is it at all causal?
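For readers unfamiliar with the terms in this question, the difference is just the attention mask. A hypothetical illustration (not Mercury's actual masking scheme):

```python
import numpy as np

def causal_mask(n):
    """Autoregressive mask: token i may only attend to positions <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def bidirectional_mask(n):
    """Fully bidirectional mask: every token attends to every position,
    which is what lets a diffusion LM refine all tokens in parallel."""
    return np.ones((n, n), dtype=bool)
```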

nl today at 2:56 AM

I had a very odd interaction, somewhat similar to how weak transformer models get stuck in a loop:

https://gist.github.com/nlothian/cf9725e6ebc99219f480e0b72b3...

What causes this?

techbro92 today at 2:34 AM

Do you think you will be moving towards drifting models in the future for even more speed?

kristianp today at 2:09 AM

How big is Mercury 2? How many tokens is it trained on?

Is its agentic accuracy good enough to operate, say, coding agents without needing a larger model for more difficult tasks?

CamperBob2 today at 2:04 AM

Seems to work pretty well, and it's especially interesting to see answers pop up so quickly! It is easily fooled by the usual trick questions about car washes and such, but seems on par with the better open models when I ask it math/engineering questions, and is obviously much faster.
