How many times does this article mention "agentic" and "agents"... Am I correct to assume the hardware has nothing to do with "agents"? I assume it's just about a new generation of more efficient transformer / deep-learning layers.
There are issues specific to "agent" workflows. For example, many requests in an agent all build on top of the same previous results, so the context (KV cache) needs to be longer, and they use these massive interconnected nodes with direct NVMe to cache the part of the prompt that's repeatable.

It is about agents in that the design targets long context: many requests where the initial "chunk" is cached once but shared across all of them.

They don't call this out specifically, but in the technical details, like the SRAM and how it's all interconnected nodes in a pod, it's "designed" for it.
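The prefix-caching idea above can be sketched as a toy in Python. This is just an illustration of the technique, not any vendor's actual implementation: `encode`, `prefill`, and `kv_cache` are hypothetical stand-ins, and `encode` fakes the expensive transformer prefill step.

```python
import hashlib

# Toy sketch of prefix (KV-cache) reuse: agent requests share the same long
# prompt prefix, so its per-token state is computed once and reused.
kv_cache = {}

def encode(text):
    # Placeholder for the expensive prefill pass; returns a fake
    # per-token "KV state" (one entry per character here).
    return [ord(c) for c in text]

def prefill(prefix, suffix):
    # Key the cache on the shared prefix so every request that starts
    # with the same context skips re-encoding it.
    key = hashlib.sha256(prefix.encode()).hexdigest()
    if key not in kv_cache:
        kv_cache[key] = encode(prefix)      # expensive: once per shared prefix
    return kv_cache[key] + encode(suffix)   # cheap: only the new tokens

# Many agent requests on top of the same previous results:
shared = "system prompt + tool results..."
state_a = prefill(shared, "step 1")
state_b = prefill(shared, "step 2")
```

Both requests reuse the cached prefix state; only the short suffix is encoded fresh. The real systems do this at the level of attention KV tensors spilled to NVMe, but the cache-keyed-by-prefix shape is the same.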