mvkel • yesterday at 12:47 AM • 2 replies

> In this world view, nano banana is a first early hint of what that might look like.

What is he referring to here? Is nano banana not just an image gen model? Is it because it's an LLM-based one, and not diffusion?

Replies

simonw • yesterday at 1:48 AM

What's interesting about Nano Banana (and even more so video models like Veo 3) is that they act as a weird kind of world model when you consider that they accept images as input and return images as output.

Give it an image of a maze and it can output that same image with the maze completed (maybe).

There's a fantastic article about that for image-to-video models here: https://video-zero-shot.github.io/

> We demonstrate that Veo 3 can zero-shot solve a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and much more.

dragonwriter • yesterday at 1:01 AM

I think he is referring to capability, not architecture, and is saying that NB has reached the point where it hints at a near-future capability: GenAI models creating their own UI as needed.

NB (Gemini 2.5 Flash Image) isn't the first major-vendor LLM-based image gen model, after all; GPT Image 1 was first.
