Hi everyone, I wrote this paper. Cool to see it has been posted here already.
I want to clarify a few points that other people have raised:
- This architecture is not as fast as Perlin noise. IMO we are unlikely to see any significant improvement over Perlin noise without a large increase in compute, at least for most applications. Nonetheless, this system is not too slow for real-time use. In the Minecraft integration, for instance, the bottleneck in generation speed is by far Minecraft's own generation logic (on one RTX 3090 Ti).
- I agree that this is not "production-ready" for most tasks. The main issue is that (1) terrain is generated at realistic scales, which are too big for most applications, and (2) the only control the user has is the initial elevation map, which is very coarse. Thankfully, I expect both of these issues to be fixed fairly quickly. (1) is more specific to terrain generation, but I have a number of ideas on how to fix it. (2) is mostly an issue because I did not have the time to engineer a system with this many features (and as-is, the system is already quite dense). I believe a lot of existing work on diffusion conditioning could be adapted here; see the conditioning sketch after this list.
- The post title misses one key part of the paper title: "in Infinite, Real-Time Terrain Generation." I don't expect this to replace Perlin noise in other applications. And for bounded generation, manual workflows are still superior.
- The top-level input is Perlin noise because it is genuinely the best tool for generating terrain at continental scale. If I had more time, I would use some sort of plate-tectonics simulator to generate that layout, but for something simple, reasonably realistic, and infinite, Perlin noise is pretty much unbeatable (a minimal sketch of this kind of layout follows below). Even learned methods perform on par with Perlin noise at this scale because the data is so simple.
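To make the last point concrete: the continental-scale layout is conceptually just a few octaves of low-frequency noise. Here is a minimal sketch using value-noise fBm as a stand-in for proper Perlin noise (resolution, octave count, and frequencies are all placeholders, not the paper's settings):

```python
import numpy as np

def value_noise(shape, freq, rng):
    """One octave of value noise: random grid values, bilinearly upsampled."""
    grid = rng.random((freq + 1, freq + 1))
    ys = np.linspace(0, freq, shape[0], endpoint=False)
    xs = np.linspace(0, freq, shape[1], endpoint=False)
    y0, x0 = ys.astype(int), xs.astype(int)
    ty, tx = ys - y0, xs - x0
    # Bilinear interpolation between the four surrounding grid values.
    top = grid[y0][:, x0] * (1 - tx) + grid[y0][:, x0 + 1] * tx
    bot = grid[y0 + 1][:, x0] * (1 - tx) + grid[y0 + 1][:, x0 + 1] * tx
    return top * (1 - ty)[:, None] + bot * ty[:, None]

def fbm_layout(shape=(512, 512), octaves=4, base_freq=4, seed=0):
    """Sum octaves, halving amplitude and doubling frequency each time."""
    rng = np.random.default_rng(seed)
    out = np.zeros(shape)
    amp, freq = 1.0, base_freq
    for _ in range(octaves):
        out += amp * value_noise(shape, freq, rng)
        amp, freq = amp * 0.5, freq * 2
    return out / out.max()  # rough continental elevation layout in [0, 1]

coarse_elevation = fbm_layout()
```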
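And back on point (2): the simplest form of the conditioning I have in mind is concatenating an upsampled control map onto the denoiser input. A toy sketch, not the paper's architecture (channel counts are arbitrary, and timestep conditioning is omitted for brevity):

```python
import torch
import torch.nn.functional as F

class ConditionedDenoiser(torch.nn.Module):
    """Toy denoiser that sees the noisy latent plus a user control map."""
    def __init__(self, latent_ch=4, cond_ch=1, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(latent_ch + cond_ch, hidden, 3, padding=1),
            torch.nn.SiLU(),
            torch.nn.Conv2d(hidden, latent_ch, 3, padding=1),
        )

    def forward(self, noisy_latent, control_map):
        # Upsample the coarse control (e.g. a sketched elevation map)
        # to the latent resolution and stack it as extra channels.
        cond = F.interpolate(control_map, size=noisy_latent.shape[-2:],
                             mode="bilinear", align_corners=False)
        return self.net(torch.cat([noisy_latent, cond], dim=1))
```

Richer control (rivers, biome masks, style tags) could slot in the same way, or via a ControlNet-style side network.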
Convincing AND useful procedural terrain is usually hard-simulated along manually placed guides, which is typically faster and more versatile than a diffusion model. I don't see any model being used in practice for this, at least not until it has good ControlNets trained specifically for the task. However, something like this could be useful for texture generation, especially with geometry/camera position/lighting as additional inputs.
This is really awesome, actually! It looks great, very diverse, and is clearly scalable to extremely large maps. Props for testing the generation out on Minecraft, where terrain generation really matters.
I'm not sure I understand the use case: for a lot of generated worlds (e.g. in games) you don't just want downsampled "realistic" topography, you want specific stylization and fine-grained artistic control. For those cases this is worse than "raw" noise. If all you wanted was to generate plausible, Earth-like maps, Gemini or GPT would do a comparable job (with more control, I'd wager).
I wonder if you could use this to generate fractal terrain by the ocean, like around Sydney. We already have "artificial reefs". We have built beauty in the built environment (architecture) and some in the natural environment (gardens, forests), but comparatively little when it comes to the ocean.
Not only could this improve Minecraft world generation (heh), but it could also be useful for 3D surface material generation, namely for layering different materials and generating with MultiDiffusion, if you look at the surface as a microscopic terrain.
I wonder what it would take to adapt a model like this to generate non-Earthlike terrain. For example, if you were using it to make planets without atmospheres and without water cycles, or planets like Io with rampant volcanism.
It's interesting, but I can't see it replacing Perlin or Simplex noise, which are used for more than just terrain generation.
edit: I don't think I have the vocabulary to describe my other issues, beyond saying it doesn't feel like the right way to "solve" this problem.
I'd prefer something that was entirely code, rather than requiring training, and possibly retraining, to get what I want.
edit2: Also, is this entirely flat? Or can it be applied to a sphere (a planet), or to terrain inside a cylinder (a rotating space habitat)?
Sounds like we need to add it to GTNH (only half joking)
Mm. This paper makes it hard to understand what they've done.
For example:
> MultiDiffusion remains confined to bounded domains: all windows must lie within a fixed finite canvas, limiting its applicability to unbounded worlds or continuously streamed environments.
> We introduce InfiniteDiffusion, an extension of MultiDiffusion that lifts this constraint. By reformulating the sampling process to operate over an effectively infinite domain, InfiniteDiffusion supports seamless, consistent generation at scale.
…but:
> The hierarchy begins with a coarse planetary model, which generates the basic structure of the world from a rough, procedural or user-provided layout. The next stage is the core latent diffusion model, which transforms that structure into realistic 46km tiles in latent space. Finally, a consistency decoder expands these latents into a high-fidelity elevation map.
So, the novel thing here is slightly better seamless diffusion image gen.
…but we generate using a hierarchy based on a procedural layout.
So basically, tl;dr: take Perlin noise, resize it, and then use it image-to-image as a seed to generate detailed tiles?
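Which is roughly what you can already do with off-the-shelf tools. A minimal sketch using diffusers' img2img pipeline (the model name, prompt, and strength are placeholders, not anything from the paper):

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Coarse layout: any low-frequency noise works; here, upscaled random
# values stand in for a resized Perlin heightmap.
rng = np.random.default_rng(0)
coarse = (rng.random((16, 16)) * 255).astype(np.uint8)
layout = Image.fromarray(coarse).convert("RGB").resize((512, 512),
                                                       Image.BILINEAR)

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# strength controls how much the seed layout is respected vs. repainted.
tile = pipe(prompt="satellite heightmap of mountainous terrain",
            image=layout, strength=0.6).images[0]
```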
People have already been doing this.
It's not novel.
The novel part here is making the detailed tiles slightly nicer.
Eh. :shrug:
The paper obfuscates this, quite annoyingly.
It's unclear to me why you can't just use MultiDiffusion for this, given that your top-level input is already bounded (e.g. user input) and not infinite.
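For anyone unfamiliar, the MultiDiffusion trick being discussed is roughly this: at each denoising step, run the model on overlapping windows and average the overlapping predictions. A rough sketch (window size, stride, and `denoise_step` are all stand-ins, not the paper's code):

```python
import torch

def multidiffusion_step(latent, denoise_step, window=64, stride=48):
    """One fused denoising step over a large (C, H, W) latent canvas.

    `denoise_step` is assumed to map a (C, window, window) crop to its
    denoised prediction for the current timestep. Assumes (H - window)
    and (W - window) are multiples of stride, so the canvas is covered.
    """
    out = torch.zeros_like(latent)
    weight = torch.zeros_like(latent)
    _, H, W = latent.shape
    for y in range(0, H - window + 1, stride):
        for x in range(0, W - window + 1, stride):
            crop = latent[:, y:y + window, x:x + window]
            out[:, y:y + window, x:x + window] += denoise_step(crop)
            weight[:, y:y + window, x:x + window] += 1
    # Average wherever windows overlap; this is what hides the seams.
    return out / weight.clamp(min=1)
```

The bounded-canvas limitation is visible right there: the loop needs fixed H and W up front, which is presumably what the InfiniteDiffusion reformulation removes.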
I worked on something very similar for my master's degree.
The problem I could never solve was speed, and from reading the paper it doesn't seem like they managed to solve it either.
In the end, for my work, and I expect for this work, it is only usable for pre-generated terrains, and in that case you are up against very mature ecosystems with a lot of tooling for manipulating and controlling terrain generation.
It'll be interesting to see if the authors follow up this paper with research into even stronger ability to condition and control terrain outputs.