
dvrp · last Friday at 8:19 PM

Qwen-Image-Layered is a diffusion model that, unlike most SOTA-ish models out there (e.g. Flux, Krea 1, ChatGPT, Qwen-Image), is (1) open-weight (unlike ChatGPT Image or Nano Banana) and Apache 2.0 licensed, and (2) has two distinct inference-time features: (i) it understands the alpha channel of images (RGBA, as opposed to RGB only), which lets it generate transparency-aware bitmaps; and (ii) it understands layers [1]. Layers are how most creative professionals work in software like Photoshop or Figma, where you overlay elements, such as a foreground and a background, into a single file.
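To make (i) and (ii) concrete, here's a minimal sketch of what layered RGBA output enables: flattening a stack of transparency-aware layers into one image with Pillow. The file names are hypothetical placeholders, not anything the model actually emits.

    # Flatten a stack of RGBA layers (bottom to top) into one bitmap.
    # Hypothetical file names; assumes all layers share one canvas size.
    from PIL import Image

    layer_files = ["background.png", "subject.png", "text.png"]
    layers = [Image.open(f).convert("RGBA") for f in layer_files]

    # Start from a fully transparent canvas.
    canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))

    # alpha_composite respects each layer's alpha channel, which is
    # exactly what an RGB-only model can't represent.
    for layer in layers:
        canvas = Image.alpha_composite(canvas, layer)

    canvas.save("flattened.png")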

This is the first model from a major AI research lab (the people behind Qwen-Image, which is basically the SOTA open image diffusion model) with those capabilities, afaik.

The timing gap for this submission (16 hours ago) is because that's when the research/academic paper was released; the inference code and model weights were only released 5 hours ago.

---

Technically there's another difference, though it mostly matters to people interested in AI research or AI training. From their abstract: "[we introduce] a Multi-stage Training strategy to adapt a pretrained image generation model into a multilayer image decomposer." This seems to imply you could adapt a different, existing image model to understand layers as well. The paper also describes a pipeline for obtaining the training data from Photoshop .PSD files.
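For the .PSD part, here's a hedged sketch of what such a data pipeline might look like, extracting each visible layer as a standalone RGBA bitmap. I'm assuming the psd-tools library here; the paper doesn't name its tooling.

    # Export each visible layer of a PSD as an RGBA PNG (sketch only;
    # the paper's actual pipeline may differ).
    from psd_tools import PSDImage

    psd = PSDImage.open("design.psd")  # hypothetical input file

    for i, layer in enumerate(psd):
        if not layer.visible:
            continue
        # composite() renders the layer, alpha included, as a PIL image.
        image = layer.composite()
        if image is not None:
            image.save(f"layer_{i:02d}_{layer.name}.png")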