
filipstrand · today at 2:32 AM

Really cool project! Impressive to see this being done in pure C.

I'm the maintainer of MFLUX (https://github.com/filipstrand/mflux) which does a similar thing, but at a higher level using the MLX framework optimised for Apple Silicon. I just merged Flux 2 Klein support as well and was happy to see this discussion :)

I started doing this type of work roughly 1.5 years ago, when FLUX.1 was released, and have been at it off and on ever since with newer models, trying to use more and more AI over time.

At one point, I vibe-coded a debugger to help the coding agent along. It worked OK, but as models have gotten stronger, it doesn't really seem to be needed anymore. My latest version simply has a SKILL.md that outlines my overall porting strategy (https://github.com/filipstrand/mflux/blob/main/.cursor/skill...). Somewhat surprisingly, this actually works now with Cursor + Codex-5.2, with little human intervention.

> Even if the code was generated using AI, my help in steering towards the right design, implementation choices, and correctness has been vital during the development.

This definitely resonates! Curious to hear more about what worked/didn't for you. A couple of things I've found useful:

- Porting the pipeline backwards: This is the way I did it personally before using any coding models. The typical image generation flow is the following:

1. Text encodings (+ random noise latent)
2. Transformer loop
3. VAE decoding

I found that starting with the VAE (by feeding it pre-loaded tensors extracted from the reference at specific locations) was the quickest way to get to an actual generated image. Once the VAE is done and verified, only then work backwards up the chain and handle the Transformer, etc. I still prefer to do it this way, and I like to manually intervene between steps 3, 2, and 1, but maybe this won't actually be needed soon? A rough sketch of the VAE-first check is below.
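To make that concrete, here's a minimal sketch in Python. The file names and the my_vae.decode interface are hypothetical; the point is just to dump tensors at a fixed spot in the reference pipeline and compare the port's output at that same spot:

    import numpy as np

    # Dumped from the reference pipeline (hypothetical file names):
    # the latent right before VAE decoding, and the decoded image.
    latent = np.load("latent_before_vae.npy")
    reference_image = np.load("image_after_vae.npy")

    # my_vae is the ported decoder under test (hypothetical interface).
    ported_image = my_vae.decode(latent)

    # Tolerances are a guess; they depend on dtype and backend.
    assert np.allclose(ported_image, reference_image, atol=1e-2, rtol=1e-3)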

- Also, with the VAE, if you care about implementing the encoding functionality (e.g. to be used with img2img features), a round-trip test is a very quick way to verify correctness:

image_in -> encode -> decode -> image_out : compare(image_in, image_out)
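A sketch of what that could look like (the vae interface, the hypothetical load_test_image helper, and the threshold are assumptions; a VAE is lossy, so expect small but nonzero error):

    import numpy as np

    def roundtrip_error(vae, image_in: np.ndarray) -> float:
        # Encode, then decode, and measure mean absolute reconstruction error.
        latent = vae.encode(image_in)
        image_out = vae.decode(latent)
        return float(np.abs(image_in - image_out).mean())

    # With decode already verified (see above), a large error here
    # almost always points at a bug in encode.
    error = roundtrip_error(my_vae, load_test_image())
    assert error < 0.1  # ballpark threshold, tune per model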

- Investing in a good foundation for weight handling pays off, especially when doing repeat work across similar models. Earlier coding models would easily get confused about weight assignment, naming conventions, etc., and a lot of time could be wasted when weight assignment failed (sometimes silently) early on.
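One cheap guard that helps here: before assigning anything, diff the checkpoint's parameter names and shapes against the model's, and fail loudly on any mismatch. A sketch, assuming both sides are plain name-to-array dicts:

    def check_weights(checkpoint: dict, model_params: dict) -> None:
        # Keys present on one side but not the other.
        missing = sorted(set(model_params) - set(checkpoint))
        unexpected = sorted(set(checkpoint) - set(model_params))
        # Keys present on both sides but with mismatched shapes.
        bad_shapes = [
            (name, checkpoint[name].shape, model_params[name].shape)
            for name in sorted(set(checkpoint) & set(model_params))
            if checkpoint[name].shape != model_params[name].shape
        ]
        if missing or unexpected or bad_shapes:
            raise ValueError(
                f"missing={missing}, unexpected={unexpected}, bad_shapes={bad_shapes}"
            )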