I built this because I was tired of "AI tools" that were just wrappers around expensive AP...

divyaprakash • today at 7:37 AM • 1 reply • view on HN

I built this because I was tired of "AI tools" that were just wrappers around expensive APIs with high latency. As a developer who lives in the terminal (Arch/Nushell), I wanted something that felt like a CLI tool and respected my hardware.

The Tech:

    GPU Heavy: It uses decord and PyTorch for scene analysis. I’m calculating action density and spectral flux locally to find hooks before hitting an LLM.

    Local Audio: I’m using ChatterBox locally for TTS to avoid recurring costs and privacy leaks.

    Rendering: Final assembly is offloaded to NVENC.

Looking for Collaborators: I’m currently looking for PRs specifically around:

    Intelligent Auto-Zoom: Using YOLO/RT-DETR to follow the action in a 9:16 crop.

    Voice Engine Upgrades: Moving toward ChatterBoxTurbo or NVIDIA's latest TTS.

It's fully dockerized, and also has a makefile. Would love some feedback on the pipeline architecture!

Replies

ramon156 • today at 9:59 AM

I don't get this reasoning. You were tired of LLM wrappers, but what is your tool? These two requirements (felt like a CLI and respects your hardware) do not line up.

Still a cool tool though! Although it seems partly AI generated.

➕ show 2 replies

alt Hacker News

Replies