Hacker News

sin2pi, last Monday at 4:07 AM (1 reply)

I'm tinkering with relative positional encoding by trying to integrate acoustic features directly into it.

More specifically, I'm trying to use pitch (F0) to dynamically adjust the theta parameter in rotary positional embeddings, so that the frequency of the positional encoding reflects the underlying pitch contour of the speech. And instead of using a fixed unit circle (radius = 1.0) for the complex rotations, I'm trying to work out how to use variable radii derived from the pitch. The idea is to create acoustically weighted positional encodings, where the position reflects the acoustic salience in the original audio. https://github.com/sine2pi/asr_model
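
To make the idea concrete, here is a rough sketch of a pitch-conditioned rotary embedding in plain Python. It is not taken from the linked repo. The function name `pitch_rope`, the reference pitch `f0_ref`, the linear scaling of theta by pitch, and the choice of the pitch ratio as the radius are all assumptions made for illustration, and unvoiced frames (F0 = 0) simply fall back to ordinary RoPE on the unit circle.

```python
import cmath

def pitch_rope(x, f0, base_theta=10000.0, f0_ref=200.0):
    """Rotate each frame's feature pairs by a pitch-dependent angle and radius.

    x:  list of frames, each a list of floats with even length d
    f0: list of per-frame pitch values in Hz (0.0 marks an unvoiced frame)
    """
    out = []
    for pos, (frame, pitch) in enumerate(zip(x, f0)):
        d = len(frame)
        voiced = pitch > 0.0
        # Assumption: theta scales linearly with pitch relative to f0_ref.
        theta = base_theta * (pitch / f0_ref) if voiced else base_theta
        # Assumption: the radius is the pitch ratio; unvoiced frames keep radius 1.0.
        radius = pitch / f0_ref if voiced else 1.0
        rotated = []
        for i in range(d // 2):
            # Standard RoPE inverse frequency, but with the per-frame theta.
            inv_freq = theta ** (-2.0 * i / d)
            # Treat the feature pair (2i, 2i+1) as one complex number.
            z = complex(frame[2 * i], frame[2 * i + 1])
            # Multiply by radius * exp(j * angle) instead of a unit-circle rotation.
            z *= cmath.rect(radius, pos * inv_freq)
            rotated.extend([z.real, z.imag])
        out.append(rotated)
    return out

# Two 4-dimensional frames: the first voiced at 400 Hz, the second unvoiced.
encoded = pitch_rope([[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]], [400.0, 0.0])
```

One thing to keep in mind with a sketch like this: once theta changes per frame, the angle difference between two positions no longer depends only on their distance, so the strictly relative property of plain RoPE does not hold exactly.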


Replies

kaiokendev, last Monday at 5:36 AM

Having a really tough time wrapping my head around it, but it sounds really interesting.