sin2pi • last Monday at 4:07 AM • 1 reply • view on HN

I'm tinkering with relative positional encoding by trying to integrate acoustic features directly into it.

More specifically, I'm trying to use pitch (F0) to dynamically adjust the theta parameter in rotary positional embeddings, so that the frequency of the positional encoding reflects the underlying pitch contour of the speech. And instead of using a fixed unit circle (radius = 1.0) for the complex rotations, I'm trying to work out how to use variable radii derived from the pitch. The idea is to create acoustically weighted positional encodings, where the position reflects the acoustic salience in the original audio. https://github.com/sine2pi/asr_model
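
To make the idea concrete, here is a rough sketch in plain Python. It is not taken from the linked repo. The function name `pitch_rope`, the reference pitch `f0_ref`, the linear mapping from F0 to theta, and the radius normalization (F0 divided by the maximum voiced F0) are all assumptions picked only for illustration. Unvoiced frames (F0 = 0) fall back to the usual fixed theta and unit radius.

```python
import cmath


def pitch_rope(x, f0, base=10000.0, f0_ref=200.0):
    """Rotate feature pairs with a pitch-dependent theta and radius.

    x:  list of frames, each a list of floats with an even length
    f0: pitch per frame in Hz, 0.0 for unvoiced frames
    """
    dim = len(x[0])
    voiced = [f for f in f0 if f > 0.0]
    f0_max = max(voiced) if voiced else 1.0

    out = []
    for t, (frame, pitch) in enumerate(zip(x, f0)):
        if pitch > 0.0:
            theta = base * pitch / f0_ref  # assumed F0 -> theta mapping
            radius = pitch / f0_max        # assumed radius, in (0, 1]
        else:
            theta, radius = base, 1.0      # unvoiced: standard rotary setup

        rotated = []
        for i in range(dim // 2):
            # Same angle formula as standard rotary embeddings,
            # but with a per-frame theta instead of a constant one.
            angle = t * theta ** (-2.0 * i / dim)
            # Treat each feature pair as a complex number and multiply by
            # radius * exp(j * angle) instead of a unit-circle rotation.
            z = complex(frame[2 * i], frame[2 * i + 1]) * cmath.rect(radius, angle)
            rotated += [z.real, z.imag]
        out.append(rotated)
    return out


frames = [[1.0, 0.0, 1.0, 0.0] for _ in range(3)]
encoded = pitch_rope(frames, f0=[0.0, 100.0, 200.0])
```

One thing this makes visible: once theta changes per frame, the angle difference between two positions is no longer a function of their offset alone, and a radius other than 1.0 also scales the feature magnitudes. So this is a pitch-weighted encoding rather than a strictly relative one.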

Replies

kaiokendev • last Monday at 5:36 AM

having a really tough time wrapping my head around it but it sounds really interesting
