Hacker News

zxexz 11/07/2024

Sorry if that was ridiculously vague. I don't know a ton about the state of the art, and I'm really not sure there is one: the papers just seem to get more terminology-dense, and the research mostly seems to end up developing new terminology. My grug-brained philosophy is to make models small enough that you can just shove things in and iterate quickly in Colab or a locally hosted notebook with access to a couple of 3090s, or even just modern Ryzen/EPYC cores. I like to "evaluate" the raw model by using pyro-ppl to run MCMC or SVI on the raw logits over a known holdout dataset.
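The comment doesn't say which model gets fitted to the logits, so what follows is only one plausible reading: treat a single softmax temperature as the unknown and sample its posterior on the holdout set. To stay dependency-free, this sketch swaps in a hand-written random-walk Metropolis sampler where the comment uses pyro-ppl. The `HOLDOUT` data, the function names, and the Normal(0, 1) prior on the log-temperature are all made up for illustration.

```python
import math
import random

# Hypothetical holdout set: (raw logits, true class index) pairs.
HOLDOUT = [
    ([4.0, 0.5, -1.0], 0),
    ([3.5, 3.0, 0.0], 1),
    ([-2.0, 5.0, 1.0], 1),
    ([6.0, 0.0, 0.5], 2),
    ([1.0, 0.2, 4.5], 2),
    ([5.0, 1.0, 0.0], 0),
]


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def log_posterior(log_t, holdout):
    """Unnormalized log posterior over the log-temperature.

    Assumed prior: log_t ~ Normal(0, 1). Likelihood: categorical over
    the temperature-scaled softmax of each holdout example's logits.
    """
    temperature = math.exp(log_t)
    log_prior = -0.5 * log_t ** 2
    log_lik = sum(log_softmax(logits, temperature)[label]
                  for logits, label in holdout)
    return log_prior + log_lik


def metropolis(holdout, steps=2000, step_size=0.3, seed=0):
    """Random-walk Metropolis over log-temperature; returns temperature
    samples from the second half of the chain (first half is burn-in)."""
    rng = random.Random(seed)
    current = 0.0  # start at temperature 1.0
    current_lp = log_posterior(current, holdout)
    samples = []
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_posterior(proposal, holdout)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(math.exp(current))
    return samples[steps // 2:]


samples = metropolis(HOLDOUT)
posterior_mean = sum(samples) / len(samples)
print(f"posterior mean temperature: {posterior_mean:.2f}")
```

A posterior that sits well above 1 would suggest the raw logits are overconfident on the holdout data, and one below 1 the opposite. With Pyro available, the same model could be handed to its MCMC or SVI machinery instead of this toy sampler.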

Always happy to chat about this stuff, with anybody. I would love to explore ideas here; it's a fun hobby, and we're living in a golden age of open-source structured datasets. I haven't actually found a community interested specifically in static knowledge injection. Email in profile (ebg_13 encoded).


Replies

Jerrrrrrry 11/07/2024

Thank you for your comments (good further reading terms), and your open invitation for continued inquiry.

The "fomo" / deja vu / impending doom / incipient shift in the Overton window regarding meta-architecture for AI/ML capabilities and risks is now such a glaringly obvious elephant in the room that it leaves some nearly catatonic.

https://www.youtube.com/watch?v=2ziuPUeewK0