Check out PyKEEN [0] and go wild. I like to train a bunch of random models and "overfit" them to the extreme (in my mind, overfitting is the point for this task: you want dense, compressed knowledge).

Then resize the input and output embeddings of an existing pretrained (but small) LLM - something like TinyLlama or smaller, or just use whatever Karpathy repo is most fun at the moment and train a GPT-2 equivalent. Resizing the input side is only necessary if you're adding extra metadata on input, but make sure you untie the input/output weights. You can add a linear-layer extension to the transformer blocks, pass it up as some sort of residual, etc. Honestly, just find a way to shove it in: detach the KGE from the computation graph and put something learnable between it and wherever you're connecting it, like a couple of linear layers and a ReLU. The output side is more important; you can have some indicator logit(s) that decide whether to "read" from the detached graph or sample the outputs of the LLM, or just always do both and interpret it. Rough sketches of both halves below.
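To make the KGE half concrete, here's roughly what I mean by overfitting on purpose with PyKEEN's pipeline API. The dataset, model, and hyperparameters are placeholders (Nations/TransE just to have something runnable), and the exact way you pull the embeddings out varies a bit between PyKEEN versions:

```python
from pykeen.pipeline import pipeline

# Deliberately "overfit": lots of epochs, no early stopping. The goal is
# dense, compressed knowledge of this graph, not generalization to new triples.
result = pipeline(
    dataset="Nations",                      # placeholder; use your own triples
    model="TransE",                         # any PyKEEN model works here
    model_kwargs=dict(embedding_dim=128),
    training_kwargs=dict(num_epochs=2000),
    random_seed=0,
)

# Pull out the trained entity embeddings and detach them so nothing
# downstream ever updates them (recent PyKEEN exposes them like this).
kge = result.model.entity_representations[0](indices=None).detach()
print(kge.shape)  # (num_entities, 128)
```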
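And the "shove it in" half, sketched against plain GPT-2 from transformers just because it's small and everywhere. The hook placement, hidden sizes, and the crude mean-pooling of the KGE are arbitrary choices for illustration, not a recipe:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

lm = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in for any small LLM

# Untie input/output embeddings by giving lm_head its own copy of the weights.
# (If you add new tokens for metadata, lm.resize_token_embeddings(new_size)
# should then resize both matrices.)
lm.lm_head.weight = nn.Parameter(lm.lm_head.weight.detach().clone())
lm.config.tie_word_embeddings = False

# Pretend this is the detached KGE matrix from the PyKEEN run above.
kge = torch.randn(14, 128)

# Something learnable between the frozen KGE and the network: a couple of
# linear layers and a ReLU.
adapter = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, lm.config.n_embd),
)

# Inject the (mean-pooled) KGE signal as a residual into one transformer block
# via a forward hook; output[0] of a GPT-2 block is the hidden-state tensor
# of shape (batch, seq, n_embd), so the adapter output broadcasts over it.
def inject(module, inputs, output):
    hidden = output[0] + adapter(kge.mean(dim=0))
    return (hidden,) + output[1:]

lm.transformer.h[-2].register_forward_hook(inject)

# Output side: an extra indicator logit per position that decides whether to
# "read" from the detached graph or just sample the LM's own distribution.
read_gate = nn.Linear(lm.config.n_embd, 1)
```

In a real run you'd train the adapter and the gate (plus whatever LM parameters you unfreeze) jointly; the point is just how little plumbing the injection needs.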
Sorry if that was ridiculously vague. I don't know a ton about the state of the art, and I'm really not sure there is one - the papers keep getting more terminology-dense, and the research mostly seems to end up developing new terminology. My grug-brained philosophy is to keep models small enough that you can shove things in and iterate quickly in Colab or a locally hosted notebook with access to a couple of 3090s, or even just modern Ryzen/EPYC cores. I like to "evaluate" the raw model by using pyro-ppl to run MCMC or SVI over the raw logits on a known holdout dataset - something like the sketch below.
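This is a toy version of that idea rather than exactly what I run, but the shape is: grab raw next-token logits on a holdout set and let Pyro infer something simple over them (here a single temperature, which tells you how badly the overfit model is calibrated off-graph). SVI shown; swapping in NUTS/MCMC is one line:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

# Placeholder holdout data: `logits` would come from a no-grad forward pass
# over held-out examples, `targets` are the true next tokens.
logits = torch.randn(500, 1024)
targets = torch.randint(0, 1024, (500,))

def model(logits, targets):
    # One latent temperature shared across the whole holdout set.
    tau = pyro.sample("tau", dist.LogNormal(0.0, 1.0))
    with pyro.plate("data", logits.shape[0]):
        pyro.sample("obs", dist.Categorical(logits=logits / tau), obs=targets)

guide = AutoNormal(model)
svi = SVI(model, guide, Adam({"lr": 1e-2}), loss=Trace_ELBO())
for _ in range(1000):
    svi.step(logits, targets)

# MCMC instead:
# pyro.infer.MCMC(pyro.infer.NUTS(model), num_samples=500, warmup_steps=200).run(logits, targets)
```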
Always happy to chat about this stuff with anybody. I'd love to explore ideas here - it's a fun hobby, and we're living in a golden age of open-source structured datasets. I haven't actually found a community interested specifically in static knowledge injection. Email is in my profile (ebg_13 encoded).