I've had the same idea. One way to go about it would be to modify an existing RISC-V cpu to include the ternary math ops to accelerate bitnet operations. And vector/matrix extensions based on those. Then your LLM is implemented in RISC-V assembly using those extensions. (It would be possible to do some work on the LLVM backend so you could use a C implementation of the LLM, but that starts to be a lot of work. Also, we'd need 2 bit signed int types in C.)
A completely different approach is differentiable logic networks. You end up with a logic-gate network after training. This logic gate network would be very easy to translate into Verilog or VHDL. https://github.com/Felix-Petersen/difflogic