Has anyone started implementing this technique in llama.cpp or a similar inference tool?
There was some work done on this a while back, during the FrankenMerge craze of '23.
I am working with TurboDerp to integrate this into the Exllama v3 format.