It's a new architecture, not yet implemented in llama.cpp.
Issue to follow: https://github.com/ggml-org/llama.cpp/issues/18931