Yeah, it appears to support audio and image input, and it runs on mobile devices with a 256K context window!