You do want to offload crypto to dedicated hardware; otherwise your transport will get stuck at a paltry 40-50 Gb/s per core. However, you do not need to offload more than block decryption; you can leave all of the crypto protocol management in userspace with no material performance impact.
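
To make that split concrete, here is a minimal sketch of what it could look like. Everything in it is an assumption for illustration, not any real NIC or library API: `BulkEngine`, `ToySoftwareEngine`, and `UserspaceSession` are hypothetical names, the nonce scheme (IV XOR sequence number) is just one common choice, and the toy engine is a plain XOR keystream with no authentication, standing in for whatever the hardware would actually do.

```python
import hashlib
from typing import Protocol


class BulkEngine(Protocol):
    """Hypothetical interface for the only part that would be offloaded."""

    def decrypt(self, key: bytes, nonce: bytes, ciphertext: bytes) -> bytes: ...


class ToySoftwareEngine:
    """Stand-in for the hardware engine. NOT real cryptography: it XORs the
    data with a SHAKE-256 keystream and does no authentication."""

    def decrypt(self, key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
        stream = hashlib.shake_256(key + nonce).digest(len(ciphertext))
        return bytes(c ^ s for c, s in zip(ciphertext, stream))


class UserspaceSession:
    """Protocol state kept in userspace: key, IV, and record sequence number."""

    def __init__(self, engine: BulkEngine, key: bytes, iv: bytes) -> None:
        self.engine = engine
        self.key = key
        self.iv = iv
        self.seq = 0

    def _next_nonce(self) -> bytes:
        # Assumed scheme: XOR the IV with the big-endian sequence number.
        seq_bytes = self.seq.to_bytes(len(self.iv), "big")
        self.seq += 1
        return bytes(i ^ s for i, s in zip(self.iv, seq_bytes))

    def open_record(self, ciphertext: bytes) -> bytes:
        # Only this call would cross into dedicated hardware.
        return self.engine.decrypt(self.key, self._next_nonce(), ciphertext)


key, iv = b"k" * 32, b"i" * 12
sender = UserspaceSession(ToySoftwareEngine(), key, iv)
receiver = UserspaceSession(ToySoftwareEngine(), key, iv)
# XOR is symmetric, so the toy engine's decrypt doubles as encrypt here.
wire = [sender.open_record(m) for m in (b"hello", b"world")]
plain = [receiver.open_record(c) for c in wire]
assert plain == [b"hello", b"world"]
```

The point of the shape is that the engine is stateless: it only ever sees a key, a nonce, and a buffer, so key management, sequencing, and rekeying can all change in userspace without touching the offloaded part.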