Fwiw in this application you would never need to divide by an arbitrary integer each time; you'd pick it once and then plumb it into libdivide and get something significantly cheaper than 8-30 cycles.