Microbenchmarks. A LUT will win many of them, but you pessimise the rest of the code, because the table competes with everything else for cache. So unless a significant (read: 20+%) portion of your runtime goes into the lookups, there isn't much point in bothering. For almost any pure calculation without I/O, it's better to do the arithmetic than to do the memory access.
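To put a rough number on that cache cost, here is a minimal sketch of an interpolated asin() table. The table size `N`, the helper `asin_lut`, and the 32 KiB L1 data cache figure are all assumptions picked for illustration, not measurements of any particular CPU or library.

```python
import math
from array import array

N = 4096  # hypothetical table size, chosen only for illustration

# Sample asin at N evenly spaced points over [-1, 1].
# (2*i - (N-1)) / (N-1) is exactly -1.0 at i=0 and exactly 1.0 at i=N-1.
TABLE = array("d", (math.asin((2 * i - (N - 1)) / (N - 1)) for i in range(N)))
STEP = 2.0 / (N - 1)  # spacing between sample points

def asin_lut(x):
    """Approximate asin(x) for x in [-1, 1] by linear interpolation."""
    pos = (x + 1.0) / STEP
    i = min(int(pos), N - 2)   # clamp so TABLE[i + 1] stays in range
    frac = pos - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

# Footprint of the table as a contiguous array of doubles.
table_bytes = TABLE.itemsize * len(TABLE)

# Assumption: a 32 KiB L1 data cache, a common size but not universal.
L1D_BYTES = 32 * 1024
fraction_of_l1d = table_bytes / L1D_BYTES
```

Under those assumptions the table alone is as large as the whole L1 data cache, so a workload that touches all of it leaves little room for anything else. Accuracy also degrades near ±1, where asin is steep, so a real table would likely need more than this.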
Locality within the LUT matters too: if you know you're looking up identical or nearby-enough values to benefit from caching, a LUT can be more of a win. You only pay the cache cost for the portion you actually touch at runtime.
I could imagine some graphics workloads tend to compute asin() repeatedly with nearby input values. But I'd guess the locality isn't local enough to matter: only eight double-precision floats fit in a typical 64-byte cache line.
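A back-of-the-envelope sketch of how close "nearby" has to be. It assumes 64-byte cache lines, a table of doubles aligned to a line boundary, and a plain nearest-below index over [-1, 1]. The names `lines_touched` and `span_per_line` are hypothetical helpers, not from any library.

```python
CACHE_LINE_BYTES = 64   # assumption: common on x86-64 and many ARM cores
ENTRY_BYTES = 8         # one double-precision float
ENTRIES_PER_LINE = CACHE_LINE_BYTES // ENTRY_BYTES

def lines_touched(xs, n_entries):
    """Count the distinct cache lines hit when looking up each x in [-1, 1]."""
    lines = set()
    for x in xs:
        # Map x from [-1, 1] to a table index, clamping the x == 1 case.
        idx = min(int((x + 1.0) / 2.0 * n_entries), n_entries - 1)
        lines.add(idx // ENTRIES_PER_LINE)
    return len(lines)

def span_per_line(n_entries):
    """Width of the input interval covered by a single cache line."""
    return ENTRIES_PER_LINE * 2.0 / n_entries

print(span_per_line(4096))                 # 0.00390625
print(lines_touched([0.0, 0.001], 4096))   # same line
print(lines_touched([0.0, 0.1], 4096))     # different lines
```

With a 4096-entry table, two inputs have to be within roughly 0.004 of each other to have a chance of sharing a line, and a finer table shrinks that window further.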