Once upon a time (maybe even before pthreads) I made an automatic version of this for profiling, using SIGALRM.
I wrote an LD_PRELOAD wrapper around XSelectInput that armed the signal to fire 0.1 seconds after a keyboard, mouse, or other event was received. From then on it dumped a stack trace every 0.1 seconds for as long as "slow" UI code was still executing, i.e. until the next call to XSelectInput.
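For anyone who wants to try the idea, here is a rough sketch of the same timer logic in Python. It is not the original tool, which was a C wrapper loaded via LD_PRELOAD. All names (`arm`, `disarm`, `handle_event`, `samples`, the two example handlers) are made up for illustration, and it assumes a Unix system with the code running in the main thread, since that is where Python delivers signals.

```python
import signal
import time
import traceback

SAMPLE_INTERVAL = 0.1  # seconds; same delay and period as described above

samples = []  # one formatted stack trace per timer tick


def _on_alarm(signum, frame):
    # `frame` is whatever the main thread was executing when SIGALRM arrived.
    samples.append("".join(traceback.format_stack(frame)))


def arm():
    """Call when an event is received: first tick after 0.1 s, then every 0.1 s."""
    signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, SAMPLE_INTERVAL, SAMPLE_INTERVAL)


def disarm():
    """Call when the program goes back to waiting for the next event."""
    signal.setitimer(signal.ITIMER_REAL, 0, 0)


def handle_event(handler):
    # Hypothetical stand-in for the wrapped event call: sample only while
    # the handler for this event is still running.
    arm()
    try:
        handler()
    finally:
        disarm()


def fast_handler():
    # Finishes well within 0.1 s, so the timer never fires.
    pass


def slow_handler():
    # Busy-waits for about 0.35 s to simulate slow UI code.
    deadline = time.monotonic() + 0.35
    while time.monotonic() < deadline:
        pass
```

Calling `handle_event(fast_handler)` should leave `samples` empty, while `handle_event(slow_handler)` should collect a few traces that point into `slow_handler`. Handlers that finish within 0.1 seconds never show up, so only the slow paths produce output.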