He's not making it up and there's no reason for that tone. Strings are more straightforward to isolate than vocals, horns, etc. because they produce a near-perfect harmonic series, which shows up as parallel lines in a spectrogram. The time/frequency tradeoff still exists, but it's less of a problem for strings because of their slow attack.
You can look up HPSS (harmonic-percussive source separation) and Python libraries like Essentia and librosa.
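For anyone curious what HPSS does under the hood, here is a rough sketch of the usual median-filtering idea, using only NumPy. It's a toy, not what Essentia or librosa actually run (librosa ships its own version as `librosa.effects.hpss`). The function names, the kernel size of 17, and the test signal (a 220 Hz harmonic tone plus a single click) are all my own assumptions for illustration.

```python
import numpy as np

def stft_mag(y, n_fft=1024, hop=256):
    """Magnitude spectrogram, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def median_along(S, size, axis):
    """Sliding median of odd length `size` along one axis, reflect-padded."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)

def hpss_masks(mag, kernel=17):
    """Soft masks: steady partials go to 'harmonic', clicks to 'percussive'."""
    harm = median_along(mag, kernel, axis=1)  # median across time keeps horizontal lines
    perc = median_along(mag, kernel, axis=0)  # median across frequency keeps vertical lines
    harm_mask = harm**2 / (harm**2 + perc**2 + 1e-12)
    return harm_mask, 1.0 - harm_mask

# Hypothetical test signal: a steady harmonic tone with one click in the middle.
sr = 8000
t = np.arange(sr) / sr
y = sum(np.sin(2 * np.pi * k * 220 * t) / k for k in range(1, 6))
y[4000] += 5.0

mag = stft_mag(y)
harm_mask, perc_mask = hpss_masks(mag)
```

Multiplying the complex STFT by either mask and inverting gives the two separated signals. The parallel horizontal lines of a sustained note are exactly what the time-direction median preserves, which is why steady harmonic sources suit this method.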
Hmmm... was 'tone' a pun?
Why mention strings' 'slow attack' as making it less of a problem? No isolation software treats this as an easy route.
Vocals are isolated more effectively because they sound unique. Strings (and other sounds) are similar in some ways but far more generic. All software out there indicates this, including the examples mentioned.
All wind instruments and all bowed string instruments produce a perfect harmonic series while emitting a steady tone. The most important difference between timbres of different instruments is in the attack, where inharmonic tones are also generated. Several old synths used this principle to greatly increase realism, by adding brief samples of attack transients to traditional subtractive synthesis, e.g.:
https://en.wikipedia.org/wiki/Linear_arithmetic_synthesis
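To make that concrete, here's a toy sketch in NumPy of the "short attack sample plus steady harmonic body" idea. It is only an illustration of the principle, not how any particular synth implemented it: the decaying noise burst stands in for a sampled attack transient, and the sample rate, fundamental, harmonic count, and function names are all assumptions of mine.

```python
import numpy as np

SR = 16000   # sample rate in Hz (assumed)
F0 = 200.0   # fundamental in Hz (assumed)
N_HARM = 8   # number of harmonics in the steady tone (assumed)

def sustained_tone(n_samples):
    """Steady tone: partials at exact integer multiples of F0."""
    t = np.arange(n_samples) / SR
    return sum(np.sin(2 * np.pi * k * F0 * t) / k for k in range(1, N_HARM + 1))

def attack_transient(n_samples, seed=0):
    """Stand-in for a sampled attack: a short, decaying noise burst."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n_samples) * np.exp(-np.linspace(0.0, 8.0, n_samples))

def note(n_samples=SR, n_attack=480):
    """Fade the harmonic body in and layer the transient over the first 30 ms."""
    ramp = np.minimum(1.0, np.arange(n_samples) / n_attack)
    out = sustained_tone(n_samples) * ramp
    out[:n_attack] += attack_transient(n_attack)
    return out

def inharmonic_fraction(segment):
    """Share of spectral power that falls outside the harmonic bins."""
    power = np.abs(np.fft.rfft(segment)) ** 2
    bins = [int(round(k * F0 * len(segment) / SR)) for k in range(1, N_HARM + 1)]
    return 1.0 - power[bins].sum() / power.sum()

y = note()
attack, steady = y[:480], y[8000:]

# The strongest partials of the steady part, expressed as multiples of F0.
spectrum = np.abs(np.fft.rfft(steady))
peak_bins = np.sort(np.argsort(spectrum)[-N_HARM:])
ratios = peak_bins * SR / len(steady) / F0
```

In this sketch `ratios` comes out as the integers 1 through 8 for the steady part, while `inharmonic_fraction(attack)` is far larger than `inharmonic_fraction(steady)`: the sustained portion is a clean harmonic comb and the character lives in the first few milliseconds.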