I think a general description is better for surveillance/tracking like this, no? If they're at a weird angle or intentionally concealing their face then facial recognition falls apart but being able to describe them naturally would result in better tracking IMO.
Presumably the ideal is some kind of a fusion. Upload or tag some images/videos and link someone's social profiles and the system can look out for them based on facial recognition, gait recognition, vehicle/pets/common wardrobe items in combination.