The "average" surveillance camera often lacks the pixels-on-target (pixels-per-foot) needed to do face detection for the majority of people in the scene. The trend toward higher-resolution cameras (e.g., from an average of ~3MP today toward 4K) will of course help with face detection somewhat, but there are also angle-of-view concerns, plus the fact that people don't always walk toward the camera to give you a good face shot.
Various companies have been shipping appearance search functionality that lets you search for similar appearances across multiple cameras to find the scene/image where you got a facial shot of a person of interest. Preventing the majority of cameras from classifying someone as a "person" object would significantly disrupt that workflow.
Perhaps something like AdaBoost could find matches by running weak classifiers over hundreds of low-resolution frames (a few seconds of video) at different angles, assuming you could assemble a similar data set for training; basically trading spatial information content for temporal.
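A minimal sketch of that idea, using scikit-learn's `AdaBoostClassifier` (whose default weak learner is a one-level decision stump) on synthetic stand-ins for low-resolution frames; the frame generator, image size, and class means are all made up for illustration:

```python
# Sketch: boost many weak classifiers over hundreds of tiny frames,
# trading spatial detail for temporal redundancy. Data is synthetic.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

def make_frames(n, mean, res=(16, 12)):
    # Each "frame" is a tiny grayscale image flattened to a vector;
    # n frames stands in for a few seconds of video of one subject.
    return rng.normal(mean, 1.0, size=(n, res[0] * res[1]))

# Frames of the person of interest vs. everyone else (synthetic means).
pos = make_frames(300, mean=0.5)
neg = make_frames(300, mean=-0.5)
X = np.vstack([pos, neg])
y = np.array([1] * len(pos) + [0] * len(neg))

# Each default weak learner (a decision stump) keys on a single pixel
# and is only slightly better than chance; boosting aggregates hundreds
# of them into a strong ensemble.
clf = AdaBoostClassifier(n_estimators=200)
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```

A real system would obviously need registered, background-subtracted crops rather than raw noise vectors, but the structure (many weak, per-frame decisions aggregated across time) is the point.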