The book Daemon by Daniel Suarez has a similar concept in it. The evil guys put a few cameras and ultrasonic speakers in the room. They can create a localized area of intense sound to really fuck up the good guy's head, as well as beam voices directly into his head, with silence on the walkie-talkies and no physical indication of what's going on. If you limit yourself to one occupant and can get good 3D tracking of their ears, then theoretically a few speakers on a pan/tilt mount should be able to do ANC over the entire room. Of course, if you have to do it for multiple occupants, you would need a speaker every 6.8 cm.
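For anyone wondering where a figure like 6.8 cm might come from: the comment doesn't say, but the arithmetic is easy to check. Here is a rough sketch, assuming a speed of sound of 343 m/s (my assumption, not stated above). Under that assumption 6.8 cm is one full wavelength at about 5 kHz, or half a wavelength at about 2.5 kHz.

```python
# Assumed speed of sound in room-temperature air (not stated in the thread).
SPEED_OF_SOUND_M_S = 343.0


def wavelength_m(freq_hz: float) -> float:
    """Wavelength in metres of a tone at freq_hz."""
    return SPEED_OF_SOUND_M_S / freq_hz


def freq_for_wavelength_hz(wavelength: float) -> float:
    """Frequency in Hz whose wavelength is the given length in metres."""
    return SPEED_OF_SOUND_M_S / wavelength


spacing_m = 0.068  # the 6.8 cm figure from the comment

# Two possible readings of the spacing (both are guesses at the intent):
full_wave_freq_hz = freq_for_wavelength_hz(spacing_m)      # spacing = one wavelength
half_wave_freq_hz = freq_for_wavelength_hz(2 * spacing_m)  # spacing = half a wavelength

print(f"one wavelength at  {full_wave_freq_hz:.0f} Hz")
print(f"half a wavelength at {half_wave_freq_hz:.0f} Hz")
```

Either way, the spacing you need shrinks as the highest frequency you want to control goes up, which is why whole-room control for several listeners gets impractical fast.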
The audio device mentioned in Daemon exists. It's been tried for advertising. You can buy one if you want.[1][2] The audio quality is not very good, but voice will get through. The main application is museums, where you have one next to each exhibit, pointing down from above.
The problem with creating nulls by cancellation is that you create peaks somewhere else. You could null out one person (maybe only one ear) somewhere in the room, but multiple targets will be really hard.
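To make the "nulls create peaks elsewhere" point concrete, here is a toy sketch. Everything in it is an assumption for illustration: a single 1 kHz tone, ideal point sources in free field (no walls, no reflections), and made-up positions in a 4 m x 4 m area. One source is the noise, the other is driven so the sum is exactly zero at one target point, and then we count how many other points got louder than the noise alone.

```python
import cmath
import math

C = 343.0        # assumed speed of sound, m/s
FREQ = 1000.0    # assumed single test tone, Hz
K = 2 * math.pi * FREQ / C  # wavenumber

NOISE = (0.0, 2.0)    # hypothetical noise source position (m)
ANTI = (4.0, 2.0)     # hypothetical cancelling speaker position (m)
TARGET = (2.5, 1.0)   # the one point we want silent


def green(src, pt):
    """Free-field point-source response from src to pt (phase delay and 1/r decay)."""
    r = math.dist(src, pt)
    return cmath.exp(-1j * K * r) / r


# Complex drive for the cancelling speaker so the two fields sum to zero at TARGET.
w = -green(NOISE, TARGET) / green(ANTI, TARGET)


def total(pt):
    """Noise plus anti-noise at pt."""
    return green(NOISE, pt) + w * green(ANTI, pt)


residual_at_target = abs(total(TARGET))

# Sample the area on a 10 cm grid that avoids the source positions themselves.
grid = [(0.05 + 0.1 * i, 0.05 + 0.1 * j) for i in range(40) for j in range(40)]
louder = [p for p in grid if abs(total(p)) > abs(green(NOISE, p))]
fraction_louder = len(louder) / len(grid)

print(f"residual at target: {residual_at_target:.2e}")
print(f"fraction of grid points louder than noise alone: {fraction_louder:.2f}")
```

The target is silent, but a chunk of the room ends up louder than before. A real room with reflections would be messier than this, not cleaner.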
A conventional speaker radiates sound in all directions, roughly as a sphere. A directional speaker like the one you linked creates a beam of sound. But an array of speakers could create a localized spot of sound by emitting signals that only add up to the intended sound at the target location. Of course, there would be fragments of the sound (the components that don't add up) at other places where the waves happen to coincide, but for only one person in a room this is effectively the same as having a spot of sound with no other side effects.
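The "array that only adds up at one spot" idea is basically delay-and-sum focusing. Here is a minimal sketch under stated assumptions: a 2 kHz tone, 17 ideal point-source speakers along one 4 m wall, free field, and a hypothetical focus point. It is a linear, audible-frequency toy and does not model how ultrasonic directional speakers actually produce sound. Each speaker is phase-advanced by its travel time to the focus so that all contributions arrive in phase there.

```python
import cmath
import math

C = 343.0        # assumed speed of sound, m/s
FREQ = 2000.0    # assumed single test tone, Hz
K = 2 * math.pi * FREQ / C  # wavenumber

# Hypothetical layout: 17 speakers spaced 25 cm apart along the wall y = 0.
SPEAKERS = [(0.25 * i, 0.0) for i in range(17)]
FOCUS = (2.0, 2.0)      # where the sound should add up
OFF_FOCUS = (2.5, 2.0)  # a point half a metre to the side


def focus_weights(focus):
    """Per-speaker phase advance that cancels the propagation phase to focus."""
    return [cmath.exp(1j * K * math.dist(s, focus)) for s in SPEAKERS]


def pressure(pt, weights):
    """Sum of all speaker contributions at pt (free-field point sources)."""
    total = 0j
    for s, w in zip(SPEAKERS, weights):
        r = math.dist(s, pt)
        total += w * cmath.exp(-1j * K * r) / r
    return total


weights = focus_weights(FOCUS)
p_focus = abs(pressure(FOCUS, weights))
p_off = abs(pressure(OFF_FOCUS, weights))

# At the focus every term is in phase, so the magnitude is just the sum of 1/r.
coherent_sum = sum(1.0 / math.dist(s, FOCUS) for s in SPEAKERS)

print(f"level at focus:     {p_focus:.3f}")
print(f"level 0.5 m away:   {p_off:.3f}")
```

Away from the focus the phases are scrambled and mostly cancel, which is the "random fragments" effect: not zero, just much weaker than the spot.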
I did forget that two ears would be like having two different people: there would be two target points where the sound has to add up (or cancel) to the target.
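The two-ears point can be written as two equations in two unknowns: with two target points you need at least two independently driven speakers. Here is a small sketch, with all of the geometry assumed for illustration (one 500 Hz tone, free field, ears 18 cm apart, made-up speaker and noise positions). It solves for the two speaker drives that cancel a noise source at both ears at once.

```python
import cmath
import math

C = 343.0       # assumed speed of sound, m/s
FREQ = 500.0    # assumed single test tone, Hz
K = 2 * math.pi * FREQ / C  # wavenumber

NOISE = (2.0, 4.0)                    # hypothetical noise source (m)
SPK_A, SPK_B = (0.0, 0.0), (4.0, 0.0)  # hypothetical cancelling speakers
EAR_L, EAR_R = (1.91, 2.0), (2.09, 2.0)  # assumed ear positions, 18 cm apart


def green(src, pt):
    """Free-field point-source response from src to pt."""
    r = math.dist(src, pt)
    return cmath.exp(-1j * K * r) / r


# Two constraints: speaker field must equal minus the noise field at each ear.
g_al, g_bl = green(SPK_A, EAR_L), green(SPK_B, EAR_L)
g_ar, g_br = green(SPK_A, EAR_R), green(SPK_B, EAR_R)
d_l, d_r = -green(NOISE, EAR_L), -green(NOISE, EAR_R)

# Solve the 2x2 complex system with Cramer's rule.
det = g_al * g_br - g_bl * g_ar
w_a = (d_l * g_br - g_bl * d_r) / det
w_b = (g_al * d_r - g_ar * d_l) / det


def total(pt):
    """Noise plus both cancelling speakers at pt."""
    return green(NOISE, pt) + w_a * green(SPK_A, pt) + w_b * green(SPK_B, pt)


residual_left = abs(total(EAR_L))
residual_right = abs(total(EAR_R))

print(f"residual at left ear:  {residual_left:.2e}")
print(f"residual at right ear: {residual_right:.2e}")
```

Each extra listener adds two more equations, so the speaker count has to grow with the number of ears you care about, and that is before worrying about what happens everywhere else in the room.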