From a sensor's point of view, the square model seems correct. Sensor pixels are photon counters: turning a 2x2 block of pixels into a single pixel by summing the four values is equivalent to having a sensor with 1/4 the pixels, each counting the photons over that 2x2 area[1]. What we may be saying is that, since you have the extra pixels, you can actually do better than a sensor natively at your desired resolution by applying a more complex filter over the extra data (e.g., avoiding the Moiré artifacts that are common in lower-resolution sensors).
[1] Most sensors use Bayer patterns, so some extra considerations apply to color resolution.
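The 2x2 summing described above can be sketched in NumPy; the 8x8 monochrome array and Poisson-distributed photon counts here are illustrative assumptions, not a model of any particular sensor:

```python
import numpy as np

# Hypothetical 8x8 monochrome sensor readout (photon counts per pixel).
rng = np.random.default_rng(0)
pixels = rng.poisson(lam=100, size=(8, 8))

# 2x2 binning: sum each 2x2 block into one value. This is equivalent
# to a sensor with 1/4 the pixels, each counting photons over the
# corresponding 2x2 area.
h, w = pixels.shape
binned = pixels.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

# The binned image is 4x4, and the total photon count is preserved,
# since summing merely regroups the same counts.
```

Because the sum preserves total counts, `binned.sum() == pixels.sum()` holds exactly, which is the sense in which binning matches a natively lower-resolution sensor.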