They do have stereo cameras and sensor fusion, and they detect more than just the lines on the road. Here is what the camera sees: https://www.youtube.com/watch?v=rACZACXgreQ
What in that video suggests sensor and/or stereo fusion to you?
I notice that the temporal coherence is pretty bad: pedestrians pop out of recognition when they go behind trees, and lane/exit boundaries wiggle all over the place and occasionally frame-pop into different configurations. A Kalman filter, for example, is a state estimator that maintains temporal coherence and makes heavy use of previous estimates and sensor inferences when computing the most recent estimate. It doesn't look to me like that kind of strategy is being used to maintain the vehicle's world model. IMO a good estimator wouldn't treat "a pedestrian popping out of existence" as the most likely estimate in any circumstance, let alone one where the pedestrian was clearly present in the previous 50 frames. I don't doubt they're using a KF on the vehicle's inertial movement, but based on the failures and this video, it sure doesn't look like a fusion technique is being used for the world model.
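To make the point concrete, here's a minimal sketch (not Tesla's code, and all the numbers are assumptions) of a constant-velocity Kalman filter tracking a pedestrian's position. When the detector loses the pedestrian behind a tree (measurement = None), the filter coasts on its prediction instead of dropping the track, which is exactly the temporal coherence the per-frame pop-in/pop-out in the video seems to lack:

```python
# Minimal sketch, not Tesla's code: a constant-velocity Kalman filter
# tracking a pedestrian's lateral position. Frame rate, noise levels,
# and walking speed are all assumed values for illustration.
import numpy as np

dt = 1 / 36  # assumed frame interval
F = np.array([[1, dt], [0, 1]])        # state transition: position, velocity
H = np.array([[1, 0]])                 # we only measure position
Q = np.diag([1e-4, 1e-3])              # process noise (hand-tuned guess)
R = np.array([[0.05]])                 # measurement noise (hand-tuned guess)

x = np.array([[0.0], [1.4]])           # initial state: 0 m, walking ~1.4 m/s
P = np.eye(2)                          # initial covariance

def step(x, P, z):
    # Predict forward one frame.
    x = F @ x
    P = F @ P @ F.T + Q
    if z is not None:                  # update only when the detector fired
        y = np.array([[z]]) - H @ x    # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S) # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
    return x, P

# Simulate 50 frames of detections, 20 frames of occlusion, then re-detection.
truth = 0.0
for t in range(90):
    truth += 1.4 * dt
    z = truth + np.random.normal(0, 0.2) if not 50 <= t < 70 else None
    x, P = step(x, P, z)
    print(f"frame {t:2d}  est={x[0,0]:.2f} m  occluded={z is None}")
```

During the occluded frames the estimate keeps walking forward at the last inferred velocity, with the covariance growing to reflect the missing measurements, rather than the pedestrian vanishing from the world model.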
There are left- and right-looking cameras, but the FOV overlap between them isn't very substantial, and there can't be stereopsis where there is no overlap. Per the Tesla website, there are three forward-looking cameras, each with a different FOV. The parallax baseline between them is only a few centimeters, too, so the depth sensitivity isn't going to be spectacular. It's certainly possible that there is some narrow-baseline stereo fusion, but it could only really happen inside the narrowest field of view, where coverage from more than one camera overlaps, and since the narrow camera is the one looking at distant objects, that's exactly the circumstance where a narrow baseline hurts the most. Based on that, the system doesn't really seem to be set up for stereopsis; if it's there, it seems like an afterthought.
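A back-of-envelope sketch of why a few-centimeter baseline gives weak depth sensitivity at driving distances (all numbers here are my assumptions for illustration, not Tesla specs; the focal length in pixels and the achievable disparity precision are guesses):

```python
# Stereo ranging: Z = f * B / d, so a disparity error sigma_d maps to a
# depth error of roughly Z^2 * sigma_d / (f * B). Error grows with the
# square of distance and shrinks only linearly with baseline.
f_px = 1400.0          # assumed focal length of the narrow camera, in pixels
sigma_d = 0.3          # assumed disparity matching noise, in pixels

def depth_error(baseline_m, depth_m):
    return depth_m**2 * sigma_d / (f_px * baseline_m)

for baseline in (0.03, 0.30):          # ~3 cm vs a hypothetical 30 cm rig
    for depth in (10, 30, 60):         # meters
        err = depth_error(baseline, depth)
        print(f"B={baseline*100:>4.0f} cm  Z={depth:>2d} m  ±{err:5.1f} m")
```

With these assumed numbers, a ~3 cm baseline gives on the order of ±25 m of depth uncertainty at 60 m, versus a couple of meters for a 30 cm baseline, which is why the quadratic-in-distance term matters so much in the narrow camera's working range.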
I could certainly be wrong, as I don't have access to the code. Are you going by some other secondary source/information?
To be fair, it could be that this is what the camera segmentation produces before it is combined with the other sensors and used to update the world model (which would then carry the temporal information).
I've got 20/20 vision in one eye; the other is legally blind without correction. My driver's license has a little note that it's not legal for me to drive without my glasses, which I never wear under any other circumstances.
So it's not as clear-cut as you make it out to be.
(And you know what? Even if it were legal for me to drive without those glasses, I'd still drive with them. Because ranging is important!)