Yes, storing video locally would support that. On request, the app and the local storage would negotiate a session key, and the video would be transferred to the phone. Ring would only have the encrypted video, and would not have the decryption key for it.
Yes, it would be slower and would require video encoding/rescaling near the local storage, but it would actually be secure that way.
> storing video locally would support that. On request, the app and the local storage would negotiate a session key, and the video would be transferred to the phone.
How do they do that if the app is halfway across the country? They have to go through a common server.
Public-key encryption requires the two endpoints to be able to send each other data. See my comments about ISPs and firewalls elsewhere in this thread.
First off, as mentioned in other comments, ISPs tend not to enforce their rules against running Internet-visible servers.
But let's go with your assumption that they do enforce them. You can still have a secure connection from your doorbell camera to a mobile app anywhere in the world. Both your app and your camera connect to an intermediate server. This server merely acts as a proxy, passing packets between the two. Using a standard TLS handshake, the app can establish encryption with the camera without the proxy in the middle being able to decrypt the traffic. When the camera is initially set up, it can generate a TLS certificate that the app can download and pin (since the app and camera will be on the same Wi-Fi network at that point), so that the proxy server can't present its own certificate and intercept the communications.
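To make the pinning step concrete, here is a minimal sketch in Python of the check the app would run on every connection. It is only an illustration under stated assumptions: the function names `cert_fingerprint` and `matches_pin` are made up for this example, and the certificate bytes are placeholders rather than real DER data.

```python
import hashlib
import hmac


def cert_fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(presented_der: bytes, pinned_fingerprint: str) -> bool:
    """True only if the presented certificate is the one pinned at setup."""
    # compare_digest avoids leaking how much of the fingerprint matched.
    return hmac.compare_digest(cert_fingerprint(presented_der), pinned_fingerprint)


# Setup, on the home Wi-Fi: the app stores the camera's fingerprint.
# (Placeholder bytes; a real app would use the camera's actual certificate.)
camera_cert = b"placeholder standing in for the camera's DER certificate"
pinned = cert_fingerprint(camera_cert)

# Later, through the proxy: the app checks whatever certificate it is shown.
proxy_cert = b"placeholder standing in for a certificate the proxy made up"
camera_ok = matches_pin(camera_cert, pinned)  # genuine camera certificate
proxy_ok = matches_pin(proxy_cert, pinned)    # substituted certificate
```

In a real client the presented bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)` after the handshake, and the app would drop the connection whenever the check fails.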
If you need me to go into greater detail, I can. But this is definitely a solved problem.
EDIT: Another way to think of this: apps like Signal and Wire let people talk to each other by having each client connect to a central server to send and retrieve messages, but public-key encryption keeps those central servers from reading the contents of the messages. The camera-to-app connection would work basically the same way.
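A rough sketch of why the server in the middle learns nothing useful, using a toy Diffie-Hellman exchange in Python. This is not how Signal or any real camera does it. The prime is deliberately tiny, the `relay` function is a stand-in for the central server, and all the names are invented for the example.

```python
import hashlib
import secrets

# TOY parameters for illustration only: a 127-bit prime is far too small
# for real use. Real systems use X25519 or a vetted 2048-bit+ group.
P = 2**127 - 1
G = 3


def keypair():
    """Return (private, public) for the toy group above."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def session_key(my_private: int, their_public: int) -> bytes:
    """Derive a 32-byte key from the shared Diffie-Hellman secret."""
    shared = pow(their_public, my_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


relay_log = []  # everything the central server gets to see


def relay(message: int) -> int:
    """Hypothetical relay: records the message and passes it on unchanged."""
    relay_log.append(message)
    return message


cam_priv, cam_pub = keypair()
app_priv, app_pub = keypair()

# Only public values cross the relay; private values never leave the devices.
cam_key = session_key(cam_priv, relay(app_pub))
app_key = session_key(app_priv, relay(cam_pub))
```

Both ends arrive at the same key, while the relay has seen only the two public values. On its own this stops a relay that merely listens. A relay that actively swaps in its own public values has to be caught by authenticating the endpoints, which is the job of the pinned certificate described above.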
I understand all this, but is there any camera out there that supports this kind of setup out of the box? Or do I have to roll my own if I want to do this?