It sounds like a significant hindrance is picking out voices in crowds, and making one's own voice heard in the crowd, as a result of the poor microphone / speakers on the iPad.
With the great big self-balancing stick peripheral, it's a wonder that they've not thought to attach a pair of microphones with binaural software to the Double (so you can hear if someone is talking behind you, to the right, etc). Then the user at the other end could put on some headphones and pick up everything that's going on around them. Binaural audio feedback also greatly enhances immersion.
Surely a slightly larger / louder speaker wouldn't cost too much either?
It would be a trade-off. It would be more immersive for you, but less so for everyone else, as your only human feature (your face) would be obscured.
Longer term solution: Your presented face would be computer animated.
Uncanny valley, here we come. Studies have shown that people are more at ease around a robot that tries to look like a cute humanoid robot, rather than a near-human but not quite construct.
Replacing the video feed of a person's face with a computer generated, almost-their-face-but-not-quite simulacrum sounds like it'd heavily increase user nightmare rates.
Maybe it is just me, but with a disembodied head rolling around on a Segway, we are already rather deep down in the uncanny valley. Removing a bit of humanity might just pull it back up the left bank.
So the proper solution is to sit in the middle of a multi-screen panopticon and use multiple cameras on the telepresence device. People looking at the front of you would "see" you face left to look at something to the left, for example. You'd still have to turn the machine to face people properly if you wanted to make "eye contact".
I'm in the "I want to believe" category when it comes to distributed teams, mainly because technology is far more social than technical. But so many people want to work at the beach in their pajamas, there's gotta be a way forward. So I'm rooting for this tech.
Sounds like we need at a minimum: 1) ability to get up off the floor on your own, 2) ability to find an outlet and charge on your own, 3) better microphone and conversation discrimination, and 4) some way of having "eye" contact.
I'd add for bonus points: 1) some kind of peripheral vision, 2) ability to wave arms around to get attention and demonstrate crude body language, 3) ability to grasp and manipulate things in some kind of fashion resembling a hand.
My company has had a Double in our San Francisco lounge for about a year now (we helped Double Robotics beta test it). About 95% of our employees are remote so it's pretty cool being able to hop in and say hi to anyone that's co-working out of our lounge that day.
It's really no different than using Skype video chat or a Google+ Hangout except that you aren't stuck on someone's laptop -- you can move around if you need to.
It's rather gimmicky but it's also quite fun. Still, if I needed to talk one-on-one with someone, or even participate in a group meeting (something I've never actually done due to my company's culture), then I think a standard video chat would work just as well.
Jokes aside, it's quite interesting to think about how close to an in-person experience this (or similar) can get you. Take body language, for example. I get quite animated when getting into a technical discussion - how would that feel in this case?
Perhaps they could make it lean towards people if you raise your voice? Or if you shake your head in disgust it could wiggle?
To be honest though, email is still pretty much the best communication method over distance. It allows you to think, revise and reference other information more easily. Many a time I've torpedoed a face-to-face meeting with a single email.