Hacker News
Why mobile apps suck when you're mobile (TCP over 3G) (davidsingleton.org)
298 points by dps on June 27, 2011 | 68 comments



There were plenty of wireless-optimized TCP replacements proposed back in the days when WAP and XHTML Mobile were the hottest things around, but none took root as operators, web servers and browsers needed to adopt them in tandem.

Now that smartphone apps are widespread and someone developing a service can control both sides of the connection, there's definitely room for someone to devise a really good TCP replacement (layered on top of UDP) with an iOS library, an Android library, and an Apache mod.


The article misses that this is fundamentally an issue with the PHY level of "mobile" broadband systems.

If you've got a very simple radio, say, a ham 2 meter handset, you find there are interference patterns caused by multipath interference that cause the signal to get stronger and weaker as you move half a wavelength this way or that way. This causes a fluttering noise when you're listening to somebody transmitting from a car.
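The half-wavelength figure can be made concrete. Assuming a simple standing-wave model (a direct wave plus one strong reflection arriving from the opposite direction), successive nulls sit λ/2 apart, where λ = c/f. A minimal C sketch; the 146 MHz and 2.1 GHz carriers and the 30 m/s car speed are illustrative values, not figures from the comment:

```c
#include <assert.h>

#define SPEED_OF_LIGHT_M_S 299792458.0

/* Wavelength in metres for a carrier frequency in hertz. */
static double wavelength_m(double freq_hz) {
    assert(freq_hz > 0.0);
    return SPEED_OF_LIGHT_M_S / freq_hz;
}

/* Distance between adjacent multipath nulls under the two-wave
   standing-wave model: half a wavelength. */
static double null_spacing_m(double freq_hz) {
    return wavelength_m(freq_hz) / 2.0;
}

/* Nulls per second passed by a receiver moving at speed_m_s:
   this is the rate of the audible "flutter". */
static double nulls_per_second(double freq_hz, double speed_m_s) {
    return speed_m_s / null_spacing_m(freq_hz);
}
```

At 146 MHz the nulls are roughly a metre apart, so a car at 30 m/s passes through about 29 of them a second; at a 2.1 GHz 3G carrier they are only about 7 cm apart.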

A more advanced radio that uses a spread-spectrum signal finds that at any moment in time, some parts of the signal's frequency range are in phase and some are out of phase.

If your goal is to make a low-bandwidth signal robust, you can spread a low-bandwidth signal over a wide spectral range and you won't notice fading. On the other hand, if you're trying to send as much data as you possibly can in the bandwidth you've got, you're ultimately going to implement something like OFDM. Now, in an OFDM system, you're using frequencies in parallel to transmit more data, not to improve robustness. If you're sitting in one place your system can figure out what the fading is and work around it. If you're moving, the relative performance of the subchannels is always changing too fast for OFDM to work.

This is how you can have a good voice connection on your phone (which uses a PHY truly built for mobile operation) but not a good data connection (which uses a PHY built for high-performance operation from a fixed point).
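How fast "too fast" is can be estimated from the maximum Doppler shift f_d = v·f/c and the common rule of thumb (from Clarke's fading model) that the channel stays roughly constant for a coherence time T_c ≈ 0.423/f_d. A minimal sketch; the rule of thumb, the 2.1 GHz carrier and the walking and train speeds are assumptions supplied here, not figures from the comment:

```c
#include <assert.h>

#define SPEED_OF_LIGHT_M_S 299792458.0

/* Maximum Doppler shift (Hz) seen by a receiver moving at speed_m_s. */
static double max_doppler_hz(double freq_hz, double speed_m_s) {
    assert(freq_hz > 0.0 && speed_m_s > 0.0);
    return speed_m_s * freq_hz / SPEED_OF_LIGHT_M_S;
}

/* Rule-of-thumb coherence time (s): roughly how long a per-subcarrier
   channel estimate stays valid. T_c ~= 0.423 / f_d. */
static double coherence_time_s(double freq_hz, double speed_m_s) {
    return 0.423 / max_doppler_hz(freq_hz, speed_m_s);
}
```

A pedestrian at 1.4 m/s gets roughly 43 ms of stable channel, while a train at 200 km/h (about 55.6 m/s) gets roughly 1 ms, so the receiver has to re-estimate every subcarrier far more often.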


I don't think you understood the article. It is not about the physical restrictions and trade-offs of mobile broadband; it is about the inefficiency of TCP on such connections. Current TCP implementations are optimized for wired networks, which usually have fixed bandwidth and latency. They use dropped packets for congestion avoidance, interpreting packet loss as a signal to slow down transmission.

Wireless networks, with their higher packet loss (compared to wired ones), fool the sender into believing that the connection is saturated. Combine this with the enormous buffer sizes used by mobile broadband systems, and the TCP congestion control algorithm throttles the connection to an unusably slow level.

The author is not making a case for trading speed for reliability by using a different PHY; he is making a case for embracing the packet loss and delays that exist in current mobile broadband systems and using those characteristics to create a smarter transport layer protocol to replace TCP on mobile devices.
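To see why treating loss as congestion hurts on a lossy radio link, here is a minimal sketch of Reno-style AIMD (additive increase of one segment per RTT, halving on each loss), stepped once per RTT. The deterministic schedule of one loss every `loss_every` RTTs is a hypothetical stand-in for radio-induced drops; real stacks also have slow start, fast recovery and timeouts:

```c
#include <assert.h>

/* Average congestion window (in segments) over `rounds` RTTs.
   loss_every == 0 means no loss; otherwise a loss occurs every
   loss_every RTTs regardless of how much capacity the link has. */
static double average_cwnd(int rounds, int loss_every, double start_cwnd) {
    assert(rounds > 0 && loss_every >= 0 && start_cwnd >= 1.0);
    double cwnd = start_cwnd, sum = 0.0;
    for (int r = 1; r <= rounds; r++) {
        if (loss_every > 0 && r % loss_every == 0) {
            cwnd /= 2.0;               /* multiplicative decrease on loss */
            if (cwnd < 1.0) cwnd = 1.0;
        } else {
            cwnd += 1.0;               /* additive increase: +1 segment per RTT */
        }
        sum += cwnd;
    }
    return sum / rounds;
}
```

With no loss the window keeps climbing (an average of 51.5 segments over 100 RTTs from a start of 1), but a loss every 5 RTTs pins it to a 4-to-8 segment sawtooth averaging about 6, however fast the link underneath is.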


Yes, I do recognize that.

In the big picture, however, it's generally better to fix the part of a system that's broken (the PHY) than it is to try to compensate for the underlying problem in upper layers.

It's also good to be careful about your terms. "Mobile" means operation out of a car or other motorized moving platform. "Portable" means a device that's being carried by a pedestrian. When the bandwidth of a radio channel gets wide, these become very different environments.

All I can say about new protocols is that you can go to your local Uni and find (quite literally) a ton of conference proceedings on this very topic. It seems that somebody has been funding a ludicrous amount of research on TCP replacements for decades without anything practical coming out of it.

Out of all that literature you ought to be able to find something that works or find a good reason why it can't be done.


Mobile also means "carried around as well as used in cars". It's a term for smartphones. They're using modern parlance to talk about smartphones, not radio parlance.


I probably missed half of what you were trying to explain there, but I didn't see a good solution. Are you suggesting we should give up speed for reliability at the PHY level?


It's a choice made at the PHY level and it's a tradeoff that's a little more complex than speed vs. reliability -- mobile operation introduces special challenges, as well as operation with weak signals (satellite), as well as operation in urban areas. There's a reason why Europe has 3 distinctly different digital television standards.


For TCP over 3G, could they just turn off reliability at the PHY level and let TCP do its thing?


Yeah, but you don't need to do much to TCP to make it work perfectly well on wireless.

Let me repeat that: Don't fuck with TCP. TCP is fine. It's BUILT to deal with your lousy network, it's DESIGNED to deal with nuclear attacks from the Guys of the Bad Color, and (ironically) the worst thing you can do to TCP is to try to be nice to it.

"What about keep-alives?"

There are no keep-alives in TCP. If two hosts are in agreement about the address and port, they can keep a connection in an open state until the universe goes cold. (There may be keepalives necessary /above/ TCP, but TCP itself doesn't use 'em). Don't yammer something about your internal state every 30 seconds; you don't do it on the bus home, so don't do it on your network, either.

"Don't we need this proprietary fancy-schmancy buffering algorithm from FuppedUck Communications Consultants? They keep telling us we're toast without it."

No. Just run TCP. Spend money on decent routers.

"I'm feeling really nervous and I want to, like, disable Nagle for everything. Isn't that what you do? Turn Nagle off and maybe muck with the --"

You're from AT&T, aren't you?
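For reference, both knobs mentioned here are per-socket options in the BSD sockets API: `SO_KEEPALIVE` turns on TCP's optional keepalive probes (RFC 1122 permits them but requires them to default to off), and `TCP_NODELAY` disables Nagle's algorithm. A minimal POSIX sketch for inspecting and changing them; the helper names are hypothetical:

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read an integer socket option; returns -1 on error. */
static int get_int_opt(int fd, int level, int name) {
    assert(fd >= 0);
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(fd, level, name, &value, &len) != 0) return -1;
    return value;
}

/* Set an integer socket option; returns 0 on success, -1 on error. */
static int set_int_opt(int fd, int level, int name, int value) {
    assert(fd >= 0);
    return setsockopt(fd, level, name, &value, sizeof value);
}
```

On a freshly created socket both options normally read back as 0: keepalive traffic only exists if an application asks for it, and Nagle stays on unless someone switches it off.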


Not entirely true. TCP is certainly decent, but it's not designed to deal with networks where the delays are extremely transient and the capacity is variable. It's not designed for heavy buffering. And so on.

It can be fixed with congestion control protocols tacked on the side, I think, but it's designed for wired networking with slow transients in the performance profile.

It IS, however, better than anything that I'd be able to hack together quickly.


The mobile OS should be able to fix this.

One method would be to disable the reliable delivery mechanism in the 3G code.

Another method would be to use huge retransmission timeouts at the TCP level for connections going over 3G, so the TCP level seldom retransmits packets.

If the 3G implementation provides completely reliable connections, the OS could fake the TCP implementation and just use that implementation directly instead of sending TCP packets (not entirely unlike the sshuttle VPN).
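TCP already sizes its retransmission timeout from measured round-trip times, so a long-delay path does push the timeout up by itself. A minimal sketch of the standard estimator from RFC 6298 (SRTT/RTTVAR smoothing with α = 1/8, β = 1/4, K = 4 and a 1 s minimum); the 60 s ceiling and the 1 ms clock granularity are assumed values an implementation may choose:

```c
#include <assert.h>

#define RTO_MIN_S 1.0
#define RTO_MAX_S 60.0               /* assumed ceiling */
#define CLOCK_GRANULARITY_S 0.001    /* assumed clock granularity G */

typedef struct {
    double srtt;     /* smoothed RTT, seconds */
    double rttvar;   /* RTT variation, seconds */
    double rto;      /* current retransmission timeout, seconds */
    int have_sample;
} rto_state_t;

static double clamp(double v, double lo, double hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

static void rto_init(rto_state_t *s) {
    s->srtt = 0.0;
    s->rttvar = 0.0;
    s->rto = RTO_MIN_S;   /* RFC 6298 initial RTO before any sample: 1 s */
    s->have_sample = 0;
}

/* Feed one RTT measurement r (seconds) from a non-retransmitted segment. */
static void rto_on_sample(rto_state_t *s, double r) {
    assert(r > 0.0);
    if (!s->have_sample) {
        s->srtt = r;
        s->rttvar = r / 2.0;
        s->have_sample = 1;
    } else {
        double err = s->srtt - r;
        if (err < 0.0) err = -err;
        /* RTTVAR is updated with the old SRTT, then SRTT is updated. */
        s->rttvar = 0.75 * s->rttvar + 0.25 * err;
        s->srtt = 0.875 * s->srtt + 0.125 * r;
    }
    double var_term = 4.0 * s->rttvar;
    if (var_term < CLOCK_GRANULARITY_S) var_term = CLOCK_GRANULARITY_S;
    s->rto = clamp(s->srtt + var_term, RTO_MIN_S, RTO_MAX_S);
}
```

After steady 300 ms samples the timeout sits at the 1 s floor, and one 20 s sample lifts it to about 22 s. The catch is the order of events: when the RTT suddenly jumps from 0.3 s to 20 s, the 1 s timer fires first, producing spurious retransmissions with exponential backoff before any long sample is accepted, which is the behaviour the suggestion above is trying to avoid.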


The 3G connection does not aim to provide completely reliable "connections" -- it aims to provide an as-reliable-as-it-can data link, given channel conditions and configured transmission parameters (which are to some extent controlled by the RAN). If the error rate seen by the channel decoder is too high, it will drop a frame. There is an ARQ mechanism in the MAC layer to try to compensate so that the higher layers don't see the loss, but it implies some delay and also has limits -- i.e. at some point losses due to bit errors injected by the channel may manifest in lost frames and IP packets.
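The delay-versus-residual-loss trade-off of a capped ARQ can be put in numbers. A minimal sketch assuming independent frame errors with probability p and at most N transmissions per frame; real radio errors are bursty and the actual retry limits are configured by the network, so these parameters are hypothetical:

```c
#include <assert.h>

/* Probability that a frame is still lost after max_tx transmissions:
   p^max_tx under the independence assumption. */
static double residual_loss(double p, int max_tx) {
    assert(p >= 0.0 && p < 1.0 && max_tx >= 1);
    double loss = 1.0;
    for (int i = 0; i < max_tx; i++) loss *= p;
    return loss;
}

/* Expected transmissions per frame with a retry cap:
   1 + p + p^2 + ... + p^(max_tx - 1). */
static double expected_tx(double p, int max_tx) {
    assert(p >= 0.0 && p < 1.0 && max_tx >= 1);
    double sum = 0.0, term = 1.0;
    for (int i = 0; i < max_tx; i++) {
        sum += term;
        term *= p;
    }
    return sum;
}
```

With p = 0.1 and N = 3, only about one frame in a thousand leaks through as a loss, at a cost of 1.11 transmissions per frame on average. Each retry also adds a retransmission delay, which the higher layers see as RTT jitter rather than loss.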

The L1 and MAC of 3GPP are generally implemented in an ASIC and not necessarily visible to the OS. Many of the parameters are dictated and controlled by the RAN (operators are picky and try to control things so that handsets can't misbehave and crap all over the other users in a cell), so the OS doesn't get a look in -- i.e. I don't think it's possible to turn off the 3G error recovery mechanisms.

Disclaimer: I used to work on L1, so someone with more detailed knowledge of the MAC and higher layers of the 3GPP stack could give a better idea of what happens when the MAC ARQ limits are hit.

Summary: 3G was designed more than 10 years ago and is showing its age. User requirements have shifted, but standards can't evolve that quickly. Maybe LTE will help provide the user experience for today's apps... the question is whether it will be widely deployed in time to matter.

EDIT: updated for clarity and spelling.


TCP is end to end. 3G just carries data to your service provider. So wouldn't faking TCP screw up the communication between the mobile service provider and the server you're communicating with?


I think this is a really good idea. Startup? PhD thesis?


Here is a paper from 2003: http://www.usenix.org/event/mobisys03/tech/full_papers/chakr...

As I recall, one of the key problems this group found was that GPRS had a very low incidence of packet loss. This was because the physical layer protocol had some error correction built in. TCP wasn't designed for this and ended up doing the wrong thing. This group proposed a custom reliable protocol between the mobile device and a proxy gateway, and their results showed improved performance.

We're not talking about GPRS, but I am curious why there is so much packet loss.

Also, just FYI: MobiSys (the top academic conference on mobile systems) is happening in the DC area this week.


I'll bet you thought Vint Cerf was just a trophy hiring. :) http://gigaom.com/2009/11/06/vint-cerf-plugs-his-plucky-spac...


Do tell: how do you reckon a startup could popularize, let alone monetize, a new Internet protocol?


We're not talking about replacing TCP as the standard transport protocol. But if you're a mobile app developer and you control both the client (via a native app) and the server you can use whatever you want. If someone provided client and server implementations of a more efficient networking protocol it could become popular.

I have no idea about monetization though. I doubt just charging for the code would work.


I know of at least one company that tried this and failed. It was quite a long time ago and targeted large companies, but it did work. To give you an idea of how long ago, it was written for PocketPC 3.0.

The company is now defunct.


we did this at Visto corp a few years back. we wrote our own UDP based protocol that acknowledged every packet. it worked really well in horrible network conditions and was incredibly slow everywhere else. roughly speaking it was a bit like "TCP with no sliding window" and we had trouble w/proxies in the carrier networks. this was pre-3G.
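Acknowledging every packet before sending the next is stop-and-wait, whose throughput is capped at one packet per round trip no matter how fast the link is. A minimal sketch of the window-limited throughput bound; the 1400-byte packet, 250 ms RTT, 64 KiB window and 250,000 bytes/s (2 Mbit/s) link are assumed example values:

```c
#include <assert.h>

/* Throughput (bytes/s) of a window-limited sender: at most window_bytes
   can be in flight per round trip, and never more than the link rate. */
static double window_limited_rate(double window_bytes, double rtt_s,
                                  double link_bytes_per_s) {
    assert(window_bytes > 0.0 && rtt_s > 0.0 && link_bytes_per_s > 0.0);
    double by_window = window_bytes / rtt_s;
    return by_window < link_bytes_per_s ? by_window : link_bytes_per_s;
}
```

Stop-and-wait (a window of one 1400-byte packet) gets 5,600 bytes/s on that path, while a 64 KiB window fills the whole 250,000 bytes/s link. That is one plausible reading of "worked really well in horrible network conditions and was incredibly slow everywhere else".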


SPDY or SCTP maybe? I'm not sure if they work better specifically in high latency environments, but they are touted as TCP replacements.


Actually, SPDY is an application layer protocol, like HTTP, not a transport layer protocol like TCP.


SPDY isn't a TCP replacement; it's a layer 7 protocol on top of TCP.


The problem with restarting connections early, for those of us on capped and/or expensive-per-kilobyte mobile connections, is that the ~20 seconds' worth of packets queued up during the blip will be sent anyway, even though they are no longer needed. (In the UK that is everyone who doesn't spend a large chunk on their monthly contract: people on Virgin pay-as-you-go pay £3 for a day's access, but IIRC you get cut off after 25 MB in that day.) Twenty seconds' worth of discarded packets could be quite a lot if you were transferring data at decent 3G+ speeds just before the blip.
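For scale, assuming (hypothetically) a 2 Mbit/s transfer rate just before the blip and the 25 MB daily cap mentioned in that comment:

```latex
\[
20\,\text{s} \times 2\,\text{Mbit/s} = 40\,\text{Mbit} = 5\,\text{MB},
\qquad
\frac{5\,\text{MB}}{25\,\text{MB}} = 20\%\ \text{of the day's allowance.}
\]
```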


This is exactly what CouchDB, and Mobile Couchbase for Android and iOS, is designed to fix.

Networks are slow. Mobile networks are slower. The most robust fix to the problem is to "optimistically replicate" your application data to the end user's device, so that the network latency does not become part of the user experience.

This is a strong fit for applications like CRM or geographically constrained apps, as the data sets are small enough to fit completely on your devices. For larger data sets the issue becomes: which subset of the data should be copied to the device ahead of time.

The user should never need to wait on the network. All data operations are played against the local Couch, which asynchronously transmits changes to and from the remote server in the background. This pattern makes it much easier for app developers to make responsive applications, where users are never left waiting on multi-second round trip times.
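The local-first pattern described here can be sketched without any database: the app reads and writes a local store, and changed documents are queued for a background push. A minimal C sketch; every name (`local_put`, `sync_pending`, the stand-in `offline_push`/`online_push` transports) is hypothetical and is not the CouchDB or Mobile Couchbase API, and conflict handling, which real replication needs, is omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DOCS 16
#define KEY_LEN 32
#define VAL_LEN 64

typedef struct {
    char key[KEY_LEN];
    char val[VAL_LEN];
} doc_t;

/* Local store: the UI reads and writes only this, never the network. */
typedef struct {
    doc_t docs[MAX_DOCS];
    int ndocs;
    int pending[MAX_DOCS];   /* indices of docs changed since the last push */
    int npending;
} local_store_t;

/* Transport hook: returns true if the server accepted the document. */
typedef bool (*push_fn)(const doc_t *doc);

static int find_doc(const local_store_t *s, const char *key) {
    for (int i = 0; i < s->ndocs; i++)
        if (strcmp(s->docs[i].key, key) == 0) return i;
    return -1;
}

/* Apply a write locally and queue it for sync; never waits on the network. */
static bool local_put(local_store_t *s, const char *key, const char *val) {
    assert(s != NULL && key != NULL && val != NULL);
    int i = find_doc(s, key);
    if (i < 0) {
        if (s->ndocs == MAX_DOCS) return false;
        i = s->ndocs++;
        strncpy(s->docs[i].key, key, KEY_LEN - 1);
        s->docs[i].key[KEY_LEN - 1] = '\0';
    }
    strncpy(s->docs[i].val, val, VAL_LEN - 1);
    s->docs[i].val[VAL_LEN - 1] = '\0';
    for (int p = 0; p < s->npending; p++)
        if (s->pending[p] == i) return true;   /* already queued */
    s->pending[s->npending++] = i;             /* each doc queued at most once */
    return true;
}

static const char *local_get(const local_store_t *s, const char *key) {
    int i = find_doc(s, key);
    return i < 0 ? NULL : s->docs[i].val;
}

/* Background step: push queued docs, keeping failures for the next attempt. */
static int sync_pending(local_store_t *s, push_fn push) {
    int kept = 0, pushed = 0;
    for (int p = 0; p < s->npending; p++) {
        if (push(&s->docs[s->pending[p]])) pushed++;
        else s->pending[kept++] = s->pending[p];
    }
    s->npending = kept;
    return pushed;
}

/* Stand-in transports for illustration only. */
static bool offline_push(const doc_t *doc) { (void)doc; return false; }
static bool online_push(const doc_t *doc) { (void)doc; return true; }
```

Writes succeed and are readable immediately whether or not the network is up; `sync_pending` can run from a background timer and simply retries whatever did not get through last time.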


Here's an animation of the packets of a Facebook page hit over 3G on a moving bus: http://vimeo.com/17248120


Very interesting and sheds some light on the weird latency issues I see here in Rwanda, where 3G issues aren't limited to being on moving trains. Sometimes pinging shows crazy return times of 30000-60000ms. Other times they're only 200-400ms but every other ping packet times out, i.e. one packet through, next one drops, and so on. Still trying to figure out exactly what's happening then.


Bufferbloat means when you refuse to drop packets, you get longer latencies: http://www.bufferbloat.net/

Why you'd have big buffers sometimes and not others, I have no idea.
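The delay a full buffer adds is its size divided by the rate at which the link drains it. A minimal sketch; the 256-packet buffer and the 384 kbit/s and 64 kbit/s rates are assumed example values. It also hints at one possible answer to the question above: the buffer may stay the same size while the radio link's rate changes with signal conditions, so the same queue costs very different amounts of time.

```c
#include <assert.h>

/* Time (s) for a full FIFO buffer to drain through the bottleneck link;
   a packet joining the tail of the queue waits at least this long. */
static double queue_delay_s(int packets, int packet_bytes, double link_bits_per_s) {
    assert(packets >= 0 && packet_bytes > 0 && link_bits_per_s > 0.0);
    double queued_bits = (double)packets * packet_bytes * 8.0;
    return queued_bits / link_bits_per_s;
}
```

256 full-size 1500-byte packets take 8 s to drain at 384 kbit/s, but 48 s if the radio drops to 64 kbit/s.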


It looks like this should be (at least partially) dealt with at the OS level especially if the OS in question is a mobile one.


I'm in a startup that has developed techniques using UDP to allow someone to roam across wifi; in other words, we have managed to reduce the TCP delays and timeouts to provide consistent and reliable handoffs between wifi zones and devices, regardless of the network provider. The technology also allows hot handover between femtocells and wifi. Right now we're mostly focused on a mobile app to improve broadband delivery of content to mobile users in shopping malls, commercial zones, etc., but that's low-hanging fruit. Eventually we believe this can be integrated into mobile apps to let others use it for true mobility while running broadband services.


The much worse problem is DNS. For big networks that always push the same two IPs (even without round-robin), it is a disaster. There are server lags, network lags, dropped packets, useless overhead from EDNS and differing packet sizes (timeouts and retransmissions), and above all the practice by content providers and CDNs of using hundreds of hostnames, changing in real time, to implement load balancing and/or geoIP-based asset loading. They use near-zero TTLs, which make caching useless, and the sets of hostnames are dynamic.

Indian Airtel's network is a live example of that disaster. It is almost unusable, while they are still actively promoting 3G and iPhones. ^_^
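Why near-zero TTLs hurt on a high-latency link: every lookup that misses the cache costs a full resolver round trip before the real request can even start. A minimal sketch counting misses for repeated lookups of one hostname; the one-lookup-per-second pattern and the TTL values are assumed for illustration:

```c
#include <assert.h>

/* Count cache misses for a resolver cache holding one record, given
   ascending lookup times (seconds) and the record's TTL (seconds). */
static int count_dns_misses(const double *lookup_times, int n, double ttl_s) {
    assert(n >= 0 && ttl_s >= 0.0);
    int misses = 0;
    int cached = 0;
    double expires_at = 0.0;
    for (int i = 0; i < n; i++) {
        double t = lookup_times[i];
        if (cached && t < expires_at) continue;  /* answered from cache */
        misses++;                                /* full network round trip */
        cached = 1;
        expires_at = t + ttl_s;
    }
    return misses;
}
```

Ten lookups one second apart cost one round trip with a 300 s TTL, two with a 5 s TTL, and ten with a TTL of zero; on a path with multi-second RTTs that difference is directly visible to the user.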


Interesting; this suggests a quick fix might be for the client to not use keepalive, or to selectively close very slow connections so as to start new ones. Potentially a much easier solution than writing a new transport.
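The "close slow connections and start new ones" idea can be sketched as an application-level progress watchdog. A minimal sketch; the struct, function names and the 5 s stall timeout are hypothetical, and, as the comment about capped data plans points out, opening a new connection does not stop data already queued for the old one from being delivered (and billed):

```c
#include <assert.h>
#include <stdbool.h>

/* Tracks whether a transfer is still making progress. */
typedef struct {
    long last_bytes;          /* byte count at the last progress point */
    double last_progress_s;   /* time of the last progress point */
    double stall_timeout_s;   /* how long without progress before giving up */
} stall_watchdog_t;

static void watchdog_init(stall_watchdog_t *w, double now_s, double stall_timeout_s) {
    assert(stall_timeout_s > 0.0);
    w->last_bytes = 0;
    w->last_progress_s = now_s;
    w->stall_timeout_s = stall_timeout_s;
}

/* Call periodically with the running byte count; returns true when the
   connection should be closed and a fresh one opened. */
static bool watchdog_should_reconnect(stall_watchdog_t *w, long bytes_so_far, double now_s) {
    if (bytes_so_far > w->last_bytes) {
        w->last_bytes = bytes_so_far;
        w->last_progress_s = now_s;
        return false;
    }
    return now_s - w->last_progress_s > w->stall_timeout_s;
}
```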


Don't worry, our Lisp startup (www.teclo.net) is fixing TCP over mobile networks, it will all be fine soon enough. :-)


perhaps you could explain a bit more about how you're solving the issue?


As noted elsewhere in this thread, the properties of wireless networks conflict badly with the design principles of TCP. We have a custom TCP stack that replaces some of the standard TCP algorithms with new ones carefully tuned for mobile broadband. It can act as a transparent proxy, presenting a standard stack towards the internet and an optimized one towards the radio network. The packet stream is still standard TCP, though, so no client changes are needed.

The product is deployed in some >10Gbps networks, and shows very impressive gains for real live traffic.


thanks for the explanation :) does your layer only sit between the mobile and the point at which the network becomes a wired connection? like, just the other side of the cell towers?


No, it'd be somewhere in the core network. Ideally next to the GGSN.


With Lisp :)


ah yes, the answer to all problems. i for one welcome our new lisp overlords.


Interesting. I cannot stop thinking how cool it would be if Google actually decides to step in and propose an alternative protocol for mobile networks. If they put it in Android, they already have a huge base for adoption.

Ended up writing a piece on Google because of this on my blog: http://micheljansen.org/blog/entry/1060

(shameless plug :P)


As a historical note, most of these concerns are the same as those expressed in RFC 3481 (category: BCP). You'll note from it that some of the issues are still open even though almost a decade has passed.


Would Vint Cerf's recent work on a high-latency network standard for space[0] apply? Would it make mobile more useful? It's designed for latencies of days (not seconds), so it might be overkill. But something to masticate upon...

0. http://www.technologyreview.com/communications/21601/?a=f


This is partly why I'm so interested in publishing information at the DNS level (i.e. .tel) - you get to use UDP (or TCP failover), plus other awesome benefits. You can do other innovative things with DNS too.


Of course, that does depend on DNS having a protocol to run over. What we really need is a DWIW protocol suite at each layer.


Yep, but the point I was trying to make is that DNS over UDP works well today.

It's also worth remembering that most of the world has 2G connectivity. Heck, even I'm 2G most of the time (rural UK).


Dave Taht points out http://www.bufferbloat.net/ which looks very interesting!


Mobile HTML5 apps, the future is the past. :) Check ours out http://infostripe.net


We should probably get these long round-trip protocol issues ironed out before we build our galactic internet


TCP is already out as a protocol for our galactic internet, see relevant comment from Linux TCP implementation:

    /*
     * [...] Note that 120 sec is defined in the protocol as the maximum
     * possible RTT.  I guess we'll have to use something other than TCP
     * to talk to the University of Mars.
     * PAWS allows us longer timeouts and large windows, so once implemented
     * ftp to mars will work nicely.
     */


(This is from /usr/src/linux/net/inet/tcp.c)

This is correct. At their furthest apart, Mars and Earth are 22 light minutes apart. Their closest is almost exactly 3 light minutes. For comparison, the Earth and Moon are just over 1 light second apart.
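The light-time figures are easy to check. A minimal sketch using approximate distances supplied here rather than taken from the thread (about 384,400 km to the Moon, and roughly 54.6 million to 401 million km between Earth and Mars), compared against the 120 s maximum RTT quoted in the Linux comment:

```c
#include <assert.h>
#include <stdbool.h>

#define SPEED_OF_LIGHT_KM_S 299792.458
#define TCP_MAX_RTT_S 120.0   /* figure quoted in the tcp.c comment */

/* One-way light time in seconds for a distance in kilometres. */
static double one_way_light_s(double distance_km) {
    assert(distance_km >= 0.0);
    return distance_km / SPEED_OF_LIGHT_KM_S;
}

/* True if even a zero-processing round trip exceeds the 120 s figure. */
static bool exceeds_tcp_max_rtt(double distance_km) {
    return 2.0 * one_way_light_s(distance_km) > TCP_MAX_RTT_S;
}
```

The Moon's round trip of about 2.6 s is comfortably inside 120 s, but even Mars at its closest (about 3 light minutes) needs roughly 6 minutes per round trip.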

On a more serious note, could tcp.c just be patched on the client to drop packets after ~10sec?


I sincerely hope ftp is a distant memory by the time the University of Mars opens.


well, why not? Martian students will need to use something to transfer files. Perhaps SFTP?

The International Space Station still uses Kermit! At least they did in 2003:

http://www.spacedaily.com/news/iss-03zq.html


Unsurprisingly, this is a well-known problem: http://ipnpr.jpl.nasa.gov/index.cfm

While I don't know that we can fairly say the Internet qua Internet has been extended past Earth (and associated environs), NASA certainly uses a Solar System-scale network already, and while they haven't made a big deal about some of the routing they've already done, if you read the press releases carefully they'll sometimes mention how they routed the signal from one probe through another. It's already a network.


I would encourage you all to look at the site dtnrg.org. It's for the DTN research group, which among other things is beginning to architect and implement the so-called Interplanetary Internet. Some early trials have been run among NASA, ESA, and JAXA mission-control centers, as well as one flight test on EPOXI (formerly the Deep Impact mission).


Is there any way round this for HTML5 apps? I know you can save an app icon on the iPhone, but when you launch it, it just launches Safari, which seems to make a network request to check whether the site is up to date. (Sorry, I'm a bit naive about all this HTML5 stuff.) For example, Gmail in Safari on the iPhone is useless in long-latency situations like this. Is there a way round that?


anyone know what tool I could use to run a similar test?


SCTP anyone?


Not that his advice is bad, but these statistics are a bit biased. Trains make for some pretty unusually difficult channel conditions.


Really? I would say trains, buses and cars are really common use cases for mobile data. Every second person I see is watching YouTube or playing around on the internet while they take public transport. It isn't biased data; there is an actual use case that it is presently failing.


These are typical results. I make networking equipment that has to deal with wireless connections. Our tests are as ideal as you can get without driving around to find a tower and sitting under it (although we have done that, too). We're typically in an office, stationary. This means no tower hand-offs and (relatively) stable conditions. Round-trip times in the 20 sec range are not rare. I've even seen RTTs a minute long several times, and once saw one greater than 90 sec.

What is really interesting is to watch the RTT while you run a speed test.


That may be, but - at least in my case - the most typical mobile usage is on a train (everywhere else I usually have WiFi).


Trains more so than cars/buses? Why?


Faster travel means you pass in and out of cells more often, and you are more likely to pass through areas with bad coverage, either for man-made or geographical reasons (a steep embankment between you and most of the towers in range, for instance) or because your route is more likely to follow a straight-ish line through a fairly uninhabited area that is either completely unserviced by the cell network or where the only tower(s) in range are some distance away. Also, if you are travelling by bus (and to a certain extent by car), you are more likely to be travelling a shorter distance than by train, so you are less likely to pass through areas that are badly served because there is no money in serving two local farmers as well as a small town.


Trains also dampen the signal. Many German high-speed trains have repeaters, but those don't yet (hopefully!) work with UMTS. Cars and buses admittedly have the same problem, but maybe to a lesser extent.


Cross Country trains (what was Virgin's bit of the rail franchises) are worse for this than anything else on the UK rail network. I'm told it is due to a combination of the materials in the construction of the carriage frame, and the anti-glare layer in the windows. If you are at the end of a carriage, signal strength/reliability seems to jump a bit when you are sat at a station and the doors are open.

Other trains don't seem nearly as bad. I'm not sure how much of that is due to different construction or (in the case of other modern(ish) rolling stock) anything like the repeaters you mention (though as all the franchise owners are cheap-arses I very much doubt that tech has been paid for by any of them!).


the area of each cell typically increases when you move out to sparsely populated areas with vast expanses of land, thereby minimizing the number of (cell-to-cell) handovers that might happen.


Aye, but that is what creates the "distance from tower" problem: poor signal quality due to normal attenuation over distance, and more potential sources of shadowing and interference between the tower and you.

There are a number of places where trains skip along at a goodly speed through what is probably an area densely packed with phone cells, so there will sometimes be a significant number of cell-to-cell hand-offs. Having said that, as I grew up (well, more or less) before mobile phones were common, I'm still slightly impressed that cell hand-over works at all mid-call at 80+ mph, so maybe hand-offs are not much of an issue unless the destination cell is already saturated at the time.
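The two effects argued about here can be compared with a back-of-the-envelope model: straight-line travel crosses roughly one cell boundary per cell diameter travelled. A minimal sketch; the speeds and cell sizes are assumed example values, and real cells overlap and are far from uniform:

```c
#include <assert.h>

/* Rough handover rate for straight-line travel across a row of cells
   of the given diameter: one boundary crossing per cell diameter. */
static double handovers_per_hour(double speed_km_h, double cell_diameter_km) {
    assert(speed_km_h >= 0.0 && cell_diameter_km > 0.0);
    return speed_km_h / cell_diameter_km;
}
```

A 200 km/h train crossing 10 km rural cells hands over about 20 times an hour, a 30 km/h bus in 1 km urban cells about 30 times, and the same train through 2 km suburban cells about 100 times. Larger rural cells do cut the handover count, but at the price of longer distances to the tower.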


I have precisely the same characteristics in my (stationary) garden.



