To track down the problem, I first measured the maximum number of concurrent connections, and discovered that this was consistently around 55.
reduced the maximum to 10 (by setting the operation queue's maximum concurrent operation count): fewer timeouts, but still not perfect
reduced the maximum to 4 - even fewer timeouts, but I still got some
set the maximum to 1 - this has got to work, right? Nope!
Sometimes the app would work properly with no concurrency at all (max == 1), but it still failed about half the time.
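
For anyone wanting to repeat this experiment, here is a minimal sketch of capping and measuring concurrency with an operation queue. It is written in Swift rather than the code my app actually used; `PeakCounter` and `measurePeakConcurrency` are hypothetical names made up for illustration, and the short sleep stands in for a real network fetch.

```swift
import Foundation

// Hypothetical helper: records the highest number of jobs running at once.
final class PeakCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var running = 0
    private(set) var peak = 0

    func enter() { lock.lock(); running += 1; peak = max(peak, running); lock.unlock() }
    func leave() { lock.lock(); running -= 1; lock.unlock() }
}

// Runs `jobs` dummy operations on a queue capped at `limit` and
// returns the peak concurrency that was actually observed.
func measurePeakConcurrency(limit: Int, jobs: Int) -> Int {
    let queue = OperationQueue()
    queue.maxConcurrentOperationCount = limit  // 1 makes the queue serial

    let counter = PeakCounter()
    for _ in 0..<jobs {
        queue.addOperation {
            counter.enter()
            Thread.sleep(forTimeInterval: 0.01)  // stand-in for a network fetch
            counter.leave()
        }
    }
    queue.waitUntilAllOperationsAreFinished()
    return counter.peak
}

print("peak with limit 4:", measurePeakConcurrency(limit: 4, jobs: 20))
```

The queue never runs more operations than the limit, so the reported peak gives an upper bound on how many connections those operations can have open at once.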
So I burned a support incident with Apple and got one of their most experienced engineers to advise. Based on his suggestions, I tried the following:
ran the app on a different carrier's cellular network (Verizon), and it worked perfectly. So the issue was not cellular per se (or iOS) but AT&T's cellular network (it failed in both NY and NJ)
switched from http to https, and now it works flawlessly on AT&T with full concurrency
investigated the web endpoint to determine its capabilities, which turned out to be pretty poor (more later). I tried a different web endpoint with more capabilities, and the original problem went away using http.
What I learned from this was as follows:
there was no difference between iOS 5.1 and iOS 6
if you have this problem on AT&T 3G and are using http, try switching to https
if the endpoint is using HTTP/1.0 and does NOT support 'Connection: keep-alive', then every http request sets up and tears down its own TCP connection. I believe this 'thrashing' of the cellular network is why AT&T was disconnecting some of my sessions, but of course I have no way to know this for sure.
using an HTTP/1.1 service that supports persistent connections, the problem disappeared. In this case there is no TCP connection thrashing.
some HTTP/1.1 services support 'pipelining', as does iOS (using the NSURLRequest setting, HTTPShouldUsePipelining), and if I can switch to this then my performance should greatly improve
there is a WWDC 2012 video that discusses how to improve network performance: Session 706 "Networking Best Practices"
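
As a rough illustration of the https and pipelining points above, here is a sketch of building such a request in Swift. The host `example.com` and the function name `makeRequest` are placeholders I made up, not my real endpoint. As far as I know the URL loading system manages the 'Connection' header itself, so the sketch does not set it by hand, and pipelining is only a hint that the server may or may not honor.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

// Builds a GET request over https with pipelining requested.
// The host is a placeholder; substitute your own endpoint.
func makeRequest(path: String) -> URLRequest {
    let base = URL(string: "https://example.com")!
    var request = URLRequest(url: base.appendingPathComponent(path))
    request.httpMethod = "GET"
    // Ask the loading system to pipeline requests on a persistent connection.
    // This only helps if the server supports HTTP/1.1 pipelining.
    request.httpShouldUsePipelining = true
    return request
}

let request = makeRequest(path: "tiles/1.png")
print(request.url?.absoluteString ?? "invalid URL")
```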
EDIT
So this just gets more bizarre as I peel the onion! After further discussion, some web people did a test with CloudFront, and it did accept 'Connection: keep-alive'. I tried over and over to get it to work yesterday, but could not.
The web expert suggested I try it using https and, lo and behold, it worked! For some reason, when using 'http' over AT&T 3G, that header is either removed or ignored. I tested my app with Wi-Fi too. In all cases but AT&T/3G the 'Connection' header was returned in the response.
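
If you want to check this on your own network, something like the following sketch can be used to see whether the 'Connection' header survived. `connectionHeader(of:)` is a hypothetical helper name, and the response built at the bottom is a hand-made stand-in for what a real request's completion handler would hand you.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // HTTPURLResponse lives here on non-Apple platforms
#endif

// Returns the value of the 'Connection' response header, or nil if the
// response is not HTTP or the header is absent (e.g. stripped in transit).
func connectionHeader(of response: URLResponse?) -> String? {
    guard let http = response as? HTTPURLResponse else { return nil }
    for (key, value) in http.allHeaderFields {
        // Header names are case-insensitive, so compare accordingly.
        if let name = key as? String,
           name.caseInsensitiveCompare("Connection") == .orderedSame {
            return value as? String
        }
    }
    return nil
}

// Hand-made response for illustration only.
let sample = HTTPURLResponse(url: URL(string: "https://example.com/")!,
                             statusCode: 200,
                             httpVersion: "HTTP/1.1",
                             headerFields: ["Connection": "keep-alive"])
print(connectionHeader(of: sample) ?? "no Connection header")
```

Logging this value on each network (Wi-Fi, each carrier, http vs https) is the comparison described above.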