We're using Varnish and ESI to embed sub-documents into JSON documents. Basically a response from our app-server looks like this:
[
<esi:include src="/station/best_of_80s" />,
<esi:include src="/station/herrmerktradio" />,
<esi:include src="/station/bluesclub" />,
<esi:include src="/station/jazzloft" />,
<esi:include src="/station/jahfari" />,
<esi:include src="/station/maximix" />,
<esi:include src="/station/ondalatina" />,
<esi:include src="/station/deepgroove" />,
<esi:include src="/station/germanyfm" />,
<esi:include src="/station/alternativeworld" />
]
The included resources are complete and valid JSON responses on their own. The complete list of all stations is about 1070. So when the cache is cold and a complete station list is the first request varnish issues 1000 requests on our backend. When the cache is hot ab looks like this:
$ ab -c 100 -n 1000 http://127.0.0.1/stations
[...]
Document Path: /stations
Document Length: 2207910 bytes
Concurrency Level: 100
Time taken for tests: 10.075 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 2208412000 bytes
HTML transferred: 2207910000 bytes
Requests per second: 99.26 [#/sec] (mean)
Time per request: 1007.470 [ms] (mean)
Time per request: 10.075 [ms] (mean, across all concurrent requests)
Transfer rate: 214066.18 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 11 7.3 9 37
Processing: 466 971 97.4 951 1226
Waiting: 0 20 16.6 12 86
Total: 471 982 98.0 960 1230
Percentage of the requests served within a certain time (ms)
50% 960
66% 985
75% 986
80% 988
90% 1141
95% 1163
98% 1221
99% 1229
100% 1230 (longest request)
$
100 rec/sec doesn't look that good but consider the size of the document. 214066Kbytes/sec oversaturates a 1Gbit interface well.
A single request with warm cache ab (ab -c 1 -n 1 ...) shows 83ms/req.
The backend itself is redis based. We're measuring a mean response time of 0.9ms [sic] in NewRelic. After restarting Varnish the first request with a cold cache (ab -c 1 -n 1 ...) shows 3158ms/rec. This means it takes Varnish and our backend about 3ms per ESI include when generating the response. This is a standard core i7 pizza box with 8 cores. I measured this while being under full load. We're serving about 150mio req/month this way with a hitrate of 0.9. These numbers suggest indeed that the ESI-includes are resolved in serial.
What you have to consider when designing a system like this is 1) that your backend is able to take the load after a Varnish restart when the cache is cold and 2) that usually your resources don't expire all at once. In case of our stations they expire every full hour but we're adding a random value of up to 120 seconds to the expiration header.
Hope that helps.