Dial tcp I/O timeout on simultaneous requests
I am building a tool in Go that needs to make a very large number of simultaneous HTTP requests to many different servers. My initial prototype in Python had no problem doing a few hundred simultaneous requests.

However, I have found that in Go this almost always results in an error like Get http://www.google.com: dial tcp 216.58.205.228:80: i/o timeout for some of the requests once the number of simultaneous requests exceeds ~30-40.

I've tested on macOS and openSUSE, on different hardware, in different networks, and with different domain lists; changing the DNS server as described in other Stack Overflow answers does not help either.

The interesting thing is that the failed requests do not even produce a single packet, as confirmed with Wireshark.

Is there anything that I am doing wrong or is that a bug in Go?

Minimal reproducible program below:

package main

import (
    "fmt"
    "net/http"
    "sync"
)

func main() {
    domains := []string{/* large domain list here, eg from https://moz.com/top500 */}

    limiter := make(chan string, 50) // Limits simultaneous requests

    wg := sync.WaitGroup{} // Needed to not prematurely exit before all requests have been finished

    for i, domain := range domains {
        wg.Add(1)
        limiter <- domain

        go func(i int, domain string) {
            defer func() { <-limiter }()
            defer wg.Done()

            resp, err := http.Get("http://"+domain)
            if err != nil {
                fmt.Printf("%d %s failed: %s\n", i, domain, err)
                return
            }

            fmt.Printf("%d %s: %s\n", i, domain, resp.Status)
        }(i, domain)
    }

    wg.Wait()
}

Two particular error types are occurring: a net.DNSError that does not make any sense, and a nondescript poll.TimeoutError:

&url.Error{Op:"Get", URL:"http://harvard.edu", Err:(*net.OpError)(0xc00022a460)}
&net.OpError{Op:"dial", Net:"tcp", Source:net.Addr(nil), Addr:net.Addr(nil), Err:(*net.DNSError)(0xc000aca200)}
&net.DNSError{Err:"no such host", Name:"harvard.edu", Server:"", IsTimeout:false, IsTemporary:false}

&url.Error{Op:"Get", URL:"http://latimes.com", Err:(*net.OpError)(0xc000d92730)}
&net.OpError{Op:"dial", Net:"tcp", Source:net.Addr(nil), Addr:net.Addr(nil), Err:(*poll.TimeoutError)(0x14779a0)}
&poll.TimeoutError{}

Update:

Running the requests with a separate http.Client as well as a custom http.Transport and net.Dialer does not make any difference, as can be seen when running the code from this playground.

Calcaneus answered 25/8, 2018 at 12:3 Comment(14)
You are making all requests with http.DefaultClient. What happens when you distribute the requests over a few independent http clients? Perhaps the connection pool is limited to some number of connections.Holy
I reworked your code (play.golang.org/p/HnKdFG5roj-) and yes, I also find some results rather suspicious. Not sure why it would not resolve web.mit.edu / fda.gov / geocities.jp / clickbank.net. However, IMHO it is not related to the concurrency rate.Jailbreak
Also found this along the road, 2018/08/25 17:24:53 Unsolicited response received on idle HTTP channel starting with "HTTP/1.0 408 Request Time-out\r\nServer: AkamaiGHost\r\nMime-Version: 1.0\r\nDate: Sat, 25 Aug 2018 15:24:53 GMT\r\nContent-Type: text/html\r\nContent-Length: 218\r\nExpires: Sat, 25 Aug 2018 15:24:53 GMT\r\n\r\n<HTML><HEAD>\n<TITLE>Request Timeout</TITLE>\n</HEAD><BODY>\n<H1>Request Timeout</H1>\nThe server timed out while waiting for the browser's request.<P>\nReference&#32;&#35;2&#46;3ff90a17&#46;1535210693&#46;0\n</BODY></HTML>\n"; err=<nil>Jailbreak
@Holy see the update, it does not make a differenceCalcaneus
@mh-cbon Have you tried lowering the concurrency? With ~5-10 concurrent requests it's running without problems.Calcaneus
yes, it is very similar to my previous tests, 40 failures or so. Still some I don't quite understand, because dig resolves them. Even googleusercontent.com constantly fails. See also github.com/golang/go/issues/18588. I ran it on 1.10; I have not taken the time to switch to 1.11 yet, but it might be worth the test.Jailbreak
@mh-cbon that issue seems pretty much like what is happening here, thank youCalcaneus
@Calcaneus Hey did you solve your problem?Rudder
@Rudder no I did not, I have reduced the amount of parallel requests and chose to work with multiple instances of the same program, which seems to point to the open file limit that is mentioned in the issue mh-cbon mentioned and is not yet resolved from a go standard library standpoint.Calcaneus
@Neverbolt, there is a good chance that the DNS server is causing your bottleneck. Google explicitly states it will alter the queries per second per client if it thinks something odd is going on. I cannot imagine it is the only DNS provider that has this defensive measure built in. A way to test this is overriding the default DNS Resolver to use a cache like [here](https://mcmap.net/q/726294/-does-go-cache-dns-lookups).Crinum
@LiamKelly As I said in the question, taking a python client to do the very same thing did not result in any performance issues, so I don't think that the DNS server is the bottleneck, as both were using the same server.Calcaneus
@Calcaneus there is a good chance that the Python code is just slower given the GIL. Surprised that there is not a DNS tool to measure QPS. Seems pretty straightforward to do in gopacket, but probably even more useful to implement via eBPF.Crinum
I do not think that is a bug in Go... have you seen github.com/codesenberg/bombardier?Fishman
@Fishman the tool you linked has nothing to do with resolving large numbers of domain names, so I don't think it applies hereCalcaneus
I think many of your net.DNSErrors are actually too many open files errors in disguise. You can see this by running your sample code with the netgo build tag (a recommendation from here): go run -tags netgo main.go. This will emit errors like:

…dial tcp: lookup buzzfeed.com on 192.168.1.1:53: dial udp 192.168.1.1:53: socket: too many open files

instead of

…dial tcp: lookup buzzfeed.com: no such host

Make sure you're closing the request's response body (resp.Body.Close()). You can find more about this specific problem at What's the best way to handle "too many open files"? and How to set ulimit -n from a golang program?. (On my machine (macOS), increasing file limits manually seemed to help, but I don't think it's a good solution since it doesn't really scale, and I'm not sure how many open files you'd need overall.)


As suggested by @liam-kelly, I think the i/o timeout error is coming from a DNS server or some other security mechanism. Setting a custom (bad) DNS server IP gives me the same error.

Forest answered 30/5, 2021 at 13:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.