Improving the performance of rows.Scan() in Go
I have a very simple query that returns a couple thousand rows with only two columns:

SELECT "id", "value" FROM "table" LIMIT 10000;

After issuing the query with db.Query(), I traverse the result set with the following code:

data := map[uint8]string{}

for rows.Next() {
    var (
        id    uint8
        value string
    )

    // Note: rows that fail to scan are silently skipped here.
    if err := rows.Scan(&id, &value); err == nil {
        data[id] = value
    }
}

If I run the exact same query directly on the database, I get all results back within a couple of milliseconds, but the Go code takes far longer to complete, sometimes almost 10 seconds!

I started commenting out several parts of the code and it seems that rows.Scan() is the culprit.

Scan copies the columns in the current row into the values pointed at by dest.

If an argument has type *[]byte, Scan saves in that argument a copy of the corresponding data. The copy is owned by the caller and can be modified and held indefinitely. The copy can be avoided by using an argument of type *RawBytes instead; see the documentation for RawBytes for restrictions on its use. If an argument has type *interface{}, Scan copies the value provided by the underlying driver without conversion. If the value is of type []byte, a copy is made and the caller owns the result.

Can I expect any speed improvement if I use *[]byte, *RawBytes or *interface{} instead?

Looking at the code, it looks like the convertAssign() function is doing a lot of stuff that isn't necessary for this particular query. So my question is: how can I make the Scan process faster?

I thought about overloading the function to expect predetermined types, but that isn't possible in Go...

Any ideas?

Stuyvesant answered 31/3, 2014 at 22:26 Comment(6)
What happened when you tried *[]byte, *RawBytes and *interface{}? - Cowbind
@peterSO: I was reading the documentation, and data from *RawBytes seems to go away whenever you call rows.Next(). I haven't tried the other two; I was merely asking whether they would help with anything. If you look at the convertAssign source code (linked in the answer), I think the uint8 type still has to go through reflection. - Stuyvesant
Did you try using the profiler to help narrow it down? - Transvalue
@DanielWilliams: No, I'm quite new to Go. Could you link me to a tutorial or documentation that explains how to use the profiler? - Stuyvesant
Here's a good post about it: blog.golang.org/profiling-go-programs - Transvalue
Can you try other drivers and check the time? - Bode

Yes, you can use RawBytes instead, and rows.Scan() will then avoid the memory allocation and copy.

As for the convertAssign() function: yes, it's not optimal in Go 1.2, but significant improvements were made in 1.3:
- http://code.google.com/p/go/issues/detail?id=7086
- Lock-less implementation for sync.Pool

I have an example of RawBytes usage: https://gist.github.com/yvasiyarov/9911956

This code reads data from a MySQL table, does some processing, and writes it to CSV files. Last night it took 1 minute 24 seconds to generate 4 GB of CSV data (about 30 million rows).

So I'm pretty sure the problem is outside of the Go code: even the worst possible usage of rows.Scan() cannot give you a 10-second delay.

Carnivorous answered 1/4, 2014 at 11:35 Comment(1)
The reason was a cold database and a busy disk. Thanks for answering. - Stuyvesant
