Golang slow scan() for multiple rows

I am running a query in Golang where I select multiple rows from my Postgresql Database.

I am using the following imports for my query

"database/sql"
"github.com/lib/pq"

I have narrowed down to my loop for scanning the results into my struct.

// Returns about 400 rows
rows, err = db.Query("SELECT * FROM infrastructure")
if err != nil {
    return nil, err
}

var arrOfInfra []model.Infrastructure
for rows.Next() {
    obj, ptrs := model.InfrastructureInit()
    rows.Scan(ptrs...)
    arrOfInfra = append(arrOfInfra, *obj)
}
rows.Close()

The above code takes about 8 seconds to run; the query itself is fast, but the loop over rows.Next() accounts for nearly the entire 8 seconds.

Any ideas? Am I doing something wrong, or is there a better way?

My configuration for my database

// host, port, dbname, user, password masked for obvious reasons
db, err := sql.Open("postgres", "host=... port=... dbname=... user=... password=... sslmode=require")
if err != nil {
    panic(err)
}

// I have tried using the defaults, and setting these to a high number (100), but it doesn't seem to help with my situation
db.SetMaxIdleConns(1)
db.SetMaxOpenConns(1)

UPDATE 1:

I placed print statements in the for loop. Below is my updated snippet

for rows.Next() {
    obj, ptrs := model.InfrastructureInit()
    rows.Scan(ptrs...)
    arrOfInfra = append(arrOfInfra, *obj)
    fmt.Println("Len: " + fmt.Sprint(len(arrOfInfra)))
    fmt.Println(obj)
}

I noticed that the loop actually pauses partway through and then continues after a short break. It looks like this:

Len: 221
Len: 222
Len: 223
Len: 224
<a short pause of about 1 second, then it prints Len: 225 and continues>
Len: 226
Len: 227
...
..
.

and it will happen again later on at another row count, and again after a few hundred records.


UPDATE 2:

Below is a snippet of my InfrastructureInit() method

func InfrastructureInit() (*Infrastructure, []interface{}) {
    irf := new(Infrastructure)
    var ptrs []interface{}
    ptrs = append(ptrs,
        &irf.Base.ID,
        &irf.Base.CreatedAt,
        &irf.Base.UpdatedAt,
        &irf.ListingID,
        &irf.AddressID,
        &irf.Type,
        &irf.Name,
        &irf.Description,
        &irf.Details,
        &irf.TravellingFor,
    )
    return irf, ptrs
}

I am not exactly sure what is causing this slowness, but for now I have placed a quick patch on my server: I use a Redis database to precache my infrastructures, saving them as strings. It seems to be okay for now, but now I have to maintain both Redis and my Postgres.

I am still puzzled over this weird behavior, but I'm not exactly sure how rows.Next() works - does it make a query to the database every time I call rows.Next()?

Meprobamate answered 25/5, 2017 at 16:28 Comment(9)
model.InfrastructureInit() - what is this line doing? How many columns are you scanning in? Could also be from scanning in ~400 rows. Is it really necessary to pull back that many records?Pantheism
@gavin the init statement simply creates a new model, and create an array with pointers to the new object's variable. A convenient method for preparing to Scan(). Each record has 12 columns. And the number of records actually varies, but the average is 200-400 records.Meprobamate
If you want a little more performance verbosity, you could use the time pkg to see exactly which call is causing the delay (a rough sketch of this is shown after these comments). Given that you see the occasional delay as the slice grows, it could possibly be the resize of the underlying array -- see Append: An example -- but I wouldn't think it would be that slow.Brewton
If you want more help, you might want to post the code for the model.InfrastructureInit() function. Also, how many columns are in the infrastructure table? And how long does it take for you to run the query and get the output at the psql prompt?Brewton
Would running a goroutine in the for loop help? You could pass any variables outside the function's scope as parameters to the goroutine and ensure all of them have finished running before calling rows.Close() using a channel. Forgive me if I'm being naïveAntrim
@Brewton I have made an update to my post. I will also check out the link as well in due time. Thanks!Meprobamate
@Antrim a goroutine doesn't help in my case because this is actually part of an API call (:Meprobamate
@WeilsonWonder what is Infrastructure, and why are you creating pointers to it for every row? Also note that obj is a pointer to Infrastructure, and you dereference it and append the copy to your array.Corley
It's hard to tell without knowing your database schema. But if you systematically have a single row slowing down your query, you may expect that row to be far larger than the others. For a better answer to this question we need the schema of this table; you can also add an ORDER BY clause to your query to check whether it is always the same row that slows things down.Chaeta
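
Here is a rough sketch of the timing suggestion from the comments, using the names from the question (model.InfrastructureInit, the return nil, err pattern, and the ~400-row estimate all come from the post; the rest is an assumption about the surrounding function, and it needs the fmt and time packages imported):

arrOfInfra := make([]model.Infrastructure, 0, 400) // pre-allocate to rule out slice growth
var initDur, scanDur time.Duration

for rows.Next() {
    t := time.Now()
    obj, ptrs := model.InfrastructureInit()
    initDur += time.Since(t)

    t = time.Now()
    if err := rows.Scan(ptrs...); err != nil {
        return nil, err // a failed Scan should not be silently ignored
    }
    scanDur += time.Since(t)

    arrOfInfra = append(arrOfInfra, *obj)
}
fmt.Printf("init: %v  scan: %v  rows: %d\n", initDur, scanDur, len(arrOfInfra))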

What do you think about just doing it like this?

defer rows.Close()

var arrOfInfra []*Infrastructure
for rows.Next() {
    irf := &Infrastructure{}

    err = rows.Scan(
        &irf.Base.ID,
        &irf.Base.CreatedAt,
        &irf.Base.UpdatedAt,
        &irf.ListingID,
        &irf.AddressID,
        &irf.Type,
        &irf.Name,
        &irf.Description,
        &irf.Details,
        &irf.TravellingFor,
    ) 

    if err == nil {
        arrOfInfra = append(arrOfInfra, irf)
    }
}

Hope this helps.
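
For completeness, here is the same idea wrapped into a function. The function name and signature are my own assumptions, and the column list is taken from the question; the main differences are that a failed Scan is returned instead of the row being skipped silently, and rows.Err() is checked after the loop:

func loadInfrastructures(db *sql.DB) ([]*Infrastructure, error) {
    // Note: the number of columns returned by SELECT * must match the
    // number of Scan destinations below.
    rows, err := db.Query("SELECT * FROM infrastructure")
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    arrOfInfra := make([]*Infrastructure, 0, 400) // rough pre-allocation for ~400 rows
    for rows.Next() {
        irf := &Infrastructure{}
        err := rows.Scan(
            &irf.Base.ID,
            &irf.Base.CreatedAt,
            &irf.Base.UpdatedAt,
            &irf.ListingID,
            &irf.AddressID,
            &irf.Type,
            &irf.Name,
            &irf.Description,
            &irf.Details,
            &irf.TravellingFor,
        )
        if err != nil {
            return nil, err // surface scan errors instead of dropping the row
        }
        arrOfInfra = append(arrOfInfra, irf)
    }
    // rows.Err reports any error hit while iterating (e.g. a dropped connection).
    if err := rows.Err(); err != nil {
        return nil, err
    }
    return arrOfInfra, nil
}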

Yaw answered 20/6, 2018 at 8:54 Comment(0)

I went down a somewhat odd path myself while consolidating my understanding of how rows.Next() works and what might impact its performance, so I thought I'd share this here for posterity (despite the question having been asked a long time ago).

Related to:

I am still puzzled over this weird behavior, but I'm not exactly sure how rows.Next() works - does it make a query to the database every time I call rows.Next()?

It doesn't make a 'query', but it does read (transfer) data from the db through the driver on each iteration, which means it can be impacted by e.g. bad network performance. This is especially true if your db is not local to where you are running your Go code. One approach to confirm whether network performance is the issue would be to run your Go app on the same machine as your db (if possible).
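
One rough way to check this (my own sketch, not part of the measurements below): drain the result set without scanning anything and time just that, since rows.Next() still pulls each row from the connection even if Scan is never called. A big difference between this number and the full loop points at scanning/allocation rather than the network:

start := time.Now()
count := 0
for rows.Next() {
    count++ // advance through every row without decoding it
}
if err := rows.Err(); err != nil {
    return err
}
fmt.Printf("fetched %d rows in %v\n", count, time.Since(start))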

Assuming the columns scanned in the above code are not extremely large and do not require custom conversions, reading ~400 rows should take on the order of 100 ms at most (in a local setup).

For example, I had a case where I needed to read about 100k rows at about 300 B per row, and that took ~4 s (local setup).

Wretch answered 13/4, 2021 at 6:16 Comment(0)
