How to load 1 million records from a database fast?
We have a Firebird database with 1.000.000 records that must be processed after ALL of them are loaded into RAM. To get all of those records we extract the data in pages using (select * first 1000 ...), and the whole load takes 8 hours. What is the solution for this?

Reverential answered 20/4, 2010 at 17:10 Comment(17)
What is the size of the table? Are you sure your memory can hold that?Gerick
Sounds like a job for... The World's Fastest Database!Zagazig
A million records should be loaded in seconds unless these are really enormous records. How are you loading them? What platform are you using?Sherbet
The db is 450 MB, and yes, I have enough memory but not enough speed.Reverential
Is 34 records a second normal with Firebird? Are the records abnormally large/many fields? This seems extraordinarily slow to me.Mockheroic
Actually, 1 million rows is nothing for a database to load...Gerick
What's SGBD? Have you been able to establish the cause of the bottleneck? What is the constraining factor? Disk speed, memory speed?Mockheroic
So, loading data from a table with 1.000.000 rows in C# from a Firebird db takes at least 8 hours on a Pentium 4 3GHz.Reverential
Define the phrase "loaded in RAM memory". Are you loading into some kind of collection? Your performance issue could be there instead of the read time from the database.Amortization
Post the entire code of your while loopAeropause
FbCommand cmd = new FbCommand("select field1, field2 from table a", connection); using (FbDataReader read = cmd.ExecuteReader()) { while (read.Read()) { /* this Read() transfers the data from the db; I think the problem is here */ int a = read.GetInt32(0); int b = read.GetInt32(1); mylist.Add(new DTO(a, b)); } } It is a normal loop, nothing special about it. The transfer of data is the problem in my opinion.Reverential
Compare with this ibsurgeon.com/articles/item104Pitchstone
can you give the structure of database ?i will test it in my machinePitchstone
Just create a table with these fields. Tomorrow I will put the stress test code here. CREATE TABLE tableA ( ID INTEGER DEFAULT 0 NOT NULL, ID1 INTEGER NOT NULL, FIELD1 TIMESTAMP NOT NULL, FIELD2 BIGINT, FIELD3 INTEGER, FIELD4 VARCHAR(32), FIELD5 INTEGER, FIELD6 BIGINT DEFAULT -1, FIELD7 INTEGER NOT NULL, FIELD8 INTEGER DEFAULT 0, FIELD9 SMALLINT DEFAULT 0 NOT NULL, FIELD0 SMALLINT default 0 NOT NULL, FIELD11 SMALLINT default 0 NOT NULL );Reverential
I think you're doing something wrong... when I see patterns like this, usually it means the programmer/architect doesn't fully understand how to use a database in their design.Essie
Need more info as indicated above.Segarra
have you solved the problem ?Pitchstone

Does each of your "select * first 1000" queries (as you described them) do a full table scan? Look at those queries and make sure they are using an index.

Samhita answered 20/4, 2010 at 17:22 Comment(1)
The select speed is not the problem. The transfer of data from the select recordset to memory, the while (read.Read()) loop, takes too long ...Reverential

to load data from a table with 1.000.000 rows in C# using a firebird db takes on a Pentium 4 3Ghz at least 8 hours

Everybody's been assuming you were running a SQL query to select the records from the database. Something like:

select *
from your_big_table;

Because that really would take a few seconds. Well, a little longer to display it on a screen, but executing the actual select should be lightning fast.

But that reference to C# makes me think you're doing something else. Perhaps what you really have is an RBAR ("row by agonizing row") loop instantiating one million objects. I can see how that might take a little longer. But even so, eight hours? Where does the time go?

edit

My guess was right: you are instantiating 1,000,000 objects in a loop. The correct advice is to find some other way of doing whatever it is you do once you have got all your objects in memory. Without knowing more about the details it is hard to give specifics. But it seems unlikely this is a UI thing - what user is going to peruse a million objects?

So a general observation will have to suffice: use bulk operations to implement bulk activity. SQL databases excel at handling sets. Leverage the power of SQL to process your million rows in a single set, rather than as individual rows.
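As a rough sketch of that advice, assuming the per-record analysis can be expressed in SQL at all: the statement, the FIELD9/FIELD3 rule, and the connection string below are all invented placeholders (the question never says what the processing does), but the shape is one set-based statement instead of a million-iteration client loop.

```csharp
using FirebirdSql.Data.FirebirdClient;

// Placeholder connection string - use whatever the application already has.
string connectionString = "DataSource=localhost;Database=mydb.fdb;User=SYSDBA;Password=masterkey";

using (var connection = new FbConnection(connectionString))
{
    connection.Open();

    // Hypothetical set-based rewrite: the server applies the rule to all
    // 1,000,000 rows in one round trip; no per-row transfer to the client.
    var sql = @"update tableA
                set FIELD9 = 1
                where FIELD3 > 1000"; // stand-in for the real per-record rule

    using (var cmd = new FbCommand(sql, connection))
    {
        int rows = cmd.ExecuteNonQuery();
    }
}
```

Whether this applies depends entirely on what the "analysis" is; if it genuinely cannot be expressed in SQL, the set-based route is closed and the transfer cost has to be attacked directly.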

If you don't find this answer helpful then you need to give us more details regarding what you're trying to achieve.

Photographer answered 20/4, 2010 at 17:39 Comment(2)
Our process is run once a month, max, in the worst case. And yes, we load all data from our table in a big single loop. That loop is our bottleneck. For a moment I was thinking to make some clones of the DB and then run some parallel queries to load data faster.Reverential
Well, you never load records in a loop when you have that many. That's database 101. I don't know Firebird, but most databases support using a select in the insert, like insert into table2 (field1, field2) select field1, field2 from table1. That would be much faster than one row at a time.Bethink

How long does it take to construct the DTO object that you are creating with each data read?

int a = read.GetInt32(0); int b = read.GetInt32(1); mylist.Add(new DTO(a, b));

You are creating a million of these objects. If it takes 29 milliseconds to create one DTO object, then that is going to take over 8 hours to complete.
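One way to test that theory is to time the construction in isolation with System.Diagnostics.Stopwatch. A sketch: the DTO class below is a stand-in for the questioner's real type (whose constructor we never see), so only the measurement pattern is the point here.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Stand-in for the questioner's DTO; the real one may do far more work.
class DTO
{
    public int A, B;
    public DTO(int a, int b) { A = a; B = b; }
}

class Program
{
    static void Main()
    {
        // Pre-size the list so List<T> resizing doesn't pollute the timing.
        var mylist = new List<DTO>(1000000);
        var sw = Stopwatch.StartNew();

        for (int i = 0; i < 1000000; i++)
            mylist.Add(new DTO(i, i));

        sw.Stop();
        Console.WriteLine($"1,000,000 DTOs in {sw.ElapsedMilliseconds} ms");
        // If this finishes in well under a second, the constructor is cheap
        // and the 8 hours must be going into read.Read() / the transfer.
    }
}
```

Running the same measurement around the real constructor, separately from the data reader, would show which side of the loop owns the time.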

Litmus answered 20/4, 2010 at 19:30 Comment(0)

What sort of processing do you need to do that would require loading the records into memory rather than just processing them via SQL statements?

There are two techniques I use that work depending on what I am trying to do.

  1. Assuming there is some sort of artificial key (identity), work in batches, incrementing the last identity value processed.

  2. BCP the data out to a text file, churn through the updates, then BCP it back in, remembering to turn off constraints and indexes before the IN step.
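Technique 1 might look roughly like this in the questioner's C# setup. A sketch under several assumptions: the ID column from the schema posted in the comments is an ascending indexed key, connectionString and the batch size are placeholders, DTO is the questioner's own type, and the parameter API is the Firebird ADO.NET provider's (the question's snippet already uses FbCommand/FbDataReader; AddWithValue is assumed to be available as in other ADO.NET providers).

```csharp
using System.Collections.Generic;
using FirebirdSql.Data.FirebirdClient;

// Sketch of technique 1: key-ranged batches over the indexed ID column.
// Each query is a range scan starting after the last key seen, so the
// server never revisits rows that were already fetched.
const int batchSize = 10000;
string connectionString = "DataSource=localhost;Database=mydb.fdb;User=SYSDBA;Password=masterkey";

var mylist = new List<DTO>(1000000); // DTO = the questioner's type
int lastId = 0;
int fetched;

using (var connection = new FbConnection(connectionString))
{
    connection.Open();
    do
    {
        fetched = 0;
        var sql = "select first " + batchSize +
                  " ID, FIELD3 from tableA where ID > @lastId order by ID";

        using (var cmd = new FbCommand(sql, connection))
        {
            cmd.Parameters.AddWithValue("@lastId", lastId);
            using (var read = cmd.ExecuteReader())
            {
                while (read.Read())
                {
                    lastId = read.GetInt32(0);
                    mylist.Add(new DTO(lastId, read.GetInt32(1)));
                    fetched++;
                }
            }
        }
    } while (fetched == batchSize); // a short batch means we reached the end
}
```

The point of keying on "ID > last seen" rather than a growing FIRST/SKIP offset is that each batch stays an indexed range scan instead of forcing the server to walk past everything already delivered.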

Hermes answered 20/4, 2010 at 17:15 Comment(1)
We do some analysis of every record from the db, and we need to have them loaded in memory. The bottleneck is the moment when we transfer data from the recordset to our DTO objects.Reverential

Take a look at this: http://www.firebirdfaq.org/faq13/

Gerick answered 20/4, 2010 at 17:21 Comment(1)
Nothing to help me at that link; maybe this problem is encountered on all databases. I need to optimize the time for "while (read.Read())".Reverential

© 2022 - 2024 — McMap. All rights reserved.