Node.js inserting data into PostgreSQL error

I have a weird error using Node.js with PostgreSQL, and I hope you can help me out.

I have a huge amount of data, about 2 million entries, that I want to insert into my DB.

Each data entry consists of 4 columns (an illustrative example follows the list):

id: string
points: float[][]
mid: float[]
occurences: json[]
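
For illustration, a single entry might look like the following; the values are hypothetical, only the shape matches the columns above:

{
    id: 'entry-000001',                      // string
    points: [[1.5, 2.5], [3.0, 4.0]],        // float[][]
    mid: [2.25, 3.25],                       // float[]
    occurences: [{place: 'A', year: 1950}]   // json[]
}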

I am inserting data like so:

let pgp = require('pg-promise')(options);
let connectionString = 'postgres://archiv:archiv@localhost:5432/fotoarchivDB';
let db = pgp(connectionString);

cityNet.forEach((arr) => {
    db
    .none(
        "INSERT INTO currentcitynet(id,points,mid,occurences) VALUES $1",
        Inserts("${id},${points}::double precision[],${mid}::double precision[],${occurences}::json[]",arr))
    .then(data => {
        //success
    })
    .catch(error => {
        console.log(error);
        //error
    });
})

function Inserts(template, data) {
    if (!(this instanceof Inserts)) {
        return new Inserts(template, data);
    }
    // custom type with raw-text formatting: expands the data array
    // into a "(...),(...)" list for the VALUES clause
    this._rawDBType = true;
    this.formatDBType = function () {
        return data.map(d => "(" + pgp.as.format(template, d) + ")").join(",");
    };
}

This works for exactly the first 309248 data pieces; then it suddenly errors out with the following for (what seems like) every subsequent piece of data it tries to insert:

{ error: syntax error at end of input
at Connection.parseE (/home/christian/Masterarbeit_reworked/projekt/server/node_modules/pg-promise/node_modules/pg/lib/connection.js:539:11)
at Connection.parseMessage (/home/christian/Masterarbeit_reworked/projekt/server/node_modules/pg-promise/node_modules/pg/lib/connection.js:366:17)
at Socket.<anonymous> (/home/christian/Masterarbeit_reworked/projekt/server/node_modules/pg-promise/node_modules/pg/lib/connection.js:105:22)
at emitOne (events.js:96:13)
at Socket.emit (events.js:188:7)
at readableAddChunk (_stream_readable.js:176:18)
at Socket.Readable.push (_stream_readable.js:134:10)
at TCP.onread (net.js:548:20)
name: 'error',
length: 88,
severity: 'ERROR',
code: '42601',
detail: undefined,
hint: undefined,
position: '326824',
internalPosition: undefined,
internalQuery: undefined,
where: undefined,
schema: undefined,
table: undefined,
column: undefined,
dataType: undefined,
constraint: undefined,
file: 'scan.l',
line: '1074',
routine: 'scanner_yyerror' }

The 'position' entry changes with every repeated error message.

I can redo this, and it always errors after 309248 entries. When I try to insert fewer, like 1000 entries, the error does not occur.

That really confuses me. I thought PostgreSQL did not have a maximum row count. Also, the error message does not help me at all.

SOLVED The error was found. Some "null" entries had slipped into my data. Filtering out the null data worked (a minimal sketch follows below). I will try out the other recommendations for inserting data, since the current way works but its performance is very poor.
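
A minimal filter sketch, assuming the entries live in the cityNet array from the code above and that a bad entry is either null or missing one of the four fields:

// keep only entries that exist and carry all four expected columns
const cleanedCityNet = cityNet.filter(entry =>
    entry && entry.id && entry.points && entry.mid && entry.occurences);

Iterating over cleanedCityNet instead of cityNet avoids formatting a null entry into the VALUES list, which is what produced the syntax error.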

Bihari answered 27/6, 2017 at 15:7 Comment(1)
Are you trying to insert 2 million records at once? Node.js wouldn't be able to handle that. The inserts have to be paged properly. Also, do not use that old approach to generating inserts; use the one shown here: #37301497 – Civility

I'm not sure, but it looks like you have a wrong data structure at the last element (309249), and PostgreSQL cannot parse some property.

Pietrek answered 27/6, 2017 at 15:16 Comment(2)
That seems correct. Somehow the last element became 'null'. I will find out why a null gets added, remove it, and try again. – Bihari
That's great. Welcome. – Pietrek

I'm the author of pg-promise. Your whole approach should be changed to the one below.

Proper way to do massive inserts via pg-promise:

const pgp = require('pg-promise')({
    capSQL: true
});

const db = pgp(/*connection details*/);

const cs = new pgp.helpers.ColumnSet([
    'id',
    {name: 'points', cast: 'double precision[]'},
    {name: 'mid', cast: 'double precision[]'},
    {name: 'occurences', cast: 'json[]'}
], {table: 'currentcitynet'});

function getNextInsertBatch(index) {
    // retrieves the next data batch, according to the index, and returns it
    // as an array of objects. A normal batch size: 1000 - 10,000 objects,
    // depending on the size of the objects.
    //
    // returns null when there is no more data left.
}

db.tx('massive-insert', t => {
    return t.sequence(index => {
        const data = getNextInsertBatch(index);
        if (data) {
            const inserts = pgp.helpers.insert(data, cs);
            return t.none(inserts);
        }
    });
})
    .then(data => {
        console.log('Total batches:', data.total, ', Duration:', data.duration);
    })
    .catch(error => {
        console.log(error);
    });
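
A minimal sketch of getNextInsertBatch, assuming the 2 million entries are already held in memory in a cityNet array; the batch size of 1000 is only illustrative:

const batchSize = 1000; // illustrative; tune to the size of your objects

function getNextInsertBatch(index) {
    // hand out the next slice of the in-memory array;
    // return null when everything has been consumed,
    // which the transaction code above checks via "if (data)"
    const start = index * batchSize;
    return start < cityNet.length ? cityNet.slice(start, start + batchSize) : null;
}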

UPDATE

And if getNextInsertBatch can only get the data asynchronously, then return a promise from it, and update the sequence->source callback accordingly:

return t.sequence(index => {
    return getNextInsertBatch(index)
        .then(data => {
            if (data) {
                const inserts = pgp.helpers.insert(data, cs);
                return t.none(inserts);
            }
        });
});
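
For example, if the batches have to be pulled asynchronously, a hypothetical implementation could page through a staging table and resolve with null once no rows are left (the table and column names are assumptions, and batchSize is as in the sketch above):

function getNextInsertBatch(index) {
    // hypothetical async source: read the next page of rows;
    // resolving with null stops the insert sequence
    return db.any(
        'SELECT id, points, mid, occurences FROM staging ORDER BY id OFFSET $1 LIMIT $2',
        [index * batchSize, batchSize]
    ).then(rows => rows.length ? rows : null);
}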


Civility answered 27/6, 2017 at 16:3 Comment(2)
@Christian With the right approach described here, it would also be easier to diagnose and amend, because type ColumnSet will alert you when it encounters a null. Plus, the performance is significantly better. – Civility
@Christian How did it go? ;) – Civility
