How to pipeline in node.js to redis?

Asked 28/1, 2014 at 20:49 Answered 31/5, 2020 at 19:40

Solved node.js redis pipeline node-redis

I have lots of data to insert (SET / INCR) into a Redis DB, so I'm looking for pipelining / mass insertion through node.js.

I couldn't find any good example or API for doing this in node.js, so any help would be great!

Accouterment asked 28/1, 2014 at 20:49

Yes, I agree that there is a lack of examples for this, but I managed to create a stream through which I sent several insert commands in a batch.

First, install the redis-stream module:

npm install redis-stream

And this is how you use the stream:

var redis = require('redis-stream'),
    client = new redis(6379, '127.0.0.1');

// Open a stream
var stream = client.stream();

// Example: set 10000 records
for (var record = 0; record < 10000; record++) {

    // A command is an array of arguments:
    var command = ['set', 'key' + record, 'value'];

    // Convert the command with redis.parse() before writing it to the stream
    stream.redis.write( redis.parse(command) );
}

// Register a handler that runs when the stream is closed
stream.on('close', function () {
    console.log('Completed!');

    // Here you can create a stream for reading results or similar
});

// Close the stream after the batch insert
stream.end();

Also, you can create as many streams as you want and open or close them at any time.

There are several more examples of using redis-stream in node.js on the redis-stream Node module page.

Shuffleboard answered 7/2, 2014 at 16:25

Thanks Toni! Do you know if and how it works with Lua scripts? – Accouterment 7/2, 2014 at 18:28

Hmm, I haven't tried, but I think you could load the scripts into the Redis instance and run them using EVAL or EVALSHA commands sent through the pipeline. – Shuffleboard 11/2, 2014 at 13:16

I ran your code verbatim and none of the keys were set. Calling "keys *" through redis-cli afterwards yields an empty set. – Marielamariele 22/3, 2015 at 20:37

@Marielamariele you are right, thanks for the report. This was outdated. I have now checked the new version of redis-stream, and the commands changed (well, only one in this example): instead of stream.write( ... ), the new version uses stream.redis.write( ... ). – Shuffleboard 25/3, 2015 at 20:30
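Following up on the EVAL/EVALSHA suggestion in the comments above, here is a minimal sketch of building those commands as argument arrays, the same shape as the `command` array in the answer's loop. I haven't verified how redis-stream handles them; the helper names (`evalCommand`, `evalshaCommand`, `scriptSha1`) and the Lua script are hypothetical. EVALSHA only works once the server knows the script (for example after SCRIPT LOAD or an earlier EVAL); otherwise Redis replies with a NOSCRIPT error.

```javascript
const crypto = require('crypto');

// Hypothetical helper. Argument array for:
//   EVAL script numkeys key [key ...] arg [arg ...]
function evalCommand(script, keys, args) {
  return ['EVAL', script, String(keys.length), ...keys, ...args];
}

// Hypothetical helper. Redis identifies a cached script by the SHA1 hex
// digest of its body (what SCRIPT LOAD returns), so it can be computed locally.
function scriptSha1(script) {
  return crypto.createHash('sha1').update(script).digest('hex');
}

// Hypothetical helper. Argument array for:
//   EVALSHA sha1 numkeys key [key ...] arg [arg ...]
function evalshaCommand(script, keys, args) {
  return ['EVALSHA', scriptSha1(script), String(keys.length), ...keys, ...args];
}

// Hypothetical script: SET a key and INCR a counter in one round trip.
const script =
  "redis.call('SET', KEYS[1], ARGV[1]); return redis.call('INCR', KEYS[2])";

// Each array could then be written with stream.redis.write(redis.parse(cmd)).
const cmds = [];
for (let i = 0; i < 3; i++) {
  cmds.push(evalshaCommand(script, ['key' + i, 'counter'], ['value']));
}
```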

In node_redis, all commands are pipelined:

https://github.com/mranney/node_redis/issues/539#issuecomment-32203325
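My understanding of the linked issue is that node_redis writes each command to the connection as soon as it is called and matches replies in order, so you get pipelining simply by not waiting for one reply before sending the next command. A minimal sketch, assuming the callback-style API of node_redis v2/v3 (`client.set(key, value, callback)`); `setMany` is a hypothetical helper, and the client is passed in so the logic can be tried without a server.

```javascript
// Hypothetical helper: issue n SET commands back to back without waiting
// for replies in between, so node_redis can pipeline them on one connection.
// `done` is called once every reply has arrived, with the first error (if any).
function setMany(client, n, done) {
  if (n === 0) return done(null);
  let pending = n;
  let firstError = null;
  for (let i = 0; i < n; i++) {
    client.set('key' + i, 'value', function (err) {
      if (err && !firstError) firstError = err;
      pending -= 1;
      if (pending === 0) done(firstError);
    });
  }
}

// Usage with a real connection (assumes the `redis` package, v2/v3):
// const redis = require('redis');
// const client = redis.createClient(6379, '127.0.0.1');
// setMany(client, 10000, function (err) {
//   console.log(err || 'Completed!');
//   client.quit();
// });
```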

Owl answered 10/2, 2014 at 15:26

Thanks! I really tried using multi and eval & exec, but the performance was even worse... – Accouterment 11/2, 2014 at 10:22

You might want to look at batch() too. The reason multi() would be slower is that it's transactional: the commands are wrapped in MULTI/EXEC, and if a command fails to queue, none of them are executed. That may be what you want, but batch() gives you a choice in favor of speed.
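A minimal sketch of the batch() route, assuming the node_redis v2/v3 callback API, where `client.batch()` queues commands like `multi()` but without MULTI/EXEC, and `exec(callback)` sends them all at once. `insertBatch` and the `inserted_count` key are hypothetical.

```javascript
// Hypothetical helper: queue SET and INCR commands on a node_redis (v2/v3)
// batch and send them together. No MULTI/EXEC wrapper is used, so the
// commands are not run as one atomic transaction; replies come back
// in the same order the commands were queued.
function insertBatch(client, records, callback) {
  const batch = client.batch();
  records.forEach(function (record) {
    batch.set(record.key, record.value);
    batch.incr('inserted_count'); // hypothetical counter key
  });
  batch.exec(callback);
}

// Usage (assumes the `redis` package, v2/v3):
// const client = require('redis').createClient();
// insertBatch(client, [{ key: 'a', value: '1' }], function (err, replies) {
//   console.log(err, replies);
// });
```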

The redis-stream package doesn't seem to make use of Redis's mass-insert approach, so it is also slower than the redis-cli mass insertion described on the Redis site.

Another idea is to use redis-cli and give it a file to stream from, which is what this npm package does: https://github.com/almeida/redis-mass

Not keen on writing to a file on disk first? This repo: https://github.com/eugeneiiim/node-redis-pipe/blob/master/example.js

...also streams to Redis, but without writing to a file. It streams to a spawned process and flushes the buffer every so often.

On the Redis site under mass insertion (http://redis.io/topics/mass-insert) you can see a small Ruby example. The repo above basically ports that to Node.js and streams the result directly to the spawned redis-cli process.
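A minimal JavaScript port of that Ruby example's protocol generator could look like this; it is my own sketch, not the code from the linked repo, and `genRedisProto` is a hypothetical name. Each command becomes `*<argument count>`, then `$<byte length>` plus the argument itself for every argument, with each line terminated by `\r\n`.

```javascript
// Hypothetical helper: encode one command (an array of arguments) in the
// Redis protocol, the format `redis-cli --pipe` reads from stdin:
//   *<number of args>\r\n
//   $<byte length of arg>\r\n<arg>\r\n   (repeated for each argument)
function genRedisProto(args) {
  let proto = '*' + args.length + '\r\n';
  for (const arg of args) {
    const s = String(arg);
    // Byte length, not character count, so non-ASCII values are framed correctly.
    proto += '$' + Buffer.byteLength(s, 'utf8') + '\r\n' + s + '\r\n';
  }
  return proto;
}

// Hypothetical data: three SETs and an INCR, concatenated into one payload.
let payload = '';
for (let i = 0; i < 3; i++) {
  payload += genRedisProto(['SET', 'key' + i, 'value' + i]);
}
payload += genRedisProto(['INCR', 'counter']);
```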

So in Node.js, we have:

var spawn = require('child_process').spawn;
var redisPipe = spawn('redis-cli', ['--pipe']);

spawn() returns a ChildProcess object whose stdin property is a writable stream you can write or pipe to, for example redisPipe.stdin.write().

You can just keep writing to a buffer, streaming that to the child process, and clearing it every so often. That way the buffer never fills up, so this may be a bit lighter on memory than the node_redis package (whose docs literally say that data is held in memory). I haven't looked into node_redis that deeply, though, so I don't know what its memory footprint ends up being; it could be doing the same thing.
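A minimal sketch of the buffer-and-flush idea, assuming the data has already been encoded in the Redis protocol. `pipeInChunks` and the 64 KB flush threshold are hypothetical choices, not taken from node-redis-pipe. It also waits for the stream's 'drain' event whenever `write()` returns false, which is what actually keeps memory bounded if redis-cli falls behind.

```javascript
const { once } = require('events');

// Hypothetical helper: write protocol chunks to `out` (e.g. redisPipe.stdin),
// collecting them in a local buffer and flushing once it reaches roughly
// `flushBytes`. If out.write() signals backpressure, wait for 'drain'.
async function pipeInChunks(out, protoChunks, flushBytes = 64 * 1024) {
  let buffer = [];
  let size = 0;
  for (const chunk of protoChunks) {
    buffer.push(chunk);
    size += Buffer.byteLength(chunk);
    if (size >= flushBytes) {
      const ok = out.write(buffer.join(''));
      buffer = []; // clear the local buffer after each flush
      size = 0;
      if (!ok) await once(out, 'drain');
    }
  }
  if (buffer.length > 0) out.write(buffer.join(''));
  out.end(); // closing stdin tells redis-cli --pipe that no more data is coming
}

// Usage sketch (assumes redis-cli is on the PATH and a local server):
// const { spawn } = require('child_process');
// const redisPipe = spawn('redis-cli', ['--pipe']);
// redisPipe.stdout.pipe(process.stdout);
// pipeInChunks(redisPipe.stdin, ['*1\r\n$4\r\nPING\r\n']);
```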

Of course, keep in mind that if something goes wrong, it all fails. That's what tools like fluentd were created for (and that's yet another option: http://www.fluentd.org/plugins/all; it has several Redis plugins). But again, that means you're backing data on disk somewhere to some degree. I've personally used Embulk for this too (which required a file on disk), but it did not support mass inserts, so it was slow: it took nearly 2 hours for 30,000 records.

One benefit of a streaming approach (not backed by disk) shows up when you're doing a huge insert from another data source. If that source returns a lot of data and your server doesn't have the disk space to hold all of it, you can stream it instead. Again, you risk failures.
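A minimal sketch of streaming records from another source straight into redis-cli without touching disk, assuming each record is a `{ key, value }` object to be stored with SET; `toProtocol` and `protocolStream` are hypothetical names. `Readable.from()` pulls from the generator only as fast as the destination accepts data, so `.pipe()` takes care of backpressure.

```javascript
const { Readable } = require('stream');

// Hypothetical generator: lazily encode each { key, value } record as a
// SET command in the Redis protocol. Records can come from any iterable,
// e.g. rows read incrementally from another data source.
function* toProtocol(records) {
  for (const { key, value } of records) {
    const args = ['SET', key, value];
    let proto = '*' + args.length + '\r\n';
    for (const arg of args) {
      const s = String(arg);
      proto += '$' + Buffer.byteLength(s) + '\r\n' + s + '\r\n';
    }
    yield proto;
  }
}

// Hypothetical helper: wrap the generator in a readable stream.
function protocolStream(records) {
  return Readable.from(toProtocol(records));
}

// Usage sketch (assumes redis-cli is on the PATH):
// const { spawn } = require('child_process');
// const redisPipe = spawn('redis-cli', ['--pipe']);
// protocolStream(sourceRecords).pipe(redisPipe.stdin);
```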

I find myself in this position because I'm building a Docker image that will run on a server without enough disk space for large data sets. Of course, it's a lot easier if you can fit everything on the server's hard disk. But if you can't, streaming to redis-cli may be your only option.

If you are really pushing a lot of data around on a regular basis, to be honest, I would probably recommend fluentd. It comes with many great features for ensuring your data gets where it's going, and if something fails, it can resume.

One problem with all of these Node.js approaches is that if something fails, you either lose it all or have to insert it all over again.
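This doesn't solve the problem, but you can at least detect a bad run. A sketch, assuming the summary format shown on the Redis mass-insertion page (`errors: N, replies: M`), which may differ between redis-cli versions; `parsePipeSummary` is a hypothetical helper.

```javascript
// Hypothetical helper: extract the error and reply counts from the summary
// redis-cli --pipe prints when it finishes (e.g. "errors: 0, replies: 1000").
// Returns null if no summary is found (e.g. the process died early).
function parsePipeSummary(output) {
  const m = /errors:\s*(\d+),\s*replies:\s*(\d+)/.exec(output);
  if (!m) return null;
  return { errors: Number(m[1]), replies: Number(m[2]) };
}

// Usage sketch (assumes redis-cli is on the PATH):
// const { spawn } = require('child_process');
// const redisPipe = spawn('redis-cli', ['--pipe']);
// let output = '';
// redisPipe.stdout.on('data', (d) => { output += d; });
// redisPipe.on('close', (code) => {
//   const summary = parsePipeSummary(output);
//   if (code !== 0 || !summary || summary.errors > 0) {
//     console.error('Mass insert incomplete:', code, summary);
//   }
// });
```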

Casas answered 5/5, 2016 at 23:43

By default, node_redis (the Node.js library) sends commands in pipelines and automatically chooses how many commands go into each pipeline (https://github.com/NodeRedis/node-redis/issues/539#issuecomment-32203325). Therefore, you don't need to worry about this. However, other Redis clients may not use pipelines by default; you will need to check the client's documentation to see how to take advantage of pipelining.
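Some clients instead expose an explicit pipeline object. As one example, ioredis provides `redis.pipeline()`, whose `exec()` returns a promise of `[error, result]` pairs. A minimal sketch against that API; `insertWithPipeline` and the `inserted_count` key are hypothetical, and the client is passed in so the logic does not depend on a live server.

```javascript
// Hypothetical helper: queue SETs on an explicit ioredis-style pipeline,
// send them in one go, and return the [error, reply] pairs that failed.
async function insertWithPipeline(redis, records) {
  const pipeline = redis.pipeline();
  for (const { key, value } of records) {
    pipeline.set(key, value);
  }
  pipeline.incrby('inserted_count', records.length); // hypothetical counter key
  const results = await pipeline.exec();
  // Each entry is [error, reply]; keep only the entries with an error.
  return results.filter(([err]) => err);
}

// Usage (assumes the `ioredis` package and a local server):
// const Redis = require('ioredis');
// const redis = new Redis();
// insertWithPipeline(redis, [{ key: 'a', value: '1' }])
//   .then((failed) => console.log('failed commands:', failed.length));
```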

Scheme answered 31/5, 2020 at 19:40
