Is phpredis pipeline the same as using the protocol for mass insertion?
I'm moving part of my site from a relational database to Redis and need to insert millions of keys in as short a time as possible.

In my case, the data must first be fetched from MySQL, prepared by PHP, and then added to the corresponding sorted sets (time as the score, ID as the value). Currently I'm taking advantage of the phpredis multi method with the Redis::PIPELINE parameter. Despite noticeable speed improvements, the import turned out to block reads and slow down page loads while it was running.

So here comes the question: is using a pipeline in phpredis equivalent to the mass insertion described at http://redis.io/topics/mass-insert?


Here's an example:

  • phpredis way:

    <?php
    
    // All necessary requires etc.
    $client = Redis::getClient();    
    
    $client->multi(Redis::PIPELINE); // OR $client->pipeline();
    $client->zAdd('key', 1, 2);
    // ... many more zAdd calls ...
    $client->zAdd('key', 1000, 2000);
    $client->exec();
    
  • vs protocol from redis.io:

    cat data.txt | redis-cli --pipe
    
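For reference, the data.txt fed to redis-cli --pipe contains commands encoded in the raw Redis protocol (RESP), as described in the mass-insert guide. A file holding just the first ZADD above would look like this (every line terminated by \r\n):

    *4
    $4
    ZADD
    $3
    key
    $1
    1
    $1
    2

(redis-cli --pipe will also accept plain inline commands, but the guide recommends generating the protocol, since it is binary-safe and faster to parse.)
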
Kwakiutl asked 23/9, 2013 at 11:37

I'm one of the contributors to phpredis, so I can answer your question. The short answer is that it is not the same, but I'll provide a bit more detail.

What happens when you put phpredis into Redis::PIPELINE mode is that instead of sending the command when it is called, it puts it into a list of "to be sent" commands. Then, once you call exec(), one big command buffer is created with all of the commands and sent to Redis.

After the commands are all sent, phpredis reads each reply and packages the results according to each command's specification (e.g. HMGET calls come back as associative arrays, etc.).
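
For example, here is a minimal sketch (assuming a local Redis on the default port) showing that exec() returns one reply per pipelined command, in order and already formatted:

<?php
$client = new Redis();
$client->connect('127.0.0.1', 6379);

$client->multi(Redis::PIPELINE);
$client->zAdd('myzset', 1, 'a');           // reply: number of new members
$client->hMGet('myhash', ['f1', 'f2']);    // reply: associative array
$replies = $client->exec();

// $replies[0] is an int from zAdd; $replies[1] is ['f1' => ..., 'f2' => ...]
var_dump($replies);
?>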


The performance of pipelining in phpredis is actually quite good and should suffice for almost every use case. That being said, you are still processing every command through PHP, which means you pay the function call overhead of the phpredis extension itself for every command. In addition, phpredis will spend time processing and formatting each reply.
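
As a rough illustration, here is a hypothetical micro-benchmark (key names and counts are arbitrary; absolute numbers will vary with your setup) comparing one round trip per command against a single pipelined batch:

<?php
$client = new Redis();
$client->connect('127.0.0.1', 6379);

// One round trip per command.
$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $client->zAdd('bench-zset', $i, "member-$i");
}
printf("individual: %.3fs\n", microtime(true) - $start);

$client->del('bench-zset');

// One round trip for the whole batch.
$start = microtime(true);
$client->multi(Redis::PIPELINE);
for ($i = 0; $i < 10000; $i++) {
    $client->zAdd('bench-zset', $i, "member-$i");
}
$client->exec();
printf("pipelined : %.3fs\n", microtime(true) - $start);
?>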

If your use case requires importing MASSIVE amounts of data into Redis, especially if you don't need to process each reply (but instead just want to know that all commands were processed), then the mass-import method is the way to go.

I've actually created a project to do this here: https://github.com/michael-grunder/redismi

The idea behind this extension is that you call it with your commands and then save the buffer to disk, which will be in the raw Redis protocol and compatible with cat buffer.txt | redis-cli --pipe style insertion.
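
If you are curious what such a buffer contains, here is a sketch of a hypothetical helper (not part of RedisMI) that encodes a single command in the raw protocol, analogous to the gen_redis_proto example in the redis.io mass-insert guide:

<?php
// Encode one command as a RESP array of bulk strings.
function gen_redis_proto(array $args) {
    $proto = '*' . count($args) . "\r\n";
    foreach ($args as $arg) {
        $arg = (string)$arg;
        $proto .= '$' . strlen($arg) . "\r\n" . $arg . "\r\n";
    }
    return $proto;
}

// Append protocol-encoded commands to a buffer for redis-cli --pipe.
$fp = fopen('buffer.txt', 'ab');
fwrite($fp, gen_redis_proto(array('ZADD', 'key', '1', '2')));
fwrite($fp, gen_redis_proto(array('ZADD', 'key', '1000', '2000')));
fclose($fp);
?>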

One thing to note is that, at present, you can't simply replace any given phpredis call with a call to the RedisMI object, as commands are processed as variable-argument calls (like hiredis), which works for most, but not all, phpredis commands.

Here is a simple example of how you might use it:

<?php
$obj_mi = new RedisMI();

// Some context we can pass around in RedisMI for whatever we want
$obj_context = new StdClass();
$obj_context->session_id = "some-session-id";

// Attach this context to the RedisMI object
$obj_mi->SetInfo($obj_context);

// Set a callback when a buffer is saved
$obj_mi->SaveCallback(
    function($obj_mi, $str_filename, $i_cmd_count) {
        // Output our context info we attached
        $obj_context = $obj_mi->GetInfo();
        echo "session id: " . $obj_context->session_id . "\n";

        // Output the filename and how many commands were sent
        echo "buffer file: " . $str_filename . "\n";
        echo "commands   : " . $i_cmd_count . "\n";
    }
);

// A thousand SADD commands, adding three members each time
for ($i = 0; $i < 1000; $i++) {
    $obj_mi->sadd('some-set', "$i-one", "$i-two", "$i-three");
}

// A thousand ZADD commands
for ($i = 0; $i < 1000; $i++) {
    $obj_mi->zadd('some-zset', $i, "member-$i");
}

// Save the buffer
$obj_mi->SaveBuffer('test.buf');
?>

Then you can do something like this:

➜  tredismi  php mi.php
session id: some-session-id
buffer file: test.buf
commands   : 2000
➜  tredismi  cat test.buf|redis-cli --pipe
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 2000

Cheers!

Thamora answered 9/10, 2013 at 20:31 Comment(1)
Thanks for the answer! I managed to process 20 million rows from the DB and store about 2.5M keys in Redis using transactions. However, since Redis is simply amazing and made a great difference in our stack, I'll definitely try your solution in the future. – Kwakiutl
