What's the fastest way to poll a MySQL table for new rows?

7

My application needs to poll a MySQL database for new rows. Every time new rows are added, they should be retrieved. I was thinking of creating a trigger to place references to new rows on a separate table. The original table has over 300,000 rows.

The application is built in PHP.

Some good answers; I think the question deserves a bounty.

Assured answered 8/9, 2010 at 6:44 Comment(3)
IMO, if possible, whatever layer you use to insert, i.e. services wrapping CRUD operations, should 'notify' your application after an insert. This way you are not constantly polling.Beitnes
@Alex: They're two different independent applications. The second application only reads from the database.Assured
I'd say the AFTER INSERT trigger would be spot on: implement it at the MySQL level, and let scripts poll and clean up the new entries in the other table. That way, even forcing another (non-autoincrement) id would still work.Geniality
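A rough sketch of the trigger idea from the question and the comment above, not a definitive implementation. The names `source_table`, `new_row_queue`, `row_id`, `queue_id` and the function `drainNewRows` are hypothetical, and an open PDO connection is assumed. The trigger DDL is kept in a constant because it only has to be run once on the MySQL side.

```php
<?php
declare(strict_types=1);

// One-time setup on the MySQL side (all names are assumptions for illustration):
// a queue table plus an AFTER INSERT trigger that records each new row's id.
const QUEUE_SETUP_SQL = [
    'CREATE TABLE new_row_queue (
         queue_id INT AUTO_INCREMENT PRIMARY KEY,
         row_id   INT NOT NULL
     )',
    'CREATE TRIGGER source_table_after_insert
         AFTER INSERT ON source_table
         FOR EACH ROW
         INSERT INTO new_row_queue (row_id) VALUES (NEW.id)',
];

/**
 * Return the rows referenced by the queue, then delete exactly the
 * queue entries that were read. Entries added later stay queued.
 */
function drainNewRows(PDO $pdo): array
{
    $pdo->beginTransaction();
    try {
        $rows = $pdo->query(
            'SELECT q.queue_id, s.*
               FROM new_row_queue q
               JOIN source_table s ON s.id = q.row_id
              ORDER BY q.queue_id'
        )->fetchAll(PDO::FETCH_ASSOC);

        if ($rows) {
            // Build one "?" placeholder per queue entry that was read.
            $ids   = array_column($rows, 'queue_id');
            $marks = implode(',', array_fill(0, count($ids), '?'));
            $pdo->prepare("DELETE FROM new_row_queue WHERE queue_id IN ($marks)")
                ->execute($ids);
        }

        $pdo->commit();
        return $rows;
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
}
```

The polling script would then call `drainNewRows($pdo)` on whatever interval it needs; the queue table stays small, so each poll is cheap even though the original table has 300,000+ rows.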
W
9

For external applications, I find that a TIMESTAMP column is a more robust method, one that is independent of auto-increment IDs and other primary key issues.

Add columns to the tables such as:

insertedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP

or to track inserts and updates

updatedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

In the external application, all you need to do is track the timestamp of the last poll, then select from that timestamp forward on all the relevant tables. On large tables you may need to index the timestamp column.
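A minimal polling sketch for this approach, assuming a hypothetical table `events` with an auto-increment `id` alongside the `updatedOn` column, and an open PDO connection. The function name `fetchChangesSince` and the two-part cursor are illustrative assumptions, not part of the answer. Using the pair (`updatedOn`, `id`) as the cursor reduces the risk of skipping rows that share one timestamp when reading in batches.

```php
<?php
declare(strict_types=1);

/**
 * Fetch up to $limit rows changed after the cursor ($lastTs, $lastId).
 * The table name `events` and the `id` column are assumptions for illustration.
 */
function fetchChangesSince(PDO $pdo, string $lastTs, int $lastId, int $limit = 100): array
{
    // Rows with a later timestamp, or the same timestamp and a higher id.
    $sql = 'SELECT * FROM events
             WHERE updatedOn > :ts1
                OR (updatedOn = :ts2 AND id > :id)
             ORDER BY updatedOn, id
             LIMIT :lim';

    $stmt = $pdo->prepare($sql);
    // The timestamp is bound twice under distinct names, since reusing one
    // named placeholder is not supported with native prepared statements.
    $stmt->bindValue(':ts1', $lastTs);
    $stmt->bindValue(':ts2', $lastTs);
    $stmt->bindValue(':id', $lastId, PDO::PARAM_INT);
    $stmt->bindValue(':lim', $limit, PDO::PARAM_INT);
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

After processing a batch, the caller would store the `updatedOn` and `id` of the last row returned and pass them in as the cursor for the next poll.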

Winne answered 16/9, 2010 at 20:32 Comment(4)
Indexing is usually beneficial, but there are plenty of use cases where the index overhead is not worth it: typically a table that has many inserts and deletes between each TIMESTAMP-based select, where that select is performed infrequently.Winne
Something to be careful of with this solution: if the application doing the polling is getting the changes in batches (e.g. SELECT * FROM TABLE WHERE updatedOn > :LAST_TIMESTAMP ORDER BY updatedOn LIMIT 100) and there is the possibility that more rows than the batch size can be updated at once (e.g. UPDATE TABLE SET COLUMN='VALUE' WHERE OTHER_COLUMN='SOMETHING THAT WILL SELECT HUNDREDS OF ROWS'), then you will miss rows.Assumed
This may easily be unreliable, depending on the volume of inserts: multiple inserts can receive the same timestamp. This is of course solvable with some cleverness. I opted for an auto-incrementing primary key, but with the trigger-based version described above you could update a memory-based table, which could be polled quite rapidly.Mandragora
@AdamByrtek It will not always be beneficial: you may not need to query or sort on that particular timestamp, and indexes are not free.Crept
H
3

You can use the following statement to find out if a new record was inserted in the table:

select max(id) from table_name

Replace the primary key name and table name in the statement above with your own. Keep the max(id) value in a variable, and retrieve all new records between the last saved max(id) value and the current one. After fetching the new records, set the saved value to the one you got from the query.

Humidor answered 9/9, 2010 at 9:10 Comment(1)
Why not select * from table_name where id > :maxReforest
S
1

Create a PHP daemon to monitor the MySQL table file size. If the size changes, query for new records; if new records are found, run the next process.

I think there is an active PEAR daemon you can easily configure to monitor the MySQL Table file size and kick off your script.

Sibling answered 20/9, 2010 at 15:18 Comment(2)
I'm not sure for MySQL, but usually table space is allocated in chunks, so that once an allocation has been done, several rows could be added before the need for another allocation arises.Adamis
Many tables are in the same file if using innodb.Mandragora
E
0

Assuming you have an identity column or some other value that always grows, your PHP application should keep track of the last ID retrieved.

That would work for most scenarios. Unless you are in the real-time camp, I don't think you'd need any more than that.

Extractor answered 9/9, 2010 at 9:54 Comment(0)
C
0

I would do something like this. Of course, this assumes that ID is an incrementing numerical ID. How you store your "current location" is up to you; this example uses a file.

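<?php
// The DSN and credentials below are placeholders; substitute your own.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$idFile = 'lastID.dat';

// Load the last processed ID, or start from 0 on the first run.
if (is_file($idFile)) {
    $lastSelectedId = (int) file_get_contents($idFile);
} else {
    $lastSelectedId = 0;
}

// The mysql_* functions were removed in PHP 7, so use a PDO prepared statement.
$stmt = $pdo->prepare('SELECT * FROM table_name WHERE id > :lastId ORDER BY id');
$stmt->bindValue(':lastId', $lastSelectedId, PDO::PARAM_INT);
$stmt->execute();

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Do something with the new rows

    if ($row['id'] > $lastSelectedId) {
        $lastSelectedId = (int) $row['id'];
    }
}

// Persist the highest ID seen so the next poll starts after it.
file_put_contents($idFile, $lastSelectedId);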
Cecum answered 10/9, 2010 at 0:5 Comment(0)
L
0

I would concur with TFD's answer about keeping track of a timestamp in a separate file/table and then fetching all rows newer than that. That's how I do it for a similar application.

Your application querying a single row table (or file) to see if a timestamp has changed from the local storage should not be much of a performance hit. Then, fetching new rows from the 300k row table based on timestamp should again be fine, assuming timestamp is properly indexed.

However, reading your question I was curious whether MySQL triggers can make system calls, say to a PHP script that would do some heavy lifting. It turns out they can, by using the sys_exec() user-defined function. You could use this to do all sorts of processing by passing it the inserted row data, essentially getting an instant notification of inserts.

Finally, a word of caution about using triggers to call external applications.

Luminescence answered 17/9, 2010 at 8:7 Comment(0)
I
0

One option might be to use an INSERT INTO SELECT statement. Taking from the suggestions using timestamps to pull the latest rows, you could do something like...

INSERT INTO t2 (
    SELECT * 
    FROM t1 
    WHERE createdts > DATE_SUB(NOW(), INTERVAL 1 HOUR)
);

This would take all of the rows inserted in the previous hour and insert them into table 2. You could have a script run this query every hour (or at whatever interval you need).

This would drastically simplify your PHP script for pulling rows as you wouldn't need to iterate over any rows. It also gets rid of having to keep track of the last insert id.

The solution Fanis proposed also sounds interesting.

As a note, the select query in the above insert can be adjusted to only insert certain fields. If you only need certain fields, you would need to specify them in the insert like so...

INSERT INTO t2 (field1, field2) (
    SELECT field1, field2 
    FROM t1 
    WHERE createdts > DATE_SUB(NOW(), INTERVAL 1 HOUR)
);
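If a script runs this on an interval, one possible variation is to pass explicit window bounds instead of relying on NOW(), so that consecutive runs neither overlap nor leave gaps when the schedule drifts. This is a sketch under that assumption: the function name `copyWindow` and the `:start`/`:end` parameters are hypothetical, and an open PDO connection is assumed; the table and field names follow the example above.

```php
<?php
declare(strict_types=1);

/**
 * Copy rows created in the half-open window [$start, $end) from t1 into t2.
 * Returns the number of rows copied.
 */
function copyWindow(PDO $pdo, string $start, string $end): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO t2 (field1, field2)
         SELECT field1, field2
           FROM t1
          WHERE createdts >= :start AND createdts < :end'
    );
    $stmt->execute([':start' => $start, ':end' => $end]);

    return $stmt->rowCount();
}
```

Each run would use the previous run's end bound as its start bound, so a row on the boundary is copied exactly once.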
Infeudation answered 20/9, 2010 at 2:29 Comment(0)
