Performance Tuning: Create index for boolean column

4

32

I have written a daemon that fetches rows from one database and inserts them into another to keep the two synchronized. It selects rows based on a boolean flag, sync_done.

My table has hundreds of thousands of rows. When I select all rows where sync_done is false, will it cause any database performance issues? Should I index the sync_done column to improve performance, given that only rows with sync_done = false are fetched?

Say, I have 10000 rows. Of those, 9500 have already been synchronized (sync_done is true) and will not be selected.

Please suggest how I might proceed.

Regime answered 19/8, 2012 at 7:58 Comment(0)
D
53

For a query like this, a partial index covering only unsynced rows would serve best.

CREATE INDEX ON tbl (id) WHERE sync_done = FALSE;
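
As a rough sketch of how the daemon's queries could line up with such an index: the planner can consider a partial index only when the query's WHERE clause implies the index predicate, so the polling query repeats the same condition. The payload column, the index name tbl_unsynced_idx, the batch size, and the literal ids are assumptions for illustration, not taken from the question.

```sql
-- Hypothetical table layout; only sync_done and the name tbl come from the thread.
CREATE TABLE tbl (
    id        bigserial PRIMARY KEY,
    payload   text,
    sync_done boolean NOT NULL DEFAULT FALSE
);

-- Partial index: only rows still waiting to be synchronized are indexed.
CREATE INDEX tbl_unsynced_idx ON tbl (id) WHERE sync_done = FALSE;

-- Polling query: the WHERE clause matches the index predicate,
-- so the planner is able to use the partial index.
SELECT id, payload
FROM   tbl
WHERE  sync_done = FALSE
ORDER  BY id
LIMIT  1000;

-- After copying a batch, flag it as done (placeholder ids).
-- The updated rows no longer satisfy the index predicate.
UPDATE tbl
SET    sync_done = TRUE
WHERE  id IN (1, 2, 3);
```

Running EXPLAIN on the SELECT is a simple way to check whether the index is actually chosen for a given data distribution.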

However, for a use case like this, other synchronization methods may be preferable to begin with:

Drusilla answered 19/8, 2012 at 12:47 Comment(0)
21

I suggest that you do not index the table (the boolean is a low cardinality field), but partition it instead on the boolean value.

See: http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
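
The linked 9.1 documentation describes inheritance-based partitioning. On PostgreSQL 10 or later, the same idea could be sketched with declarative list partitioning. The table and partition names below are hypothetical, and this is only an illustration of the approach, not a tested recommendation.

```sql
-- Assumes PostgreSQL 10+ (declarative partitioning).
CREATE TABLE tbl_part (
    id        bigint  NOT NULL,
    payload   text,
    sync_done boolean NOT NULL DEFAULT FALSE
) PARTITION BY LIST (sync_done);

-- One partition per boolean value.
CREATE TABLE tbl_part_pending PARTITION OF tbl_part FOR VALUES IN (FALSE);
CREATE TABLE tbl_part_done    PARTITION OF tbl_part FOR VALUES IN (TRUE);

-- A query filtering on the partition key only needs the pending partition.
SELECT id, payload
FROM   tbl_part
WHERE  sync_done = FALSE;
```

Note that flipping sync_done changes the partition key. PostgreSQL 11 and later move the row to the other partition on UPDATE, while PostgreSQL 10 rejects such an update, so every sync then costs a delete plus an insert rather than an in-place update.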

Kilimanjaro answered 19/8, 2012 at 14:39 Comment(0)
D
2

A table with records and a boolean field should be the way to do it.

Here is something which I believe might help you...

Bitmap Index

Alternative of Bitmap Index in PostgreSQL

Danas answered 19/8, 2012 at 8:5 Comment(3)
PostgreSQL now supports bitmap indexes. Misty
@mlissner: That's probably a misunderstanding. There are no "bitmap indexes" in Postgres. Postgres supports the index access method "bitmap index scan" for many different index types. Drusilla
But it does support bloom filters, which can be used for similar cases. Dove
Y
1

An index will certainly help. However, polling can impose load and cause concurrency issues if your database is heavily used, so it might be worth considering a notification method such as AMQP, or a trigger-based or database-queue-based approach like Slony or Skytools Londiste. I have used both Slony and Londiste for trigger-based replication and have found both excellent. My preference is Londiste, as it is much simpler to set up and manage (and if you have a simple use case, stick to the older 2. branch).
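
To give a feel for the trigger and queue idea in plain PostgreSQL, here is a minimal hand-rolled sketch. It is not how Slony or Londiste work internally. The sync_queue table, the function and trigger names, and the channel name are all hypothetical, and a table tbl with an id column is assumed.

```sql
-- Hypothetical queue table filled by a trigger instead of polling a flag.
CREATE TABLE sync_queue (
    id        bigserial   PRIMARY KEY,
    row_id    bigint      NOT NULL,
    queued_at timestamptz NOT NULL DEFAULT now()
);

CREATE FUNCTION enqueue_for_sync() RETURNS trigger AS $$
BEGIN
    -- Record the new row and wake up any listening worker.
    INSERT INTO sync_queue (row_id) VALUES (NEW.id);
    PERFORM pg_notify('sync_queue', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tbl_enqueue
AFTER INSERT ON tbl
FOR EACH ROW EXECUTE PROCEDURE enqueue_for_sync();

-- Worker side: subscribe once, then drain the queue in batches.
LISTEN sync_queue;

-- SKIP LOCKED (PostgreSQL 9.5+) lets several workers drain without blocking each other.
DELETE FROM sync_queue
WHERE  id IN (
    SELECT id
    FROM   sync_queue
    ORDER  BY id
    LIMIT  100
    FOR UPDATE SKIP LOCKED
)
RETURNING row_id;
```

Notifications are delivered only after the inserting transaction commits, so a worker that wakes up on a notification will find the queued row visible.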

Yseulte answered 19/8, 2012 at 9:7 Comment(0)
