We have a product using a PostgreSQL database server that is deployed at a couple of hundred clients. Some of them have accumulated tens of gigabytes of data over the years, so in the next version we will introduce automated cleanup procedures that gradually archive and DELETE old records during nightly batch jobs.
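The cleanup itself would look roughly like this per table (the table and column names here are just placeholders, not our real schema):

    -- placeholder names: move expired rows to an archive table,
    -- then delete them from the live table, in one transaction
    BEGIN;
    INSERT INTO orders_archive
        SELECT * FROM orders
        WHERE created_at < now() - interval '5 years';
    DELETE FROM orders
        WHERE created_at < now() - interval '5 years';
    COMMIT;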
If I understand correctly, autovacuum will kick in, analyze the tables, and mark the space of the dead tuples as reusable, so performance will be roughly what it was when fewer records existed.
The actual disk space will not be released back to the OS, if I understand correctly, since that only happens with a VACUUM FULL, which autovacuum never runs.
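To verify that on a client database, I'd compare on-disk sizes against dead-tuple counts, with something like:

    -- rough check: total on-disk size vs. live/dead tuple counts per table
    SELECT relname,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
           n_live_tup,
           n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY pg_total_relation_size(relid) DESC
    LIMIT 10;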
So I was thinking about an automated process that would do this.
I found the bloat view used by the Nagios check_postgres plugin at http://wiki.postgresql.org/wiki/Show_database_bloat.
Is this view any good? Am I correct in understanding that if tbloat is > 2, the table could use a VACUUM FULL? And that if ibloat is too high, the index could use a REINDEX?
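For example, assuming I save that wiki query as a view called bloatview, I'd flag candidates with something like:

    -- tables whose estimated bloat factor exceeds 2
    select tablename, tbloat, wastedbytes
    from bloatview
    where tbloat > 2
    order by wastedbytes desc;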
Any comments on the following job to run as a daily batch?
    vacuumdb -Z mydatabase                  # vacuum with analyze-only

    # VACUUM FULL the most bloated table
    tablename=$(psql -Atc "select tablename from bloatview order by tbloat desc limit 1" mydatabase)
    vacuumdb -f -t "$tablename" mydatabase

    # REINDEX the most bloated index
    tablename=$(psql -Atc "select tablename from bloatview order by ibloat desc limit 1" mydatabase)
    iname=$(psql -Atc "select iname from bloatview order by ibloat desc limit 1" mydatabase)
    reindexdb -t "$tablename" -i "$iname" mydatabase
Of course, I still need to wrap it in a nice Perl script and run it from crontab (we're using Ubuntu 12), or does PostgreSQL have some sort of scheduler I could do this with?
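If cron is the answer, I imagine the entry would be something like this (the script path is hypothetical):

    # hypothetical crontab entry: run nightly at 03:00 as the postgres user
    0 3 * * * /usr/local/bin/pg_nightly_maintenance.sh >> /var/log/pg_maintenance.log 2>&1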
Or is this total overkill and is there a much simpler procedure?