I'm about upgrade a quite large PostgreSQL cluster from 9.3 to 11.
The upgrade
The cluster is approximately 1,2Tb in size. The database has a disk system consisting of a fast HW RAID 10 array of 8 DC-edition SSDs with 192GB ram and 64 cores. I am performing the upgrade by replicating the data to a new server with streaming replication first, then upgrading that one to 11.
I tested the upgrade using pg_upgrade
with the --link
option, this takes less than a minute. I also tested the upgrade regularly (without --link
) with many jobs, that takes several hours (+4).
Questions
Now the obvious choice is of cause for me to use the --link
option, however all this makes me wonder - is there any downsides (performance or functionality wise) to using that over the regular slower method? I do not know the internal workings of postgresql data structures, but I have a feeling there could be a performance difference after the upgrade between rewriting the data entirely and to just using hard links
- whatever that means?
Considerations
The only thing I can find in the documentation about the drawbacks of --link
is the downside of not being able to access the old data directory after the upgrade is performed https://www.postgresql.org/docs/11/pgupgrade.htm However that is only a safety concern and not a performance drawback and doesn't really apply in my case of replicating the data first.
The only other thing I can think of is reclaiming space, with whatever performance upsides that might have. However as I understand it, that can also be achieved by running a VACUUM FULL DATABASE
(or CLUSTER
?) command after the --link
-upgraded database has been upgraded? Also the reclaiming of space is not very impactful performance wise on an SSD as I understand.
I appreciate if anyone can help cast some light into this.