Are there any performance or functionality downsides after using pg_upgrade with the --link option?

I'm about to upgrade a fairly large PostgreSQL cluster from 9.3 to 11.

The upgrade

The cluster is approximately 1.2 TB in size. The server has a fast hardware RAID 10 array of 8 DC-edition SSDs, 192 GB of RAM, and 64 cores. I am performing the upgrade by first replicating the data to a new server with streaming replication, then upgrading that server to 11.

I tested the upgrade using pg_upgrade with the --link option; this takes less than a minute. I also tested a regular upgrade (without --link) with many parallel jobs; that takes several hours (4+).

Questions

The obvious choice is of course to use the --link option. However, this makes me wonder: are there any downsides (performance- or functionality-wise) to using it over the regular, slower method? I do not know the internals of PostgreSQL's data structures, but I have a feeling there could be a performance difference after the upgrade between rewriting the data entirely and just using hard links, whatever that means.

Considerations

The only drawback of --link that I can find in the documentation is that the old data directory can no longer be accessed after the upgrade is performed: https://www.postgresql.org/docs/11/pgupgrade.htm However, that is only a safety concern, not a performance drawback, and it doesn't really apply in my case because I replicate the data first. The only other thing I can think of is reclaiming space, with whatever performance upsides that might have. As I understand it, though, that can also be achieved by running a VACUUM FULL DATABASE (or CLUSTER?) command after the --link upgrade has completed. I also understand that reclaiming space has little performance impact on an SSD.

I would appreciate it if anyone could shed some light on this.

Burial answered 4/12, 2018 at 18:0 Comment(0)

There is absolutely no downside to using hard links (with the exception you noted, that the old cluster is dead and has to be removed).

A hard link is in no way different from a normal file.

A “file” in UNIX is in reality an “inode”, a structure containing file metadata. An entry in a directory is a (hard) link to that inode.

If you create another hard link to the inode, the same file will be in two different directories, but that has no impact whatsoever on the behavior of the file.
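To make the inode point concrete, here is a minimal Python sketch. It only illustrates hard-link semantics on a POSIX filesystem and is not what pg_upgrade itself runs; the directory names and the file name `16384` are made up for the example.

```python
import os
import tempfile


def hard_link_demo():
    """Create one file, hard-link it into a second directory, compare inodes."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for the old and new data directories.
        old_dir = os.path.join(root, "old_cluster")
        new_dir = os.path.join(root, "new_cluster")
        os.mkdir(old_dir)
        os.mkdir(new_dir)

        old_path = os.path.join(old_dir, "16384")  # made-up relation file name
        new_path = os.path.join(new_dir, "16384")
        with open(old_path, "wb") as f:
            f.write(b"table data")

        # A second directory entry pointing at the same inode; no data is copied.
        os.link(old_path, new_path)

        old_stat = os.stat(old_path)
        new_stat = os.stat(new_path)
        same_inode = (old_stat.st_dev, old_stat.st_ino) == (
            new_stat.st_dev,
            new_stat.st_ino,
        )
        return same_inode, new_stat.st_nlink


same_inode, link_count = hard_link_demo()
print(same_inode, link_count)
```

Both names resolve to the same inode and the link count is 2, so reading the file through the new directory touches exactly the same blocks on disk as reading it through the old one.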

Of course you must make sure that you don't start both the old and the new server. Instant data corruption would ensue. That's why you should remove the old cluster as soon as possible.
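A small Python sketch of why two servers on hard-linked files would step on each other: a write through one name is immediately visible through the other, because there is only one file. The file names are hypothetical and the in-place overwrite merely stands in for a server modifying a data file.

```python
import os
import tempfile


def shared_write_demo():
    """Overwrite a file through one name and read it back through the other."""
    with tempfile.TemporaryDirectory() as root:
        old_path = os.path.join(root, "old_name")  # hypothetical names
        new_path = os.path.join(root, "new_name")
        with open(old_path, "wb") as f:
            f.write(b"AAAA")
        os.link(old_path, new_path)

        # Stand-in for the "old server" modifying the file in place.
        with open(old_path, "r+b") as f:
            f.write(b"BB")

        # The "new server" sees the change, since both names share one inode.
        with open(new_path, "rb") as f:
            return f.read()


contents = shared_write_demo()
print(contents)
```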

Potable answered 4/12, 2018 at 18:25 Comment(3)
What a great answer! - Burial
Does this mean we can actually drop the old cluster after upgrading? Won't it be a problem, since the original files are in the old cluster data directory? And for example my old data directory is 220GB, the new one is 200MB, if I drop the old cluster, will the new one become 220GB? - Salvidor
You must remove the old cluster. You misunderstand hard links: After the link, the file still exists only once. "Removing" (in reality unlinking) the file will just remove the directory entry. Files are not physically in a directory, so nothing will change in the new cluster if you remove the old one. - Potable
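The unlinking behaviour described in the last comment can be checked with a short Python sketch. The paths are hypothetical and this is an illustration of POSIX unlink semantics, not of pg_upgrade's own cleanup script.

```python
import os
import tempfile


def unlink_demo():
    """Remove the old name of a hard-linked file and inspect the remaining one."""
    with tempfile.TemporaryDirectory() as root:
        old_path = os.path.join(root, "old_name")  # hypothetical names
        new_path = os.path.join(root, "new_name")
        payload = b"x" * 4096
        with open(old_path, "wb") as f:
            f.write(payload)
        os.link(old_path, new_path)

        # "Deleting" the old file only removes one directory entry.
        os.unlink(old_path)

        with open(new_path, "rb") as f:
            data_intact = f.read() == payload
        return data_intact, os.stat(new_path).st_nlink, os.path.exists(old_path)


data_intact, remaining_links, old_exists = unlink_demo()
print(data_intact, remaining_links, old_exists)
```

The data stays readable through the remaining name and the link count drops from 2 to 1; nothing is copied or moved when the old name goes away.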
