PostgreSQL: BYTEA vs OID+Large Object?

I started an application with Hibernate 3.2 and PostgreSQL 8.4. I have some byte[] fields that were mapped as @Basic (= PG bytea) and others that got mapped as @Lob (= PG Large Object). Why the inconsistency? Because I was a Hibernate noob.
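Simplified, the two kinds of mappings look roughly like this (entity and field names are invented for illustration):

    import javax.persistence.Basic;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Lob;

    @Entity
    public class Attachment {

        @Id
        @GeneratedValue
        private Long id;

        // plain binary mapping -> PostgreSQL bytea column
        @Basic
        private byte[] preview;

        // LOB mapping -> PostgreSQL Large Object (an oid column referencing pg_largeobject)
        @Lob
        private byte[] content;
    }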

Now, those fields are at most 4 KB (the average is 2-3 KB). The PostgreSQL documentation mentions that LOs are good when the fields are big, but it doesn't say what 'big' means.

I have upgraded to PostgreSQL 9.0 with Hibernate 3.6, and I was forced to change the annotation to @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType"). This bug brought a potential compatibility issue to light, and I eventually found out that Large Objects are a pain to deal with compared to a normal field.

So I am thinking of switching all of it to bytea. But I am concerned that bytea fields are encoded in hex, so there is some encoding and decoding overhead, and this would hurt performance.

Are there good benchmarks of the performance of the two? Has anybody made the switch and seen a difference?

Tsarina answered 10/1, 2011 at 8:50 Comment(2)
A caveat for large objects - as all chunks share the same system table, the limit on how much you can store in total is lower than for bytea.Lind
Depending on what they are, lo_compat_privileges may solve the compatibility issues.Lind

Basically, there are cases where each makes sense. bytea is simpler and generally preferred. The client libraries handle the decoding, so that's not an issue.

However, LOBs have some neat features, such as the ability to seek within them and treat the LOB as a byte stream instead of a byte array.

"Big" means "Big enough you don't want to send it to the client all at once." Technically bytea is limited to 1GB compressed and a lob is limited to 2GB compressed, but really you hit the other limit first anyway. If it's big enough you don't want it directly in your result set and you don';t want to send it to the client all at once, use a LOB.

Braque answered 5/9, 2012 at 9:17 Comment(1)
I assume large objects are compressed, same as TOASTed bytea columns?Lind

But I am concerned that bytea fields are encoded in hex

bytea input can be in hex or escape format; that's your choice. Storage will be the same either way. As of version 9.0 the output default is hex, but you can change this by editing the bytea_output parameter.
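For example, from JDBC you can flip the session's output format and still get the same bytes back either way (a minimal sketch; the images table and the conn connection are assumed to exist):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    static byte[] readImage(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("SET bytea_output = 'escape'");   // or 'hex', the default since 9.0
            try (ResultSet rs = st.executeQuery("SELECT data FROM images WHERE id = 1")) {
                return rs.next() ? rs.getBytes("data") : null;  // driver decodes either format
            }
        }
    }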

I haven't seen any benchmarks.

Fingerstall answered 10/1, 2011 at 9:7 Comment(1)
Also it isn't stored as hex, and I think libpq (and maybe even the protocol) has an interface for binary transfers of both.Braque

tl;dr Use bytea

...unless you need streaming or >1GB values


Bytea: A byte sequence that works like any other TOAST-able value. Limited to 1GB per value, 32TB per table.

Large object: Binary data split up into multiple rows. Supports seek, read, and write like an OS file, so operations don't require loading it all into memory at once. Limited to 4TB per value, 32TB per database.


Large objects have the following downsides:

  1. There is only one large object table per database.

  2. Large objects aren't automatically removed when the "owning" record is deleted. See the lo_manage function in the lo module (sketch after this list).

  3. Since there is only one table, large object permissions have to be handled record by record.

  4. Streaming is difficult and has less support in client drivers than simple bytea.

  5. It's part of the system schema, so you have limited to no control over options like partitioning and tablespaces.
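For point 2, the cleanup trigger from the lo module looks roughly like this (a sketch using the image table and raster column names from the lo docs; adapt to your own schema):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    static void installLoCleanup(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE EXTENSION IF NOT EXISTS lo");
            // unlink the referenced large object when the owning row is updated or deleted
            st.execute("CREATE TRIGGER t_raster "
                    + "BEFORE UPDATE OR DELETE ON image "
                    + "FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster)");
        }
    }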


I venture that 93% of real-world uses of large objects would be better served by bytea.
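To show how little ceremony bytea needs by comparison, a minimal JDBC round trip (sketch; the docs table and column names are invented):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    static byte[] byteaRoundTrip(Connection conn, byte[] payload) throws SQLException {
        // bytea binds and reads like any other column type
        try (PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO docs (name, body) VALUES (?, ?)")) {
            ins.setString(1, "report.pdf");
            ins.setBytes(2, payload);                    // whole value in one go, up to 1 GB
            ins.executeUpdate();
        }
        try (PreparedStatement sel = conn.prepareStatement(
                "SELECT body FROM docs WHERE name = ?")) {
            sel.setString(1, "report.pdf");
            try (ResultSet rs = sel.executeQuery()) {
                return rs.next() ? rs.getBytes(1) : null;
            }
        }
    }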

Dessert answered 6/1, 2021 at 0:58 Comment(2)
I read in the docs that a large object allows values up to 4 TB in size. It also admits that the "TOAST" way of storage made large objects partially obsolete.Chalybeate
Thank you, you are correct @Ola. The 2GB limit is an older one. I have updated the answer.Dessert

I don't have a comparison of large objects and bytea handy, but note that the switch to the hex output format in 9.0 was made partly because it is faster than the previous custom encoding. As far as text encoding of binary data goes, you probably won't get much faster than what is there currently.

If that is not good enough for you, you can consider using the binary protocol between the PostgreSQL client and server. Then you basically get the stuff straight from disk, much like with large objects. I don't know whether the PostgreSQL JDBC driver supports that yet, but a quick search suggests it doesn't.

Ferriter answered 10/1, 2011 at 13:56 Comment(0)
