Oracle CLOB vs BLOB
Asked Answered
V

1

8

I want to know what does Oracle's CLOB has to offer over BLOB data type. Both have data storage limits of (4 GB - 1) * DB_BLOCK_SIZE.

A text string which is longer than 4000 bytes can not fit in VARCHAR2 column. Now, I can use CLOB and BLOB as well to store this string.

Everyone says, CLOB is good and meant for character data and BLOB is for binary data such as images, unstructured documents.

But I see I can store character data inside a BLOB as well.

What I want to know:

So, question is on the basics, why CLOB and why not BLOB always? Is there anything to do with encoding?

May be the question title should be, How CLOB handles the character data differently than a BLOB?

Void answered 15/4, 2019 at 8:15 Comment(11)
Run a select * from ... on a table with a blob column, then tell me quickly what the text inside the BLOB is. And if you can't read it, then tell me which encoding was used to store the bytes which is something you have to know if you want to convert the raw bytes back to a readable string.Rickierickman
Why would you not use a CLOB to store character data? It's like storing dates in DATE columns and numeric values in NUMBER columns: use the most appropriate datatype for the data you want to store.Inattentive
@Inattentive Yeah. I agree to use CLOB for character data. But I want to know what exactly a CLOB offers over BLOB ?Void
What exactly would the of a BLOB be? i.imgur.com/u1aCQ97.png or i.imgur.com/HuWDg1x.png - which text does the column blob_data contain?Rickierickman
Where did you read the limit is 4GB? The limit is (4GB - 1) * DB_BLOCK_SIZE, i.e. 8TB to 128TB depending on DB_BLOCK_SIZE.Turgite
@WernfriedDomscheit Thanks for pointing. Rectified the question.Void
All string related functions like REGEXP_xxx, SUBSTR, xPAD, xTRIM, etc. are available only for CLOB but not for BLOB. Why on earth do you consider to store character data in BLOB?Turgite
I am not. But I want to know how BLOB treats the character type data.Void
@a_horse_with_no_name Ok. I get your point that select * will give numbers in output for a blob column. But as per https://mcmap.net/q/158551/-how-do-i-get-textual-contents-from-blob-in-oracle-sql , I can use substr function to get textual contents. Now, how is encoding handled here ? A clob and a blob eventually gets stored in computer in a binary form. Does clob stores the conding information along with it and uses it while retrieving the data ?Void
@ShaileshPratapwar - no, that answer (which is incomplete anyway, not sure how it's got so many upvotes) will at best give you a selection of bytes from the BLOB, not characters. That selection might represent characters, if you know what character set they represent so you can convert them; but if they were from a multibyte character set you might split a single character in half and not be able to convert it anyway. (Or convert it wrongly, and not notice.)Credent
FWIW, Oracle recommends storing JSON (which is text) in BLOB datatype: search for "Considerations When Using LOB Storage for JSON Data" and lists some reasons; mostly around character set conversion and size.Pie
C
4

I want to know how BLOB treats the character type data.

It doesn't treat it as character type data, it only see it as a stream of bytes - it doesn't know or care what it represents.

From the documentation:

The BLOB data type stores unstructured binary large objects. BLOB objects can be thought of as bitstreams with no character set semantics.


Does clob stores the conding information along with it and uses it while retrieving the data ?

Not explicitly, but the data is stored in the database character set, as with VARCHAR2 data. From the documentation again:

The CLOB data type stores single-byte and multibyte character data. Both fixed-width and variable-width character sets are supported, and both use the database character set.

You might also have noticed that the dbms_lob package has procedures to convert between CLOB and BLOB data types. For both of those you have to specify the character set to use. So if you choose to store character data as a BLOB you have to know the character set when converting it to a BLOB, but perhaps more crucially you have to know the character set to be able convert it back. You can do it, but it doesn't mean you should. You have no way to validate the BLOB data until you come to try to convert it to a string.

As @APC alluded to, this is similar to storing a date as a string - you lose the advantages and type-safety using the correct data type would give you, and instead add extra pain, uncertainty and overhead for no benefit.

The question isn't really what advantages CLOBs have over BLOBs for storing character data; the question is really the reverse: what advantages do BLOBs have over CLOBs for storing character data? And the answer is usually that there are none.

@Boneist mentions the recommendation to store JSON as BLOBs, and there is more about that here.

(The only other reasons I can think of off-hand are that you have to store data from multiple source character sets and want to preserve them exactly as you received them. But then either you are only storing them and will never examine or manipulate the data from within the database itself, and will only return them to some external application untouched; in which case you don't care about the character set - so you're handling purely binary data and shouldn't be thinking of it as character data at all, any more than you'd care that an image you're storing is PNG vs. JPG or whatever. Or you will need to work with the data and so will have to record which character set each BLOB object represents, so you can convert as needed.)

Credent answered 15/4, 2019 at 10:8 Comment(3)
Thanks. This helps to some extent and points to the role of database character sets in CLOB/varchar vs BLOB/binary storage. This also highlights some special cases when to have BLOB instead of CLOB.Void
In Overview of Storing and Managing JSON Data it says: "In AL32UTF8 databases CLOB instances are stored using the UCS2 character set" - is that really true? UCS-2 is able to represent only the BMP characters, thus not "Unicode-capable". UCS-2 is obsolete, I wound that Oracle still refers to it.Turgite
Yeah,not sure; this doesn't help much either. I decided not to dig into the JSON stuff any further, but just wanted to mention it as Boneist had pointed it out - I was tempted to stop at 'usually there are none', and I wasn't sure about 'usually', but figured someone would point out legitimate if obscure uses. (Happy to have more added if/when people think of them!)Credent

© 2022 - 2024 — McMap. All rights reserved.