PostgreSQL protocol data representation format specification?
Asked Answered
A

3

9

I am reading PostgreSQL protocol document. The document specifies message flow and containment format, but doesn't mention about how actual data fields are encoded in text/binary.

For the text format, there's no mention at all. What does this mean? Should I use just SQL value expressions? Or there's some extra documentation for this? If it's just SQL value expression, does this mean the server will parse them again?

And, which part of source code should I investigate to see how binary data is encoded?

Update

I read the manual again, and I found a mention about text format. So actually there is mention about text representation, and it was my fault that missing this paragraph.

The text representation of values is whatever strings are produced and accepted by the input/output conversion functions for the particular data type.

Anguished answered 26/10, 2013 at 18:34 Comment(0)
S
5

There are two possible data formats - text or binary. Default is a text format - that means, so there is only server <-> client encoding transformation (or nothing when client and server use same encoding). Text format is very simple - trivial - all result data is transformed to human readable text and it is send to client. Binary data like bytea are transformed to human readable text too - hex or Base64 encoding are used. Output is simple. There is nothing to describing in doc

 postgres=# select current_date;
     date    
 ────────────
  2013-10-27
 (1 row)

In this case - server send string "2013-10-27" to client. First four bytes is length, others bytes are data.

Little bit difficult is input, because you can separate a data from queries - depends on what API you use. So if you use most simple API - then Postgres expect SQL statement with data together. Some complex API expected SQL statement and data separately.

On second hand a using of binary format is significantly difficult due wide different specific formats for any data type. Any PostgreSQL data type has a two functions - send and recv. These functions are used for sending data to output message stream and reading data from input message stream. Similar functions are for casting to/from plain text (out/in functions). Some clients drivers are able to cast from PostgreSQL binary format to host binary formats.

Some information:

  • libpq API http://www.postgresql.org/docs/9.3/static/libpq.html
  • you can look to PostgreSQL src to send/recv and out/in function - look on bytea or date implementation src/backend/utils/adt/date.c. Implementation of libpq is interesting too src/interfaces/libpq
  • -
Sepulture answered 27/10, 2013 at 5:18 Comment(4)
I really have no idea what to do if a project committer says it's trivial for asking for the protocol spec… This breaks me down completely…Anguished
Maybe I don't understand your question. What is unclear on plain text format? en.wikipedia.org/wiki/Plain_textSepulture
Ah I re-read the manual and now discovered explicit mentioning about text protocol. I updated my question to point that. Manual's clear. Thanks and I'm sorry for my fault.Anguished
@PavelStehule: because there's no such thing as plain text. It must follow some grammar.Brechtel
F
3

The things closest to a spec of a PostgreSQL binary format I could find were the documentation and the source code of the "libpqtypes" library. I know, a terrible state of the documentation for such a huge product.

Flaunch answered 6/11, 2014 at 8:57 Comment(1)
both links are brokenHomelike
I
1

The text representation of values is whatever strings are produced and accepted by the input/output conversion functions for the particular data type. In the transmitted representation, there is no trailing null character; the frontend must add one to received values if it wants to process them as C strings. (The text format does not allow embedded nulls, by the way.)

Binary representations for integers use network byte order (most significant byte first). For other data types consult the documentation or source code to learn about the binary representation. Keep in mind that binary representations for complex data types might change across server versions; the text format is usually the more portable choice.

(quoted from the documentation, link)

So the binary protocol is not stable across versions, so you probably should treat it as an implementation detail and not use the binary representation. The text representation is AFAICT just the format of literals in SQL queries.

Interdigitate answered 13/10, 2016 at 13:23 Comment(1)
So basically it discourage the use of binary format. This is exactly why things like ASN.1 exist. PER can be a bit convoluted, but OER would have been nice: oss.com/asn1/knowledge-center/asn1-tools-general/…Shoeblack

© 2022 - 2024 — McMap. All rights reserved.