PostgreSQL tuple format
Asked Answered
P

1

17

Is there any document describing the tuple format that PostgreSQL server adheres to? The official documentation appears arcane about this.

A single tuple seems simple enough to figure out, but when it comes to arrays of tuples, arrays of composite tuples, and finally nested arrays of composite tuples, it is impossible to be certain about the format simply by looking at the output.

I am asking this following my initial attempt at implementing pg-tuple, a parser that's still missing today, to be able to parse PostgreSQL tuples within Node.js


Examples

create type type_A as (
   a int,
   b text
);
  • with a simple text: (1,hello)
  • with a complex text: (1,"hello world!")

create type type_B as (
   c type_A,
   d type_A[]
);
  • simple-value array: {"(2,two)","(3,three)"}

  • for type_B[] we can get:

{"(\"(7,inner)\",\"{\"\"(88,eight-1)\"\",\"\"(99,nine-2)\"\"}\")","(\"(77,inner)\",\"{\"\"(888,eight-3)\"\",\"\"(999,nine-4)\"\"}\")"}

It gets even more complex for multi-dimensional arrays of composite types.


UPDATE

Since it feels like there is no specification at all, I have started working on reversing it. Not sure if it can be done fully though, because from some initial examples it is often unclear what formatting rules are applied.

Proceeds answered 15/9, 2016 at 21:17 Comment(5)
Asking out of curiosity - are you doing it because of the need or just for science? There are things like json and hstore in Postgres.Ferrante
@KamilG. this comes out of practical needs from the community of developers. I am also the author of pg-promise. So no, it is not academical.Proceeds
I don't quite unerstand. Do you want to know how tuples are stored on disk? Or how a tuple looks when output as string? Something else?Lorilee
I'm asking about the tuple strings format that PostgreSQL outputs.Proceeds
Here are the relevant docs for composites and arrays, which seem to explain the quoting rules pretty thoroughly. There's nothing fancy going on when you combine these; nested elements are converted to strings, and then quoted / escaped like any other string (so for example, select row('{1,2}'::int[]) is indistinguishable from select row('{1,2}'::text)).Funchal
F
2

As Nick posted, according to docs:

the whitespace will be ignored if the field type is integer, but not if it is text.

and

The composite output routine will put double quotes around field values if they are empty strings or contain parentheses, commas, double quotes, backslashes, or white space.

and

Double quotes and backslashes embedded in field values will be doubled.

and now quoting Nick himself:

nested elements are converted to strings, and then quoted / escaped like any other string

I give shorted example below, comfortably compared against its nested value:

a=# create table playground (t text, ta text[],f float,fa float[]);
CREATE TABLE
a=# insert into playground select 'space here',array['','bs\'],8.0,array[null,8.1];
INSERT 0 1
a=# insert into playground select 'no_space',array[null,'nospace'],9.0,array[9.1,8.0];
INSERT 0 1
a=# select playground,* from playground;
                    playground                     |     t      |       ta       | f |     fa
---------------------------------------------------+------------+----------------+---+------------
 ("space here","{"""",""bs\\\\""}",8,"{NULL,8.1}") | space here | {"","bs\\"}    | 8 | {NULL,8.1}
 (no_space,"{NULL,nospace}",9,"{9.1,8}")           | no_space   | {NULL,nospace} | 9 | {9.1,8}
(2 rows)

If you go for deeper nested quoting, look at:

a=# select nested,* from (select playground,* from playground) nested;
                                                         nested                                                          |                    playground                     |     t      |       ta       | f |     fa
-------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------+------------+----------------+---+------------
 ("(""space here"",""{"""""""",""""bs\\\\\\\\""""}"",8,""{NULL,8.1}"")","space here","{"""",""bs\\\\""}",8,"{NULL,8.1}") | ("space here","{"""",""bs\\\\""}",8,"{NULL,8.1}") | space here | {"","bs\\"}    | 8 | {NULL,8.1}
 ("(no_space,""{NULL,nospace}"",9,""{9.1,8}"")",no_space,"{NULL,nospace}",9,"{9.1,8}")                                   | (no_space,"{NULL,nospace}",9,"{9.1,8}")           | no_space   | {NULL,nospace} | 9 | {9.1,8}
(2 rows)

As you can see, the output again follows rules the above.

This way in short answers to your questions would be:

  • why array is normally presented inside double-quotes, while an empty array is suddenly an open value? (text representation of empty array does not contain comma or space or etc)
  • why a single " is suddenly presented as \""? (text representation of 'one\ two', according to rules above is "one\\ two", and text representation of the last is ""one\\\\two"" and it is just what you get)
  • why unicode-formatted text is changing the escaping for \? How can we tell the difference then? (According to docs,

PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote

), so it is not unicode text, but the the way you tell postgres that it should interpret escapes in text not as symbols, but as escapes. Eg E'\'' will be interpreted as ' and '\'' will make it wait for closing ' to be interpreted. In you example E'\\ text' the text represent of it will be "\\ text" - we add backslsh for backslash and take value in double quotes - all as described in online docs.

  • the way that { and } are escaped is not always clear (I could not anwer this question, because it was not clear itself)
Flout answered 26/11, 2016 at 22:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.