Fillfactor for a sequential index that is PK

Yes, fillfactor again. I've spent many hours reading and I can't decide what's best for each case. I don't understand when and how fragmentation happens. I'm migrating a database from MS SQL Server to PostgreSQL 9.2.

Case 1

10-50 inserts per minute with a sequential (serial) PK; 20-50 reads per hour.

CREATE TABLE dev_transactions (
  transaction_id serial NOT NULL,
  transaction_type smallint NOT NULL,
  moment timestamp without time zone NOT NULL,
  gateway integer NOT NULL,
  device integer NOT NULL,
  controler smallint NOT NULL,
  token integer,
  et_mode character(1),
  status smallint NOT NULL,
  CONSTRAINT pk_dev_transactions PRIMARY KEY (transaction_id)
);

Case 2

Similar structure with an index for the serial PK. Writes arrive in bulk (one shot) batches of ~50,000 rows every 2 months; 10-50 reads per minute.

Does a 50% fillfactor mean that each insert generates a new page and moves 50% of existing rows to a newly generated page?

Does a 50% fillfactor mean that free space is left between physical rows in new data pages?

Is a new page generated only when there is no free space left in existing pages?

As you can see, I'm very confused. I would appreciate some help, maybe a good link to read about PostgreSQL and index fillfactor.

Bullet asked 6/1, 2013 at 21:53. Comments (2):
Are you doing updates on the table? The fill factor is related to update performance. - Ashlynashman
Never on the field I want to index. In both cases it is the PK, and once inserted it keeps the same value forever. - Bullet

FILLFACTOR

With only INSERT and SELECT you should use a FILLFACTOR of 100 for tables (which is the default anyway). There is no point in leaving wiggle room per data page if you are not going to "wiggle" with UPDATEs.

The mechanism behind FILLFACTOR is simple. INSERTs only fill data pages (usually 8 kB blocks) up to the percentage declared by the FILLFACTOR setting. Also, whenever you run VACUUM FULL or CLUSTER on the table, the same wiggle room per block is re-established. Ideally, this allows UPDATE to store new row versions in the same data page, which can provide a substantial performance boost when dealing with lots of UPDATEs. It is also beneficial in combination with H.O.T. (heap-only tuple) updates, which require the new row version to fit on the same page.
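
To make the mechanism concrete, here is a simplified Python model of how an INSERT honors a table FILLFACTOR. It also addresses the questions above. A 50% fillfactor does not move existing rows, and it does not create a page per insert. INSERT simply stops adding rows to a page once only the reserved free space would remain, and then continues on another page. The function name rows_per_heap_page is hypothetical. The constants assume the default 8 kB block, a 24-byte page header and a 4-byte line pointer per row. The model ignores TOAST, the free space map and the per-page tuple limit.

```python
BLCKSZ = 8192        # default PostgreSQL block size (8 kB)
PAGE_HEADER = 24     # bytes of page header at the start of every page
LINE_POINTER = 4     # per-row item pointer in the page's line pointer array


def rows_per_heap_page(tuple_size, fillfactor=100):
    """Rows a fresh heap page accepts from INSERTs (simplified model).

    tuple_size is assumed to be the row size already rounded up to 8 bytes.
    """
    reserved = BLCKSZ * (100 - fillfactor) // 100  # kept free for later UPDATEs
    free = BLCKSZ - PAGE_HEADER
    rows = 0
    # INSERT keeps using the page only while the reserved space stays untouched.
    while free - (tuple_size + LINE_POINTER) >= reserved:
        free -= tuple_size + LINE_POINTER
        rows += 1
    return rows


print(rows_per_heap_page(64))      # 120 (FILLFACTOR 100, the table default)
print(rows_per_heap_page(64, 50))  # 59: roughly half as many rows per page
```

In this model, existing rows are never relocated by INSERT. The reserved space is only consumed later, by UPDATEs that can place the new row version on the same page.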

Indexes need more wiggle room by design. They have to store new entries at the right position in leaf pages. Once a page is full, a relatively costly "page split" is needed. So indexes tend to bloat more than tables. The default FILLFACTOR for a (default) B-Tree index is 90 (it varies per index type). Wiggle room makes sense for INSERT-only workloads, too. The best strategy heavily depends on write patterns.

Example: If new inserts have steadily growing values (typical case for a serial or timestamp column), then there are basically no page-splits, and you might go with FILLFACTOR = 100 (or a bit lower to allow for some noise).
For a random distribution of new values, you might go below the default 90 ...
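
The difference between steadily growing and random keys can be illustrated with a toy simulation of B-tree leaf pages. This is a sketch under stated assumptions, not PostgreSQL's actual nbtree code:

- Each leaf page holds a hypothetical capacity of 50 keys.
- A full page in the middle of the index is split 50/50.
- A full rightmost page that receives a new largest key keeps FILLFACTOR percent of its entries. This mirrors how PostgreSQL applies B-tree fillfactor when extending the index at the right.

The function name simulate_leaf_pages is made up for illustration.

```python
import bisect
import random


def simulate_leaf_pages(keys, capacity=50, fillfactor=100):
    """Insert keys into a toy B-tree leaf level; return (middle_splits, avg_fill)."""
    pages = [[]]      # leaf pages, each a sorted list of keys
    separators = []   # separators[i] = first key of pages[i + 1]
    middle_splits = 0
    for key in keys:
        i = bisect.bisect_right(separators, key)  # leaf page that must hold the key
        page = pages[i]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if i == len(pages) - 1 and page[-1] == key:
            # New largest key on the rightmost page: keep FILLFACTOR % on the left.
            cut = max(1, capacity * fillfactor // 100)
        else:
            # Split in the middle of the index: both halves end up about half full.
            cut = len(page) // 2
            middle_splits += 1
        pages[i:i + 1] = [page[:cut], page[cut:]]
        separators.insert(i, page[cut])
    avg_fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return middle_splits, avg_fill


sequential = list(range(10_000))        # like a serial PK
shuffled = sequential[:]
random.Random(42).shuffle(shuffled)     # random key order

print(simulate_leaf_pages(sequential, fillfactor=100))  # (0, 1.0)
print(simulate_leaf_pages(sequential, fillfactor=90))   # no middle splits, ~90% full
print(simulate_leaf_pages(shuffled, fillfactor=100))    # many middle splits, emptier pages
```

In this model, steadily growing keys leave every page exactly as full as FILLFACTOR allows, and those pages are never touched again. With an insert-only serial PK, space reserved by a lower fillfactor therefore just stays empty.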

Basic source of information: the manual for CREATE TABLE and CREATE INDEX.

Other optimization

But you can do something else - since you seem to be a sucker for optimization ... :)

CREATE TABLE dev_transactions(
  transaction_id   serial PRIMARY KEY
, gateway          integer NOT NULL
, moment           timestamp NOT NULL
, device           integer NOT NULL
, transaction_type smallint NOT NULL
, status           smallint NOT NULL
, controller       smallint NOT NULL
, token            integer
, et_mode          character(1)
);

This optimizes your table with regard to data alignment and avoids padding on a typical 64-bit server. It saves a few bytes, probably just 8 bytes per row on average; you typically can't squeeze out much with "column tetris".
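
Alignment padding is what "column tetris" exploits. The following simplified Python sketch adds up the data area of a row from (size, alignment) pairs. It pads each value up to its type's alignment, as PostgreSQL does. It assumes fixed-width types only, with no NULLs and no variable-length types such as character(1). It also assumes the alignments of a typical 64-bit build: 2 bytes for smallint and 8 for timestamp. The column lists in the example are hypothetical, not your table.

```python
# (size, alignment) in bytes on a typical 64-bit build
SMALLINT = (2, 2)
TIMESTAMP = (8, 8)


def aligned_data_size(columns):
    """Bytes used by fixed-width column values in table order, including padding."""
    offset = 0
    for size, align in columns:
        offset = (offset + align - 1) // align * align  # pad to the column's alignment
        offset += size
    return offset


# Hypothetical columns: the same four types in two different orders.
print(aligned_data_size([SMALLINT, TIMESTAMP, SMALLINT, TIMESTAMP]))  # 32 (12 bytes padding)
print(aligned_data_size([TIMESTAMP, TIMESTAMP, SMALLINT, SMALLINT]))  # 20 (no padding)
```

Each row as a whole is also padded to a multiple of 8 bytes. Reordering therefore only pays off when the total crosses such a boundary, which is why the gain is usually just a few bytes.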

Keep NOT NULL columns at the start of the table for a very small performance bonus.

Your table has 9 columns. The initial ("cost-free") 1-byte NULL bitmap covers 8 columns. The 9th column triggers an additional 8 bytes for the extended NULL bitmap - if there are any NULL values in the row.

If you make et_mode and token NOT NULL, all columns are NOT NULL and there is no NULL bitmap, freeing up 8 bytes per row.
This even works per row if some columns can be NULL: if all fields of a row have values, there is no NULL bitmap for that row. In your special case, this leads to the paradox that filling in values for et_mode and token can make your storage size smaller, or at least keep it the same.
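
The arithmetic behind the NULL bitmap can be sketched in a few lines of Python:

- The heap tuple header has a fixed part of 23 bytes.
- If the row contains any NULL, one bitmap bit per column is appended.
- The total is then padded to a multiple of 8 (MAXALIGN on a typical 64-bit build) before the first column value.

The function name tuple_header_size is hypothetical, and the model assumes a table without OIDs.

```python
MAXALIGN = 8        # alignment of the header end on a typical 64-bit build
TUPLE_HEADER = 23   # fixed part of the heap tuple header, in bytes


def tuple_header_size(natts, row_has_nulls):
    """Bytes before the first column value of a heap row (simplified model)."""
    size = TUPLE_HEADER
    if row_has_nulls:
        size += (natts + 7) // 8  # NULL bitmap: one bit per column, whole bytes
    return (size + MAXALIGN - 1) // MAXALIGN * MAXALIGN


print(tuple_header_size(8, True))   # 24: the bitmap fits into the "free" byte
print(tuple_header_size(9, True))   # 32: the 9th column costs 8 more bytes
print(tuple_header_size(9, False))  # 24: no NULLs in the row, so no bitmap
```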

Basic source of information: the manual on Database Physical Storage.

Compare the size of rows (filled with values) with your original table to get definitive proof:

SELECT pg_column_size(t) FROM dev_transactions t;

(Plus maybe padding between rows, as the next row starts at a multiple of 8 bytes.)

Roustabout answered 6/1, 2013 at 22:14. Comments (5):
@Erwin Thank you very much, this information is very useful, and yes, I'm a fk sucker for optimisation. I will accept your answer. If you have a link that explains in depth how the fill factor works, I'd appreciate it. - Bullet
@HMarioD: I added some more explanation and links to my answer. - Roustabout
Thanks, one more question: I will add an index on transaction_id with a fill factor of 100. The UNIQUE keyword is not necessary because the field is the PK, right? - Bullet
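@HMarioD: The whole index is not necessary because the field is the PK, which is implemented by way of a fully functional unique index in Postgres. You are already done here. ;) Note, however, that indexes do not inherit the fillfactor setting of the table. The PK index gets the B-tree default of 90 unless you specify otherwise, e.g. CONSTRAINT pk_dev_transactions PRIMARY KEY (transaction_id) WITH (fillfactor = 100). - Roustabout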
In MS SQL Server it's different; as far as I know, a PK can't be considered an index there. Thanks a lot once more, I will take a look at your other answers.
