PostgreSQL Database Log Transaction
Asked Answered
V

3

1

I'm a last year college student and I'm doing my thesis right now. My title is "Index Suggestion based on Log Analysis". This project will analyze the PostgreSQL transaction log to give index recommendation to the database that will be tested.

This research will develop an index recommender tool by analyzing the attribute that is frequently accessed (using SELECT statement).

But, I found it's hard to find the PostgreSQL log file. My question is, where can I find PostgreSQL log transaction dataset? Or maybe other database log transaction dataset?

Vineyard answered 20/2, 2019 at 7:10 Comment(8)
But SELECT is not logged in the transaction log. I don't mean to put you down, but shouldn't you learn the basics before writing your thesis?Pozsony
You'd have to hook into the query optimizer somewhere, but it is not a simple task.Pozsony
Are you talking about the write-ahead log, or the server log? The WAL (a.k.a. the transaction log) doesn't record queries, but the server log will (if you tell it to). If you just want a record of SELECT statements, then pg_stat_statements is probably a better starting point, but parsing the queries accurately sounds like a lot more work than the actual analysis...Telefilm
I'd start by reading all the existing StackOverflow questions about Postgres query-optimasation and/or performance. There are some patterns to be detected.Magistrate
@NickBarnes I'm talking about write-ahead log. Yes, it's hard to parse the log file. I already try to parse the log using regular expression. What type of parsing that may be fit to parse the log file?Vineyard
@LaurenzAlbe I ever found SELECT statement in PostgreSQL log. But the SELECT statement doesn't appear anymore.Vineyard
@Magistrate thanks. do you know where can i get postgreSQL log file dataset that can be downloaded?Vineyard
Usethe optimisation cases from SO (or DBA@SO) as training material, and fetch the logfiles yourself.Magistrate
P
2

You are mixing up the transaction log (WAL) and the regular text log file.

The latter does contain statements (if the configuration is set like that), while the transaction log doesn't contain statements at all, just binary information about what has changed in which block.

You won't be able to recommend an index just from looking at the query, I can't do that either.

I have a suggestion for you: if you want to write a tool that suggests indexes, it should take the output of EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT /* your query */ as input.

Moreover, the tool will have to be connected to the database to query table and index metadata (and perhaps statistics). That makes you dependent on the database version, because metadata can change (and do – see partitioned tables), but that won't concern you so much in a thesis paper.

The task is still not simple (query optimization is AI), but then you have at least a chance.

Pozsony answered 21/2, 2019 at 4:28 Comment(3)
I'm a beginner in PostgreSQL. I've already set my log_statement: ON in postgresql.conf. but i still can't find the SELECT statement in my log. what configuration that I can do to mixing up the transaction log and the regular text log file?Vineyard
can you explain more about taking the output of EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) as input ?Vineyard
Sure you get the execution plan that way.Pozsony
Y
0

You need to run the query below then restart PostgreSQL to enable logging persistently. *The parameter with ALTER SYSTEM SET is set to postgresql.auto.conf rather than postgresql.conf:

ALTER SYSTEM SET log_statement = 'all';

And, you need to run either of the queries below then restart PostgreSQL to disable logging persistently:

ALTER SYSTEM RESET log_statement;

Or:

ALTER SYSTEM SET log_statement = 'none';

You can also run the query below then need to restart PostgreSQL to enable logging persistently:

ALTER SYSTEM SET log_min_duration_statement = 0;

And, you can also run either of the queries below then need to restart PostgreSQL to disable logging persistently:

ALTER SYSTEM RESET log_min_duration_statement;

Or:

ALTER SYSTEM SET log_min_duration_statement = -1;

You can see my answer explaining more about how to enable and disable query logs on PostgreSQL.

Yap answered 21/8, 2022 at 7:59 Comment(0)
B
0

A bit late to the party here, but the thing you would probably want in practice is pg_stat_statements. Use it to list the queries with the highest total_exec_time, and look at their query plans. Then you would consider adding indexes that would speed up joins or scans in those queries.

This should be possible to automate to some extent. Similarly, recommending indexes to drop should be possible to do using index usage statistics. Personally, I'd love to have a tool that does this kind of suggestions automatically, and it would be a great example of profile guided optimization.

Brawny answered 21/10, 2022 at 9:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.