same query, two different ways, vastly different performance
Asked Answered
R

5

1

I have a Postgres table with more than 8 million rows. Given the following two ways of doing the same query via DBD::Pg, I get wildly different results.

$q .= '%';

## query 1
my $sql = qq{
    SELECT a, b, c
    FROM t 
    WHERE Lower( a ) LIKE '$q'
};
my $sth1 = $dbh->prepare($sql);
$sth1->execute();

## query 2
my $sth2 = $dbh->prepare(qq{
    SELECT a, b, c
    FROM t  
    WHERE Lower( a ) LIKE ?
});
$sth2->execute($q);

query 2 is at least an order of magnitude slower than query 1... seems like it is not using the indexes, while query 1 is using the index.

Would love hear why.

Recur answered 13/2, 2012 at 23:16 Comment(11)
How are you measuring the speed?Undoubted
initially it was simply from observing... the difference is so much that it is apparent. Then I put numbers to the observation with the help of use Benchmark;Recur
You should try switching the order of the tests, to rule out any caching effects.Undoubted
What's the value of $q before the first line?Success
yup, tried that already to eliminating caching. Trust me... there is a difference. Whether I do a standalone test or record the results in the actual application... there is an order of mag difference. No point in debating that. The question really is "why?"Recur
@punkish: Your comment about how the value is bound may be instructive. Can you try the binding case but bind using an explicit TEXT type?, e.g. $sth2->bind_param(1, $q, { pg_type => PG_TEXT }); $sth2->execute(); (You will need to use DBD::Pg qw(:pg_types); at the top to get PG_TEXT.)Howund
@punkish: What index(es) do you have on table t? From the looks of it, you should have an index over Lower(a) (in contrast to just indexing over a).Howund
I've asked another question (#9277493) about when queries are plannedHowund
@punkish: Did you try binding $q with an explicit type? Only I would be interested to know if it makes a differenceHowund
What is the performance like if you don't append '%' to $q and use a where clause like WHERE Lower( a ) LIKE ? || '%'?Hillside
Cross posted at perlmonksTelugu
M
5

With LIKE expressions, b-tree indexes can only be used if the search pattern is left-anchored, i.e. terminated with %. More details in the manual.
Thanks to @evil otto for the link. This link to the current version.

Your first query provides this essential information at prepare time, so the query planner can use a matching index.

Your second query does not provide any information about the pattern at prepare time, so the query planner cannot use any indexes.

Mccray answered 14/2, 2012 at 8:57 Comment(2)
I agree with your comment about index usage with '%' at the end of the pattern. But I don't think the query is planned at statement prepare time, so I don't agree with the latter part of your answer.Howund
left-anchored and constant. See postgresql.org/docs/8.4/interactive/indexes-types.htmlCoyotillo
G
1

I suspect that in the first case the query compiler/optimizer detects that the clause is a constant, and can build an optimal query plan. In the second it has to compile a more generic query because the bound variable can be anything at run-time.

Gangrene answered 14/2, 2012 at 0:8 Comment(4)
why do think I am querying for LIKE '%'? I am concatenating '%' to a supplied $q and searching for that. For example, in both cases, I am searching for LIKE 'abcde%', just doing the queries differently.Recur
I missed the concatenation, sorry. I have edited my answer. However, I still think the basic premise might be valid. In one case the query compiler gets a constant, which it can use in building a query plan. In the other, the compiler must build a query plan for the general case where the bound variable could be anything. You should examine the query plans to see if this is the case. I have never worked with PG (just Oracle & MySQL) so I don't know how to examine the query plans.Gangrene
yea, one interesting answer on the DBI mailing list is that perhaps the Pg query planner doesn't have as much information in a query with a bind value than in a query with the parameter inline. I am hoping to hear from others as well, because, if that is true, then it is a big strike against binding values in DBI, at least in this case.Recur
This is just db planning. The problem with bind values is that Pg doesn't know anything about the statistics, so it has to create a plan that is best in the worst case. Pg will only use indexes if the index will be faster and as a rule of thumb if the clause will select more than something like 5% of the table then a scan will generally be faster. In the bindvar case, the planner assumes you might pass in "%" for which a full-table scan will be faster than an index scan. In the constant expression case, it can estimate that the clause only hits a small enough subset.Coyotillo
S
0

Are you running both test cases from same file using same $dbh object? I think reason of increasing speed in second case is that you using prepared statement which is already parsed(but maybe I wrong:)).

Stepdame answered 14/2, 2012 at 3:3 Comment(1)
Statement preparation is actually very fast and is only an issue if you are doing it a very large number of times. In this case only one query is prepared for a multi-million row query, so the effect (even if it were a factor) would be negligible.Howund
C
-1

Ahh, I see - I will drop out after this comment since I don't know Perl. But I would trust that the editor is correct in highlighting the $q as a constant. I'm guessing that you need to concatenate the value into the string, rather than just directly referencing the variable. So, my guess is that if + is used for string concatenation in perl, then use something like:

my $sql = qq{ SELECT a, b, c FROM t WHERE Lower( a ) LIKE ' } + $q + qq{'};

(Note: unless the language is tightly integrated with the database, such as Oracle/PLSQL, you usually have to create a completely valid SQL string before submitting to the database, instead of expecting the compiler to 'interpolate'/'Substitute' the value of the variable.)

I would again suggest that you get the COUNT() of the statements, to make sure that you are comparing apple to apples.

Corny answered 14/2, 2012 at 3:28 Comment(3)
Perl's concatenation operator is ., and qq{ SELECT a, b, c FROM t WHERE Lower( a ) LIKE '$q'} and qq{ SELECT a, b, c FROM t WHERE Lower( a ) LIKE '} . $q . qq{'}; are identical.Irregularity
If you don't know Perl, you should listen to those that do (mu, TLP).Irregularity
@klofton: Sorry if I sound miserable, but as your answer is something of a red herring, I would suggest considering deleting it entirely.Howund
C
-2

I don't know Postgres at all, but I think in Line 7 (WHERE Lower( a ) LIKE '$q' ), $q is actually a constant. It looks like your editor thinks so too, since it is highlighted in red. You probably still need to use the ? for the variable.

To test, do a COUNT(*), and make sure they match - I could be way offbase.

Corny answered 14/2, 2012 at 1:27 Comment(6)
No, Perl's qq{...} acts like a double quoted string so $q will be interpolated.Commentative
It is not a constant. The statement is wrapped in qq{}, so variables are interpolated.Undoubted
or, maybe try just removing the single-quotes around $q, and see if it compiles (this is how Oracle PL/SQL would process it, but not sure how scoping works in postgres).Corny
it will compile fine, but it will be a completely different query. Without the single quotes it will be trying to do a LIKE against a column with the same name as the value of $q.Recur
The value of $q is interpolated into the query because the string literal is a qq{} (interpolating) string literal. The single quotes either side of the $q inside the string do not stop $q being interpolated.Howund
@klofton: Again, complete red herring. I would suggest considering deleting the answer entirely.Howund

© 2022 - 2024 — McMap. All rights reserved.