How to implement composable queries: the 10k feet view
It's not difficult to realize that in order to achieve this the methods being chained must incrementally set up some data structure which is finally being interpreted by some method that executes the final query. But there are some degrees of freedom regarding exactly how this can be orchestrated.
The example code is
$albums = $db->select('albums')->where('x', '>', '20')->limit(2)->order('desc');
What do we see here?
- There is some type, which
$db
is an instance of, that exposes at least a select
method. Note that if you want to be able to fully reorder the calls this type needs to expose methods with all of the possible signatures that can take part in the call chain.
- Each of the chained methods returns an instance of something that exposes methods will all relevant signatures; this may or may not be the same type as
$db
.
- After the "query plan" has been collected, we need to call some method to actually execute it and return the results (a process which I am going to call materializing the query). This method can only be the last one in the call chain for obvious reasons, but in this case the last method is
order
, which does not seem right: we want to be able to move it earlier in the chain after all. Let's keep this in mind.
Therefore we can break down what happens in three distinct steps.
Step 1: Kicking off
We established that there needs to be at least one type that collects information about the query plan. Let's assume the type looks like this:
interface QueryPlanInterface
{
public function select(...);
public function limit(...);
// etc
}
class QueryPlan implements QueryPlanInterface
{
private $variable_that_points_to_data_store;
private $variables_to_hold_query_description;
public function select(...)
{
$this->encodeSelectInformation(...);
return $this;
}
// and so on for the rest of the methods; all of them return $this
}
QueryPlan
needs appropriate properties to remember not only what query it should produce, but also where to direct that query because it is an instance of this type you will have on hand at the end of the call chain; both pieces of information are necessary in order for the query to be materialized. I have also provided a QueryPlanInterface
type; its significance will be made clear later on.
Does this mean that $db
is of type QueryPlan
? At first sight you might say yes, but upon closer inspection issues start to arise from such an arrangement. The biggest problem is stale state:
// What would this code do?
$db->limit(2);
// ...a little later...
$albums = $db->select('albums');
How many albums is this going to retrieve? Since we did not "reset" the query plan it should be 2. But that's not obvious at all from the last line, which reads very differently. This is a bad arrangement that can lead to unnecessary bugs.
So how to solve this problem? One option would be for select
to reset the query plan, but this runs into the opposite issue: $db->limit(1)->select('albums')
now selects all albums. This doesn't look nice.
The option would be to "kick off" the chain by arranging for the first call to return a new QueryPlan
instance. This way each chain operates on a separate query plan, and while you can compose a query plan bit by bit you can no longer do it by accident. So you could have:
class DatabaseTable
{
public function query()
{
return new QueryPlan(...); // pass in data store-related information
}
}
which solves all these problem but requires you to always write ->query()
in front:
$db->query()->limit(1)->select('albums');
What if you don't want to have this extra call? In that case class DatabaseTable
has to implement QueryPlanInterface
as well, with the difference that the implementation will create a new QueryPlan
each time:
class DatabaseTable implements QueryPlanInterface
{
public function select(...)
{
$q = new QueryPlan();
return $q->select(...);
}
public function limit(...)
{
$q = new QueryPlan();
return $q->limit(...);
}
// and so on for the rest of the methods
}
You can now write $db->limit(1)->select('albums')
without any problem; the arrangement can be described as "each time you write $db->something(...)
you start composing a new query that is independent of all previous and future ones".
Step 2: Chaining
This is actually the easiest part; we already saw how the methods in QueryPlan
always return $this
to enable chaining.
Step 3: Materializing
We still need some way to say "OK, I 'm done composing; get me the results". It is perfectly possible to use a dedicated method for this purpose:
interface QueryPlanInterface
{
// ...other methods as above...
public function get(); // this executes the query and returns the results
}
This enables you to write
$anAlbum = $db->limit(1)->select('albums')->get();
There is nothing wrong and lots of right with this solution: it's obvious at which point the actual query is executed. But the question uses an example that does not appear to work like that. Is it possible to achieve such syntax?
The answer is yes and no. Yes in that it is indeed possible, but no in the sense that the semantics of what happens will have to change.
PHP has no facility that enables a method to be "automatically" called, so there has to be something that triggers the materialization, even if that something does not look like a method call at first sight. But what? Well, think about what is perhaps the most common use case:
$albums = $db->select('albums'); // no materialization yet
foreach ($albums as $album) {
// ...
}
Can this be made to work? Sure, as long as QueryPlanInterface
extends IteratorAggregate
:
interface QueryPlanInterface extends IteratorAggregate
{
// ...other methods as above...
public function getIterator();
}
The idea here is that the foreach
triggers a call to getIterator
, which in turn will create an instance of yet another class injected with all the information that the implementation of QueryPlanInterface
has compiled. This class will execute the actual query on the spot and materialize the results on demand during the iteration.
I have chosen to implement IteratorAggregate
and not Iterator
specifically so that the iteration state can go into a new instance, which allows multiple iterations over the same query plan to go on in parallel without problems.
Finally, this foreach
trick looks neat but what about the other common use case (getting the query results into an array)? Have we made that unwieldy?
Not really, thanks to iterator_to_array
:
$albums = iterator_to_array($db->select('albums'));
Conclusion
Does this require lots of code to be written? For sure. We have DatabaseTable
, QueryPlanInterface
, QueryPlan
itself and also the QueryPlanIterator
we have described but not shown. In addition, all the encoded state that these classes aggregate will probably need to be kept in instances of yet more classes.
Is it worth it? Quite likely. That's because this kind of solution offers:
- an attractive fluent interface (chainable calls) with clear semantics (each time you kick off you start describing a new query independent of any other)
- decoupling of the query interface from the data store (each instance of
QueryPlan
keeps a handle on an abstract data store, so you can theoretically query anything from relational databases to flat text files using the same syntax)
- composability (you can start composing a
QueryPlan
now and continue doing so in the future, even in another method)
- reusability (you can materialize each
QueryPlan
more than once)
Not a bad package at all.
where()
limit() and
order()` are being performed in PHP on the result set returned byselect()
. If you want to build the entire query and avoid result-set processing at the PHP level, you would want to add an additional, required method to the end of the chain, such as->execute()
. The optional methods would build the query created byselect()
, andexecute()
would validate the query, execute it, and return the results. – Sunbonnetget()
is added to the end of query builder strings, much like the manner I described forexecute()
above. – Sunbonnet