Often you join two tables following their foreign key, so that the row in the RHS table will always be found. Adding the join does not affect the number of rows affected by the query. For example
create table a (x int not null primary key)
create table b (x int not null primary key, y int not null)
alter table a add foreign key (x) references b (x)
Now, assuming you set up some data in these two tables, you can get a certain number of rows from a:
select x from a
Adding a join to b following the foreign key does not change this:
select a.x from a join b on a.x = b.x
However, that is not true of joins in general, which may filter out some rows or (by Cartesian product) add more:
select a.x from a join b on a.x = b.x and b.y != 42 -- probably gives fewer rows
select a.x from a join b on a.x != b.y -- probably gives more rows
When reading SQL code there is no obvious way to tell whether a join
is the key-preserving kind, which may add extra columns but does not change the number of rows returned, or whether it has other effects. Over time I have developed a coding convention which I mostly stick to:
- if a key-preserving join, use
join
- if wanting to filter rows, put the filter condition in the
where
clause - if wanting more rows, sometimes
cross join
for Cartesian product is the clearest way
These are usually just style issues, since you can often put a predicate into either the join
clause or the where
clause, for example.
My question
Is there some way to have these key-preserving joins statically checked by the database server when the query is compiled? I understand that the query optimizer already knows that a join on a foreign key will always find exactly one row in the table pointed to by the foreign key. But I would like to tag it in my SQL code for the benefit of human readers. For example, suppose the new syntax fkjoin
is used for a join following a foreign key. Then the following SQL fragments will give errors or not:
a fkjoin b on a.x = b.x -- OK
a fkjoin b on a.x = b.x and b.y = 42 -- "Error, join can fail due to extra predicate"
a fkjoin b on a.x = b.y -- "Error, no foreign key from a.x to b.y"
This would be a useful check for me when writing the SQL, and also when returning to read it later. I understand and accept that changing the foreign keys in the database would change what SQL is legal under this scheme - to me, that is a desired outcome, since if a necessary FK ceases to exist then the key-preserving semantics of the query are no longer guaranteed, and I'd like to find out about it.
Potentially, there could be some external SQL static checker tool that does the work, and special comment syntax could be used rather than a new keyword. The checker tool would need access to the database schema to see what foreign keys exist, but it would not need to actually execute the query.
Is there something that does what I want? I am using MSSQL 2008 R2. (Microsoft SQL Server for the pedantic)
Customers
is being joined withOrders
onCustomers.CustomerID = Orders.CustomerID
, you can tell at a glance that the condition makes sense, while joining onCustomers.CategoryID
=Orders.ProductID
obviously doesn't seems right. – Unreasonablecustomers
, then addingjoin orders on customers.customerid = orders.customerid
is not a safe change by this measure - since it will now exclude customers that have no order, and make two rows for a customer with two orders. So I could not add or remove such a join without affecting query semantics. (The other way round is safe.) – Whaling