SQL join following foreign key: statically check that LHS is key-preserved
Asked Answered
W

1

2

Often you join two tables following their foreign key, so that the row in the RHS table will always be found. Adding the join does not affect the number of rows affected by the query. For example

create table a (x int not null primary key)
create table b (x int not null primary key, y int not null)
alter table a add foreign key (x) references b (x)

Now, assuming you set up some data in these two tables, you can get a certain number of rows from a:

select x from a

Adding a join to b following the foreign key does not change this:

select a.x from a join b on a.x = b.x

However, that is not true of joins in general, which may filter out some rows or (by Cartesian product) add more:

select a.x from a join b on a.x = b.x and b.y != 42 -- probably gives fewer rows

select a.x from a join b on a.x != b.y -- probably gives more rows

When reading SQL code there is no obvious way to tell whether a join is the key-preserving kind, which may add extra columns but does not change the number of rows returned, or whether it has other effects. Over time I have developed a coding convention which I mostly stick to:

  • if a key-preserving join, use join
  • if wanting to filter rows, put the filter condition in the where clause
  • if wanting more rows, sometimes cross join for Cartesian product is the clearest way

These are usually just style issues, since you can often put a predicate into either the join clause or the where clause, for example.

My question

Is there some way to have these key-preserving joins statically checked by the database server when the query is compiled? I understand that the query optimizer already knows that a join on a foreign key will always find exactly one row in the table pointed to by the foreign key. But I would like to tag it in my SQL code for the benefit of human readers. For example, suppose the new syntax fkjoin is used for a join following a foreign key. Then the following SQL fragments will give errors or not:

a fkjoin b on a.x = b.x -- OK

a fkjoin b on a.x = b.x and b.y = 42 -- "Error, join can fail due to extra predicate"

a fkjoin b on a.x = b.y -- "Error, no foreign key from a.x to b.y"

This would be a useful check for me when writing the SQL, and also when returning to read it later. I understand and accept that changing the foreign keys in the database would change what SQL is legal under this scheme - to me, that is a desired outcome, since if a necessary FK ceases to exist then the key-preserving semantics of the query are no longer guaranteed, and I'd like to find out about it.

Potentially, there could be some external SQL static checker tool that does the work, and special comment syntax could be used rather than a new keyword. The checker tool would need access to the database schema to see what foreign keys exist, but it would not need to actually execute the query.

Is there something that does what I want? I am using MSSQL 2008 R2. (Microsoft SQL Server for the pedantic)

Whaling answered 22/7, 2014 at 10:52 Comment(3)
If I understand the question correctly, consistent naming convention may help in alleviating some or all of these issues. If Customers is being joined with Orders on Customers.CustomerID = Orders.CustomerID, you can tell at a glance that the condition makes sense, while joining on Customers.CategoryID = Orders.ProductID obviously doesn't seems right.Unreasonable
What exactly do you mean by a join "succeeding" or "failing"? Since you can join any two tables.Bailar
@Andriy M, yes a naming convention can help. But it's not easy to keep track of things unless you are an infallible programming robot - I make too many mistakes to rely on my own carefulness alone. For example, if you have a query from customers, then adding join orders on customers.customerid = orders.customerid is not a safe change by this measure - since it will now exclude customers that have no order, and make two rows for a customer with two orders. So I could not add or remove such a join without affecting query semantics. (The other way round is safe.)Whaling
B
1

I realize that you are interested in indicating whether a particular join on particular columns is on a FK, or is a restriction, or perhaps is of some other case, or none of the preceding. (And it's not clear what you mean by "success" or "failure" of a join, or its relevance.) Whereas focusing on that information, as explained below, is to miss focusing on more important and fundamental things.

A base table has a "meaning" or "predicate (expression)" that is a fill-in-the-(named-)blanks statement given by the DBA. The names of the blanks of the statement are the columns of the table. Rows that fill in the blanks to make a true proposition about the world go in the table. Rows that fill in the blanks to make a false proposition about the world are left out. Ie a table holds the rows that satisfy its statement. You cannot set a base table to a certain value without knowing its statement, observing the world and putting the appropriate rows into the table. You cannot know about the world from base tables except by knowing its statement and taking present-row propositions to be true and absent-row propositions to be false. Ie you need its statement to use the database.

Notice that the typical syntax for a table declaration looks like a shorthand for its statement:

-- employee [eid] is named [name] and lives at [address] in ...
EMPLOYEE(eid,name,address,...)

You can make bigger statements by putting logic operators AND, OR, AND NOT, EXISTS name, AND condition, etc between/around other statements. If you translate a statement to a relation/SQL expression by converting

  • a table's statement to its name
  • AND to JOIN
  • OR to UNION
  • AND NOT to EXCEPT/MINUS
  • EXISTS C,... [...] to SELECT all columns but C,... FROM ...
  • AND condition to ON/WHERE condition
  • IMPLIES to SUBSETOF
  • IFF to =

then you get a relation expression that calculates the rows that make the statement true. (Arguments of UNION & EXCEPT/MINUS need the same columns.) So just as every table holds the rows satisfying its statement, a query expression holds the rows that satisfy its statement. You cannot know about the world from a query result except by knowing its statement and taking its present-row propositions to be true and absent-row propositions to be false. Ie you need its statement to compose or interpret a query. (Observe that this is true regardless of what constraints hold.)

This is the foundation of the relational model: table expressions calculate rows satisfying corresponding statements. (To the extent that SQL differs, it is literally illogical.)

Eg: If table T holds the rows that make statement T(...,T.Ci,...) true and table U holds the rows that make statement U(...,U.Cj,...) true then table T JOIN U holds the rows that make statement T(...,T.Ci,...) AND U(...,U.Cj,...) true. That is the semantics of JOIN that is important to using a database. You can always join, and a join always has a meaning, and it is always the AND of its operands' meanings. Whether any tables happen to have FKs to others just isn't particularly helpful for reasoning about updates or queries. (The DBMS uses constraints for when you make mistakes.)

A constraint expression just corresponds to a proposition aka always-true statement about the world and simultaneusly to one about base tables. Eg for C UNIQUE NOT NULL in U, the following three expressions are equivalent to each other:

  • FOREIGN KEY T (C) REFERENCES U (C)
  •     EXISTS columns other than C T(...,C,...)
    IMPLIES EXISTS columns other than C U(...,C,...)
  • (SELECT C FROM T) SUBSETOF (SELECT C FROM U)

It is true that this implies that SELECT C FROM T JOIN U ON T.C = U.C = SELECT C FROM U, ie a join on a FK returns the same number of rows. But so what? The join's meaning is still the same function of its arguments'.

Whether a particular join on a particular column set involves a foreign key is just not germane to understanding the meaning of a query.

Bailar answered 23/7, 2014 at 11:55 Comment(1)
You are right to note the correspondence between relational operators and logical operators. Put in those terms, I would like a way to see whether a join condition is a tautology. If a.x is a foreign key to b.x, then adding 'join b on a.x = b.x' to a query makes no difference to its set of rows returned, just as if you added 'where 1+1 = 2'. They are both always-true conditions. When programming, I would like to assert that the condition I have added must always be true, and have this statically verified.Whaling

© 2022 - 2024 — McMap. All rights reserved.