Is NATURAL (JOIN) considered harmful in production environment?
Asked Answered
K

6

13

I am reading about NATURAL shorthand form for SQL joins and I see some traps:

  • it just takes automatically all same named column-pairs (use USING to specify explicit column list)
  • if some new column is added, then join output can be "unexpectedly" changed too, which may be not so obvious (even if you know how NATURAL works) in complicated structures
Kinswoman answered 18/5, 2011 at 4:1 Comment(0)
C
22

NATURAL JOIN syntax is anti-pattern:

  • The purpose of the query is less obvious;
    • the columns used by the application is not clear
    • the columns used can change "unexpectedly"
  • The syntax goes against the modularity rule, about using strict typing whenever possible. Explicit is almost universally better.

Because of this, I don't recommend the syntax in any environment.
I also don't recommend mixing syntax (IE: using both NATURAL JOIN and explicit INNER/OUTER JOIN syntax) - keep a consistent codebase format.

Curl answered 18/5, 2011 at 4:7 Comment(17)
+1 for query is less obvious. Reading is way harder than writing.Eikon
"syntax goes against the modularity rule" -- I think you can charge the entire SQL language with that. However, if you take up that challenge of using SQL while adhering to relational principles as closely as possible, NATURAL JOIN positively shines.Neology
@onedaywhen: The entire SQL language? The question isn't about relational principles -- it's about Production-level code considerations. To me, that means: maintainance, readability, consistency, performance. It's pretty obvious you didn't have a reason so much as *created one to justify downvoting.Curl
I downvoted because you suggested that NATURAL JOIN is an "anti-pattern" without proving it is to my satisfaction. If you use SQL relationally then NATURAL JOIN is probably the only one you need so calling it an "anti-pattern" sounds absurd to me.Neology
@onedaywhen: Your "satisfaction" doesn't reflect the question -- the OP did not ask about "relational" sense (frankly, I don't see the value). "Pattern" and "Anti-pattern" is common terminology when discussing software design, which is what the OP asked about. Sometimes you have a point - this is not one of those occasions and there are far less correct answers but they haven't been downvoted...Curl
NATURAL JOIN is part of the relational algebra so talking about it in the "relational sense" is very much on topic. I don't consider using a relational operator in a supposedly relation language to be a "design pattern", let alone an "anti-pattern".Neology
I see your whole point. Yeah, I wrote "in production environment" intentionally, so I agree in that sense we can consider NATURAL as anti-pattern, because it's tricky tu use.Kinswoman
@onedaywhen: "Anti-pattern/pattern" has been used to describe single vs multiple return statements in a method/function, among other finely detailed programming design aspects. And now the OP concedes they agree with my assessment - you grasp at straws.Curl
Consider that the truly relational language Tutorial D has but one join being NATURAL JOIN... what's that? The OP has concede it's an "anti-pattern, because it's tricky tu use"?! You're right, I have nothing.Neology
@onedaywhen: Is this a Tutorial D question? The OPs concession reinforces why not to use the syntax (which my bullet points expand on) -- nothing to do with your "relational principles". Every reply you make confirms my original supposition. What a sad, angry person you must be.Curl
It is a question about relational operators, but like most people you can't see the trees for the wood, no biggie. Sad/angry couldn't be further from the truth, I assure you ;)Neology
@onedaywhen: That's hilarious considering the defined scope of the question. I forgot that "delusional" should be in that list.Curl
Spare me 10 seconds to peruse the contents section of Wikipedia's article on relational algebra. Notice some SQL keywords jump out at you (e.g. NATURAL JOIN)? This is because SQL is based on relational algebra (some parts more loosely than others). Your eye may even catch that "natural join is arguably one of the most important operators since it is the relational counterpart of logical AND". Using AND in logic is neither a pattern nor an anti-pattern (far from it) and the same applies to natural join in a supposedly relational language.Neology
From the faq: "Above all, be honest. If you see misinformation, vote it down. Add comments indicating what, specifically, is wrong." This I've done and will now be moving on. "Be nice" :)Neology
@onedaywhen: I take issues with the Wikipedia article on that, illustrated by the blunder under theta-join description "Price=Price" - which shows how traditionally relational algebra doesn't think much about several tables sharing an attribute name, while being distinct in meaning. You can always rewrite a NATURAL JOIN with an INNER JOIN, but there are many cases when you can't implement an INNER JOIN using a NATURAL JOIN. This makes NATURAL JOIN a syntactic sugar, at best. It works in maths, since in maths you assume that same name always shares the same meaning.Merwyn
@onedaywhen: I've consistently demonstrated you don't really have any reasons that relate to the framed question for why to downvote, nevermind you have yet to take issue with any other answer. No one else is agreeing with you, and more are involving themselves to take issue with your stance on the matter. You appear to hate high-rep accounts, but aren't willing to do anything to improve your rep -- where's your answer? Downvotes can be reversed, but you'd rather be obtuse -- you like my attention. You're "moving on" because on some level you finally comprehend your behavior.Curl
@OMG Ponies: (sorry for noise comment) Well said. I've had the same treatment from onedaywhen: all criticism, nothing constructive. Sounds like an academic who doesn't actually use SQL day to day, just likes reading books...Refluent
M
7

One thing that completely destroys NATURAL for me is that most of my tables have an id column, which are obviously semantically all different. You could argue that having a user_id makes more sense than id, but then you end up writing things like user.user_id, a violation of DRY. Also, by the same logic, you would also have columns like user_first_name, user_last_name, user_age... (which also kind of makes sense in view that it would be different from, for example, session_age)... The horror.

I'll stick to my JOIN ... ON ..., thankyouverymuch. :)

Merwyn answered 18/5, 2011 at 4:12 Comment(0)
L
6

These "traps", which seem to argue against natural joins, cut both ways. Suppose you add a new column to table A, fully expecting it to be used in joining with table B. If you know that every join of A and B is a natural join, then you're done. If every join explicitly uses USING, then you have to track them all down and change them. Miss one and there's a bug.

Use NATURAL joins when the semantics of the tables suggests that this is the right thing to do. Use explicit join criteria when you want to make sure the join is done in a specific way, regardless of how the table definitions might evolve.

Longwood answered 18/5, 2011 at 4:6 Comment(3)
No doubt that NATURAL is more "hidden-complicated" than strict USING or general ON clause (that's some reason to avoid it), but as you wrote it depends on semantics. I suppose that in most cases I want to use USING or ON, to be sure that specific way.Kinswoman
Yes, I think that in most (but not all) cases, being explicit is better.Longwood
No, natural joins are obscure, error prone, and difficult to debug. Avoid them like the plague.Durwood
B
3

I agree with the other posters that an explicit join should be used for reasons of clarity and also to easily allow a switch to an "OUTER" join should your requirements change.

However most of your "traps" have nothing to do with joins but rather the evils of using "SELECT *" instead of explicitly naming the columns you require "SELECT a.col1, a.col2, b.col1, b.col2". These traps occurs whenever a wildcard column list is used.

Belittle answered 18/5, 2011 at 4:15 Comment(0)
T
2

Adding an extra reason not listed in any of the answers above. In postgres (not sure if this the case for other databases) if no column names are found in common between the two tables when using NATURAL JOIN then a CROSS JOIN is performed. This means that if you had an existing query and then you were to subsequently change one of the column names in a table, you would still get a set of rows returned from the query rather than an error. If instead you used the JOIN ... USING(...) syntax you would get an error if the joining column was no longer there.

The postgres documentation has a note to this effect:

Note: USING is reasonably safe from column changes in the joined relations since only the listed columns are combined. NATURAL is considerably more risky since any schema changes to either relation that cause a new matching column name to be present will cause the join to combine that new column as well.

Thiazole answered 27/10, 2016 at 17:21 Comment(1)
I would have expected PostgreSQL to bomb the query if no matching columns are present in a natural join.Durwood
T
-3

Do you mean the syntax like this:

SELECT * 
  FROM t1, t2, t3 ON t1.id = t2.id 
                 AND t2.id = t3.id

Versus this:

         SELECT *  
           FROM t1 
LEFT OUTER JOIN t2 ON t1.id = t2.id 
                  AND t2.id = t3.id

I prefer the 2nd syntax and also format it differently:

         SELECT *
           FROM T1
LEFT OUTER JOIN T2 ON T2.id = T1.id
LEFT OUTER JOIN T3 ON T3.id = T2.id

In this case, it is very clear what tables I am joining and what ON clause I am using to join them. By using that first syntax is just too easy to not put in the proper JOIN and get a huge result set. I do this because I am prone to typos, and this is my insurance against that. Plus, it is visually easier to debug.

Tsang answered 18/5, 2011 at 4:10 Comment(3)
No - the first example is ANSI-89 syntax, the latter is ANSI-92. NATURAL JOIN is also ANSI-92 syntax. Pardon, I thought the first was ANSI-89, but it's a hybrid of the two I doubt is supported on anything but MySQL.Curl
Thanks, but my doubt is about NATURAL (implicit "full" column matching list). I mean something like SELECT * FROM t1 NATURAL LEFT JOIN t2;Kinswoman
Oh interesting, I have not seen that keyword NATURAL before. Sorry for the wasted answer, but thank you for the new keyword.Tsang

© 2022 - 2024 — McMap. All rights reserved.