This is a non-trivial exercise in most cases. If you are lucky enough to be analysing a schema for a modern framework, such as Ruby on Rails, or CakePHP or similar, and the developers have been stringent about following column conventions, then you have a reasonable chance of finding many, but not all, of the implied relationships.
I.e. if your tables use columns like user_id
to refer to entries in the users
tables.
Be aware: some entity names may pluralise irregularly (entity
being a good example: entities
, not entitys
) and these are harder to catch (but still possible). However, keys such as admin_id
which the developers join with the users table on user.id
can't be inferred. You would need to handle those cases manually.
You didn't specify an RDBMS, but I used MySQL a lot, and I'm currently working on this problem for myself.
The following MySQL script will infer most relationships implied by column names. It then lists any relationships that it could not find table names for, so at least you know which ones you're missing. The inferred parent and child are listed, along with singular and plural names, plus the implied relationship:
-- this DB is where MySQL keeps schema information
use information_schema;
-- change this to the DB you want to analyse
set @db_name = "example_DB";
-- infer relationships
-- NB: this won't catch names that pluralise irregularly like category -> categories or bus_id -> buses etc.
select LEFT(COLUMN_NAME, CHAR_LENGTH(COLUMN_NAME) - 3 ) as inferred_parent_singular
, CONCAT(LEFT(COLUMN_NAME, CHAR_LENGTH(COLUMN_NAME) - 3 ),"s") as inferred_parent_plural
, C.TABLE_NAME as child_table
, CONCAT(LEFT(COLUMN_NAME, CHAR_LENGTH(COLUMN_NAME)-3), "s has many ", C.TABLE_NAME) as inferred_relationship
from COLUMNS C
JOIN TABLES T on C.TABLE_NAME = T.TABLE_NAME
and C.TABLE_SCHEMA = T.TABLE_SCHEMA
and T.TABLE_TYPE != "VIEW" -- filter out views; comment this line if you want to include them
where COLUMN_NAME like "%_id" -- look for columns of the form <name>_id
and C.TABLE_SCHEMA = T.TABLE_SCHEMA and T.TABLE_SCHEMA = @db_name
-- and C.TABLE_NAME not like "wwp%" -- uncomment and set a pattern to filter out any tables you DON'T want included, e.g. wordpress tables e.g. wordpress tables
-- finally make sure to filter out any inferred names that aren't really tables
and CONCAT(LEFT(COLUMN_NAME, CHAR_LENGTH(COLUMN_NAME) - 3 ),"s") -- this is the inferred_parent_plural, but can't use column aliases in the where clause sadly
in (select TABLE_NAME from TABLES where TABLE_SCHEMA = @db_name)
;
This will return results like this:
Then you can examine any naming convention exceptions detected with:
-- Now list any inferred parents that weren't real tables to see see why (irregular plurals and columns not named according to convention)
select LEFT(COLUMN_NAME, CHAR_LENGTH(COLUMN_NAME) - 3 ) as inferred_parent_singular
, CONCAT(LEFT(COLUMN_NAME, CHAR_LENGTH(COLUMN_NAME) - 3 ),"s") as inferred_parent_plural
, C.TABLE_NAME as child_table
from COLUMNS C
JOIN TABLES T on C.TABLE_NAME = T.TABLE_NAME
and C.TABLE_SCHEMA = T.TABLE_SCHEMA
and T.TABLE_TYPE != "VIEW" -- filter out views, comment this line if you want to include them
where COLUMN_NAME like "%_id"
and C.TABLE_SCHEMA = T.TABLE_SCHEMA and T.TABLE_SCHEMA = @db_name
-- and C.TABLE_NAME not like "wwp%" -- uncomment and set a pattern to filter out any tables you DON'T want included, e.g. wordpress tables e.g. wordpress tables
-- this time only include inferred names that aren't real tables
and CONCAT(LEFT(COLUMN_NAME, CHAR_LENGTH(COLUMN_NAME) - 3 ),"s")
not in (select TABLE_NAME from TABLES where TABLE_SCHEMA = @db_name)
;
This will return results like this, which you can process manually:
You can modify these scripts to spit out whatever is useful to you, include foreign key create statements, if you want to. Here, the final column is a simple 'has many' relationship statement. I use this in a tool I've built called called pidgin, which is rapid modelling tool that draws relationship diagrams on the fly based on relationship statements written a very simple syntax (called 'pidgin'). You can check it out at http://pidgin.gruffdavies.com
I've run the above script on a demo DB to show you the sort of results you can expect:
I haven't catered for the irregular plurals in my script, but I might have a go at that too, at least for the case of entities ending in -y. If you want to have a try at that yourself, I'd recommend writing a stored function that takes <name>_id
column names as a parameter, strips the _id
part and then applies some heuristics to attempt to pluralise correctly.
Hope that's useful!