Consolidating tables with one-to-one relationships
Asked Answered
B

7

8

I have 3 MySQL tables for a membership system.

  • users: Minimum requirement to be a user, only related to account info (email, password, is_activated, etc)
  • user_profiles: Personal information provided by the user (name, address, phone...)
  • user_member_profiles: Information strictly managed by admins (registration fee paid, meetings attended, etc.)

These could be condensed into one table, saving me headaches and keeping my code clean - but I feel like it's better to leave them separate as they serve slightly different purposes.

Option 1: Leave it this way and keep doing JOINs and tedious UPDATEs (this piece of data goes to this table, this piece goes to another, etc.). More work for me, but maybe it makes more sense?

Option 2: Combine everything into one table.

I would assume using one table would be faster, no need to join tables. Maybe this depends on the data? Each table has about 12-20 fields, so the combined table would be large.

Each user has no more than 1 profile from each table, but may not even have a profile at all (or may have only 1 total).

To add a little context to this: It is for an ever-evolving CMS written in PHP which I will need to make tweaks to the tables for each installation. Admins need to manage members in a speadsheet-like way, so I will be selecting up to 200 users at a time.

What is the correct approach to this from a performance, design, or organization perspective?

Bursarial answered 27/4, 2011 at 21:20 Comment(6)
You should always only select what you need no matter what the table structure! I fail to see how these few joins would be a burden.Aretina
Part of option 2, "just make sure to SELECT only what I need", seems to be in conflict with "I also need to SELECT * FROM each profile table every time because I need every field available."Continually
@Catcall: I meant when I need data from one profile, I need all of it, but I don't always need data from any profile.Bursarial
@HLGEM: Yes I understand, but I am seeing that the JOIN statements are taking more time than a simple query from one table, and it makes it harder to manage doing UPDATEs (from html forms in the application) as I have to cherry pick each field and make sure it goes to the right profile table.Bursarial
If your join statments are slow, then you need to make sure the correct indexes are in place. Databases are optimzed to do joins but you have to do the work to set things up correctly.Aretina
I didn't say they were slow, I only suggested that avoiding them could be faster.Bursarial
A
15

Another factor to consider with wide table (many columns) is the effect on the RDBMS cache. Any good developer knows that you don't do 'select * from table' as it will carry unnecessary data over the network from RDBMS to client. But a similar effect can happen betwen disk and RAM and also affect the amount of space in RAM that a table requires to cache.

Most RDBMSes allocate a given volume of memory to cache data, thus reducing physical disk reads and speeding response to the user. This is Buffer Cache in Oracle or SQL Server

If you have a wide table and issue a query in the form 'select col1, col2, col3 from table' the RDBMS will load the full rows into RAM (not col1 through 3). As it does so it will age out older cached data. If your table is wide and you load 50 columns you of course require more RAM than for the same number of rows * a narrow table. This can have a noticeable impact on RDBMS performance.

Lots of wide tables, aging out other tables from cache and it is possible to see the IO stats go thorough the roof as commonly used tables age out of cache to make room for wide tables.

This factor should be added to the other advantages of normalised data and taken into consideration at table design time. In effect if you have a potentially wide table with some data that will be regularly accessed and some that will be rarely, consider multiple tables with a 1 to 1 relationship.

Artois answered 27/4, 2011 at 22:59 Comment(2)
You bring up some interesting technical points I was not aware of: SELECT * FROM several tables being faster than SELECT col1,col2,col3 from a single table. What would you consider the maximum amount of columns to be sensible before you look for ways to separate the data into different tables? (Assuming that this is possible and the data is not very tightly related)Bursarial
After re-reading your answer, its making more sense. If there was a comma here: rarely**,** consider, it would have been a lot clearer. I read it as "rarely consider". Thanks a lot for this advice, we'll be sticking with the original design, as I expected. I thought perhaps I was being too "defensive" by splitting up the data this way, and someone would come along and say "There's no need for this."Bursarial
A
3

The design issue is whether you need to have multiple records in any of those tables for one user. If so, do not combine them. If the tables are in a one-to-one realtionship, you can combine them, but should not if they have many fields or your record size will be too wide which can cause performance problems as well as making it impossible to add data if you exceed the actual record size limit for a single record. If you currently have a lot of code that access them as serarate tables and a lot of data, restructuring them for the minor gain you would get (saving all of a minute or so in development and probably no time at all inperformance to the users) seems to be a bad idea. YOu could write views so you don't have to do the joins, but honestly these are so simple, I wouldn't bother there either.

Aretina answered 27/4, 2011 at 21:57 Comment(4)
saving all of a minute or so in development: This is actually important to me in terms of code clarity and simplification, and will actually end up being more than a minute or two as things evolve and I need to, for instance, add more fields or manage them in a different way. probably no time at all inperformance to the users: There are times when I need to for instance, show hundreds of users (for management) so I am actually interested in getting better performance. Are you saying that views are not worth the time/effort or that this is not a good reason to use them?Bursarial
I'm saying views are useful for complex relationships not simple ones like this. I also tend to avoid views because people go crazy with them and create views that call other views that can create horrible performance problems. This is just not that hard a scenario that creating a view would save any one anything and it just adds an unneeded level of abstraction for basically no gain.Aretina
OK, I understand now (and agree) thanks. This is very simple for sure. Basically, there are 4 scenarios: Access to table1, table1+table2, table1+table3, or table1+2+3. I always need all data from a table that is accessed. I'm wondering if I should leave as is and SELECT * with JOIN, or put everything in one table and just select the columns I need for each of those 4 instances. It sounds like you're advice is to leave things the way I have it, is that correct?Bursarial
Yes except I would never use select *, you need to specify the columns. Select * is a very poor programming technique and should not be used in production code. It is a sloppy, bad habit to get into. Select * can cause bad errors when structure changes (someitmes someone adds a column you don't want users to see for instance or it messes up an insert statement to another table) and it causes performance problems. When you have a join you have a at least one column with the same data in it that you are returning twice, this is wasteful of network and server resources.Aretina
P
2

My design urges say keep separate because maybe in the future a user will have two profiles, but performance is likely better if they are merged. If there is truly a one-to-one relationship, and that relationship will never change, then I would merge them.

Phototypy answered 27/4, 2011 at 21:24 Comment(5)
Yeah. The First form appears to be better normalization wise. The second will be faster if you need columns from the 3 tables. If your main goal is to simplify queries, create a view.Golda
Hadn't thought of a view. That would certainly help.Phototypy
It is safe to say that users will never have two profiles, but some users will only need one. Another concern is that the table will have at least 50 columns, which with my limited knowledge smells like bad design; but then again: You gotta do what you gotta do. The concept of a "view" is beyond my full understanding right now, would this be a good solution or just simply another option?Bursarial
I have worked with tables with more than 40 columns out of simple necessity. A view is usually designed to give certain users certain access to certain data, but this is not the only way they can be used. For you, it's probably just another option.Phototypy
Please see the answer I added below regards the performance impact of wide tables. In addition to Deletion Insert and Update anomaly consider performance.Artois
E
2

You don't have to use that many joins for retrieving data.

You can have a VIEW to show for example all columns from users and user_profiles:

CREATE VIEW users2 AS
( SELECT u.id
       , u.email
       , u.password
       , u.is_activated
       , p.name
       , p.address
       , p.phone
  FROM users u
    LEFT JOIN user_profiles p
      ON u.id = p.id
)

and use this VIEW in the queries that need data from both tables. Another VIEW for all 3 tables, etc.

Eellike answered 27/4, 2011 at 21:42 Comment(1)
Thanks for this, everyone seems to suggest using a view, which is something I will need to learn. What would be the difference between this and simply writing concise SELECT statements with one big table? Am I "doing the right thing" by keeping the data in separate tables or does it not matter?Bursarial
M
1

There are two reasons to keep tables apart, both to do with how many records you keep about each user.

  • if each person has multiple profiles, keep user and profile data apart; use a column in the profile table (the many side of the relationship) to refer to the primary key of the user table.
  • if each person optionally have a profile (ie has one or none), use two tables in the same way, but to make joins easier, use the same primary key in both tables. The aim is to avoid tables with lots of empty rows. Another way to think of that is that profile inherits from person - and so uses a table of added data with the same key.

Bar those kinds of situations, you want to keep everything in one table, with one key. To express the multiple uses of the data, a good solution is to use views - select a subset of the data and keep it as a view, with a sensible name. When you want, say, administrative data, call up the corresponding view.

Macao answered 27/4, 2011 at 21:36 Comment(2)
There are no multiple profiles from same table. Is your other bullet point really a "reason to keep tables apart"? If I am able to use views as everyone suggests, then what would be the point of one big table when my data is already spread out into multiple tables?Bursarial
A table keeps together data that means something together. So each row in the user table is one user, and if the table is well designed, everything about each user is one row, with a clear id made of one or a few columns and other columns for extra info.Macao
T
1

If many fields are placed into a single table, the probability of multiple functions modifying a row of data increases, which will lead to greater lock contention and slower modification speed. The slower modification speed will also slow down query speed.

Telegenic answered 25/7 at 9:25 Comment(0)
C
-1

Unless you're having strange performance issues, you should just have one table.

By performance issues I'm talking about having so much data that you want to partition it across tables to keep it separate (physical disks, servers, whatever). This clearly isn't the case here. If it were the case then there are lots of better ways to deal with that kind of thing.

The sort of performance issues everyone wishes they had and not many people do...

Concordia answered 27/4, 2011 at 21:26 Comment(2)
I'm not so much "having performance issues" as I am trying to speed things up and make it easier to manage in the application. Unless you're having performance issues, you should just have one table. - So you're saying that if I am NOT having performance issues, I should use one table, but use several tables if I am having issues??? there are better ways to deal with it - Not sure what you are referring to here, can you enlighten me?Bursarial
I meant really weirdo performance issues rather than normals ones, I was over-protecting my answer. Apologies. I'll adjust accordingly.Concordia

© 2022 - 2024 — McMap. All rights reserved.