How many columns is too many columns? [closed]
I've noticed that a lot of folks here cite tables with 20+ (I've seen as many as 55) columns in one table. Now, I don't pretend to be a database design expert, but I've always heard that this is a horrible practice. When I see this, I usually suggest splitting into two tables with a one-to-one relationship: one containing the most frequently used data, the other with the least often used data. At the same time, though, there's the potential performance issue (fewer JOINs and such). So my question is this:

When it comes to really LARGE-scale databases, is there actually an advantage to having a large number of columns, despite the fact that this usually leads to many NULL values?

Which is more of a performance hit: lots of columns with lots of NULLs, or fewer columns with lots of JOINs?
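To make the two designs under comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (person_wide, person_extra, nickname, fax_number) are invented for illustration; the point is that the wide design carries NULLs in every row, while the split design must JOIN to reassemble the full record.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Design A: one wide table; rarely-used fields sit as NULLs in most rows.
con.execute("""
    CREATE TABLE person_wide (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        nickname   TEXT,   -- rarely populated
        fax_number TEXT    -- rarely populated
    )
""")

# Design B: a one-to-one split keeping the rarely-used fields aside.
con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
con.execute("""
    CREATE TABLE person_extra (
        person_id  INTEGER PRIMARY KEY REFERENCES person(id),
        nickname   TEXT,
        fax_number TEXT
    )
""")

con.execute("INSERT INTO person_wide VALUES (1, 'Ada', NULL, NULL)")
con.execute("INSERT INTO person VALUES (1, 'Ada')")

# Design B needs a LEFT JOIN to reassemble the full row.
row = con.execute("""
    SELECT p.name, e.nickname, e.fax_number
    FROM person p LEFT JOIN person_extra e ON e.person_id = p.id
    WHERE p.id = 1
""").fetchone()
print(row)  # ('Ada', None, None)
```

Either way the missing values surface as NULL here; the trade-off the question asks about is storage and write cost of the wide row versus the JOIN cost of the split.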

Outing answered 6/7, 2010 at 8:0 Comment(3)
Seems pretty obvious that it depends entirely on the requirements of the database and how heavily each respective operation is performed on it. Thanks for the answers.Outing
I thought the process was design the database and tables to 3rd or 4th normal form, and then denormalize for performance. Doesn't the process drive the column count, and not developer feelings?Trek
50 columns is not a lot, but in my opinion when tables have too many columns, upward of 300 or 400, then you have to look at the whole application and who is designing it and see if there are patterns of duplication. You'll probably find some. It can be hard to break large tables apart in production applications, so it's better to start with a good foundation.Merchandise

The design of the table depends on the entity it needs to store. If all the data belongs together, then 50 columns (or even 100) might be the correct thing to do.

So long as the table is normalized, there is no rule of thumb regarding size, apart from database capabilities and the need to optimize.

Delacruz answered 6/7, 2010 at 8:3 Comment(1)
I know this is old. Please forgive. So I have a table with 50 columns, for instance, and I have a model class in C# with all the columns as properties. I need an immutable object, meaning, among other things, I have to pass all 50 columns through the ctor. Any ideas on how best to handle this scenario?Shaeffer

I agree with Oded. I have seen tables with 500 columns in them, and all the columns in them were in the correct place. Just consider the number of facts one might wish to store about an everyday object, and you'll soon see why.

If it proves inconvenient to select all those columns, or to specify which columns to select when you are only interested in a small proportion of them, you may find it worthwhile to define a view.
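A quick sketch of that suggestion, again using sqlite3; the product table and product_summary view are made-up names. The view names the handful of columns most queries actually need, so callers never have to repeat (or even know about) the long column list.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT, price REAL,
        weight REAL, colour TEXT, notes TEXT   -- imagine dozens more columns
    )
""")
con.execute("INSERT INTO product VALUES (1, 'Widget', 9.99, 0.5, 'red', NULL)")

# The view exposes only the frequently-used subset of columns.
con.execute("""
    CREATE VIEW product_summary AS
    SELECT id, name, price FROM product
""")

row = con.execute("SELECT * FROM product_summary").fetchone()
print(row)  # (1, 'Widget', 9.99)
```

SELECT * against the view is now safe in a way it isn't against the 500-column base table.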

Faustena answered 6/7, 2010 at 8:9 Comment(0)

How many columns is too many columns?

When you feel it no longer makes sense or is right to add another column.

Generally, it depends on the application.

Davide answered 6/7, 2010 at 8:15 Comment(0)

Having too many columns results in a lot of NULLs (evil) and an unwieldy object for the table to map to. This hurts readability in the IDE and hinders maintenance (increasing development costs). If you need fast reads in some cases, use denormalized tables, e.g. tables used solely for reporting or queries (search for the "CQRS" pattern). Yes, "Person" has a million attributes, but you can break these monolithic tables down (design precedes normalization) into smaller entities ("address", "phone", "hobby") instead of adding new columns for each new use case. Having smaller objects (and tables) brings many advantages; they enable things like unit testing, OOP, and SOLID practices.

Also, as regards bunching numerous columns together to avoid joins: I think the performance gain from avoiding joins is lost through index maintenance, assuming a typical workload of both reads and writes. Adding indexes on fields for the sake of read performance could be indicative of a need to move those fields into their own table.
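The "break Person into smaller entities" idea can be sketched like this (table and column names are invented for the example). The key property is that a new phone number, address, or hobby becomes a new row in a child table, not yet another nullable column on Person.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE address (id INTEGER PRIMARY KEY,
                          person_id INTEGER REFERENCES person(id),
                          kind TEXT, line1 TEXT);
    CREATE TABLE phone   (id INTEGER PRIMARY KEY,
                          person_id INTEGER REFERENCES person(id),
                          kind TEXT, number TEXT);
""")
con.execute("INSERT INTO person (id, name) VALUES (1, 'Ada')")
# A second phone is a new row, not a 'phone2' column on person.
con.execute("INSERT INTO phone (person_id, kind, number) VALUES (1, 'work', '555-0100')")
con.execute("INSERT INTO phone (person_id, kind, number) VALUES (1, 'home', '555-0199')")

count = con.execute("SELECT COUNT(*) FROM phone WHERE person_id = 1").fetchone()[0]
print(count)  # 2
```

Each child table maps naturally to a small class, which is what makes the unit-testing and SOLID points above practical.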

Adachi answered 8/3, 2018 at 21:25 Comment(0)

ODBC has a character limit of 8000, so that is a physical limit beyond which things get highly frustrating.

I worked on a table that had 138 columns. It was horribly written and could have been normalised. This database seems to have been the creation of someone wondering why there are conventions in database design and deciding to test them all at once.

Having very wide, flattened tables is fairly common when you get into data warehousing and reporting servers. They are just a lot faster and mean that you don't have to store your database entirely in RAM for performance.

Kapok answered 6/7, 2010 at 8:13 Comment(0)

It also depends highly on the use case for your table. If you want to optimize it for reading, then it might be a good idea to keep it all together in one table.

In the NoSQL world (Cassandra/HBase, for example) there are no constraints on the number of columns, and it's actually considered good practice to have many columns. This also comes from the way the data is stored (no gaps). Worth investigating.

Pictish answered 6/7, 2010 at 8:18 Comment(0)

In my experience it is better to have fewer joins, as those tend to happen too often, especially in big databases. As long as your database tables are designed to store a single entity (student, teacher, and so on), this should be OK, so that the entity will be represented as an object in your code later. If you split the entity across several tables, you will have to use several joins in order to fill your object later. Also, if you use an ORM to generate your data access layer (such as LINQ in .NET), it will generate separate classes for each table (with a relationship between them, of course, but still), and this will be harder to use.

Another thing is that you can specify which columns to return in your query, and this will reduce the data that is passed to your application; but if you need even a single column from another table, you will have to do the join. And in most cases, when you have so many columns, the probability of having a large amount of data stored in the DB is high, so the join would hurt more than the NULLs.

Every project I have worked on is different, so you should find the balance for each case.
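The "specify which columns to return" point looks like this in practice (the student table is a made-up example). Projecting only the needed column keeps the rest of the wide row out of the result set entirely.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT, email TEXT, bio TEXT   -- plus many more columns in practice
    )
""")
con.execute("INSERT INTO student VALUES (1, 'Ada', 'ada@example.com', 'a very long bio...')")

# Naming only the needed column keeps the large bio text off the wire.
(name,) = con.execute("SELECT name FROM student WHERE id = 1").fetchone()
print(name)  # Ada
```

But note the caveat from the answer: the moment the one extra column you need lives in a different table, projection no longer saves you from the join.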

Oxytocic answered 6/7, 2010 at 8:15 Comment(1)
Very true. Obviously, joins and multiple select queries are slow so denormalization should be considered wherever possible without breaking consistency as you've suggested.Dative

Which is more of a performance hit: lots of columns with lots of NULLs, or fewer columns with lots of JOINs?

It purely depends on the data you store, the indexes you create, and so on. No one can assure you that one works better than the other without knowing what you are storing. Generally, normalization rules will "force" you to separate data into different tables and use foreign keys if you have a large table, but I disagree that this ALWAYS performs better than one big table. You can end up with 6-7-level joins in dozens of queries, which will sometimes cause errors, because there are many more chances to make a mistake in large queries than in simple ones.

If you post some requirements of what you are doing maybe we can help you with designing the DB properly.

Kratz answered 6/7, 2010 at 8:13 Comment(0)

Whether it's better to use a single table (so you can avoid joins while querying) depends on whether the columns belong to the same entity or to different entities.

For example, assume you are designing a database for a workflow where some fields will be edited by junior workers and some fields by senior workers. In this case it is better to have all the columns in a single table.

Amatory answered 30/5, 2014 at 6:52 Comment(1)
-1: why is it better? In what way is it better?Respiration
