Database design for a survey [closed]
Asked Answered
U

10

149

I need to create a survey where answers are stored in a database. I'm just wondering what would be the best way to implement this in the database, specifically the tables required. The survey contains different types of questions. For example: text fields for comments, multiple choice questions, and possibly questions that could contain more than one answer (i.e. check all that apply).

I've come up with two possible solutions:

  1. Create a giant table which contains the answers for each survey submission. Each column would correspond to an answer from the survey. i.e. SurveyID, Answer1, Answer2, Answer3

    I don't think this is the best way since there are a lot of questions in this survey and doesn't seem very flexible if the survey is to change.

  2. The other thing I thought of was creating a Question table and Answer table. The question table would contain all the questions for the survey. The answer table would contain individual answers from the survey, each row linked to a question.

    A simple example:

    tblSurvey: SurveyID

    tblQuestion: QuestionID, SurveyID, QuestionType, Question

    tblAnswer: AnswerID, UserID, QuestionID, Answer

    tblUser: UserID, UserName

    My problem with this is that there could be tons of answers which would make the Answer table pretty huge. I'm not sure that's so great when it comes to performance.

I'd appreciate any ideas and suggestions.

Unyielding answered 19/11, 2009 at 16:7 Comment(2)
How much is "pretty huge"? Give us an estimation, are we talking about a million or a thousand million?Scenarist
SQL servers are actually designed to work with 'tons' of data. You shouldn't have much trouble working with the scheme that you've talked about.Snowy
M
142

I think that your model #2 is fine, however you can take a look at the more complex model which stores questions and pre-made answers (offered answers) and allows them to be re-used in different surveys.

- One survey can have many questions; one question can be (re)used in many surveys.
- One (pre-made) answer can be offered for many questions. One question can have many answers offered. A question can have different answers offered in different surveys. An answer can be offered to different questions in different surveys. There is a default "Other" answer, if a person chooses other, her answer is recorded into Answer.OtherText.
- One person can participate in many surveys, one person can answer specific question in a survey only once.

survey_model_02

Morpheme answered 19/11, 2009 at 17:4 Comment(8)
what tool did you use to make the database schema?Evered
I use Altova UModel. It's quick, offers a wide selection of modeling structures, and saves to pretty much every format. Though, it costs.Crystlecs
You can also use draw.io It's free w/ no signup and easy to use.Hickie
Why do we have Survey_Question_Answer and Answer? Isn't just Answer enough?Hypergolic
Abubakar Ahmad, Survey_Question_Answer is for multiple premade answers to pick from user.. for one question..Danziger
I think Answer is enough, Survery_question_answer is redundantGreenhead
In many cases to save personal information about participants is not allowed. I guess you do not need the Person table.Descender
I think so. If you add type column in Survey_Question_Answer/Answer table. Then you can have as many types as you want. As mentioned above, answer table enough.Glass
H
22

Definitely option #2, also I think you might have an oversight in the current schema, you might want another table:

+-----------+
| tblSurvey |
|-----------|
| SurveyId  |
+-----------+

+--------------+
| tblQuestion  |
|--------------|
| QuestionID   |
| SurveyID     |
| QuestionType |
| Question     |
+--------------+

+--------------+
| tblAnswer    |
|--------------|
| AnswerID     |
| QuestionID   |
| Answer       |
+--------------+

+------------------+
| tblUsersAnswer   |
|------------------|
| UserAnswerID     |
| AnswerID         |
| UserID           |
| Response         |
+------------------+

+-----------+
| tblUser   |
|-----------|
| UserID    |
| UserName  |
+-----------+

Each question is going to probably have a set number of answers which the user can select from, then the actual responses are going to be tracked in another table.

Databases are designed to store a lot of data, and most scale very well. There is no real need to user a lesser normal form simply to save on space anymore.

Hydropathy answered 19/11, 2009 at 16:25 Comment(1)
Hi, I have a question. Shouldn't SurveyId be present in the answer table as well or atleast a time stamp matching the versioning time of the survey? If you inserted a question in your original survey, the questionIds would change, and the answers would become unidentifiable. Or if it is redundant, could you explain how?Manno
F
3

As a general rule, modifying schema based on something that a user could change (such as adding a question to a survey) should be considered fairly smelly. There's cases where it can be appropriate, particularly when dealing with large amounts of data, but know what you're getting into before you dive in. Having just a "responses" table for each survey means that adding or removing questions is potentially very costly, and it's very difficult to do analytics in a question-agnostic way.

I think your second approach is best, but if you're certain you're going to have a lot of scale concerns, one thing that has worked for me in the past is a hybrid approach:

  1. Create detailed response tables to store per-question responses as you've described in 2. This data would generally not be directly queried from your application, but would be used for generating summary data for reporting tables. You'd probably also want to implement some form of archiving or expunging for this data.
  2. Also create the responses table from 1 if necessary. This can be used whenever users want to see a simple table for results.
  3. For any analytics that need to be done for reporting purposes, schedule jobs to create additional summary data based on the data from 1.

This is absolutely a lot more work to implement, so I really wouldn't advise this unless you know for certain that this table is going to run into massive scale concerns.

Fabozzi answered 19/11, 2009 at 16:16 Comment(0)
A
1

No 2 looks fine.

For a table with only 4 columns it shouldn't be a problem, even with a good few million rows. Of course this can depend on what database you are using. If its something like SQL Server then it would be no problem.

You'd probably want to create an index on the QuestionID field, on the tblAnswer table.

Of course, you need to specify what Database you are using as well as estimated volumes.

Absa answered 19/11, 2009 at 16:11 Comment(0)
H
1

The second approach is best.

If you want to normalize it further you could create a table for question types

The simple things to do are:

  • Place the database and log on their own disk, not all on C as default
  • Create the database as large as needed so you do not have pauses while the database grows

We have had log tables in SQL Server Table with 10's of millions rows.

Hognut answered 19/11, 2009 at 16:25 Comment(0)
V
1

You may choose to store the whole form as a JSON string.

Not sure about your requirement, but this approach would work in some circumstances.

Vibrio answered 17/4, 2017 at 19:47 Comment(0)
K
0

Looks pretty complete for a smiple survey. Don't forget to add a table for 'open values', where a customer can provide his opinion via a textbox. Link that table with a foreign key to your answer and place indexes on all your relational columns for performance.

Ketubim answered 19/11, 2009 at 16:10 Comment(1)
Is there a reason why I couldn't also put the comments within the answer table?Unyielding
V
0

Number 2 is correct. Use the correct design until and unless you detect a performance problem. Most RDBMS will not have a problem with a narrow but very long table.

Valonia answered 19/11, 2009 at 16:10 Comment(0)
B
0

Having a large Answer table, in and of itself, is not a problem. As long as the indexes and constraints are well defined you should be fine. Your second schema looks good to me.

Basra answered 19/11, 2009 at 16:11 Comment(0)
Y
0

Given the proper index your second solution is normalized and good for a traditional relational database system.

I don't know how huge is huge but it should hold without problem a couple million answers.

Yeoman answered 19/11, 2009 at 16:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.