How can I use text analysis in order to investigate questionnaire responses?

Asked 9/12, 2012 at 10:26 Answered 10/12, 2012 at 13:59

python statistics computer-science lexical-analysis text-analysis

I'm the "programmer" of a team of pupils that aims to investigate satisfaction and general problems in my grammar school. We have a questionnaire built on a scale from 1 to 6, and we interpret the answers with charting software that I wrote in Python.

Now there's a <textarea> at the end of our questionnaire that respondents can fill in however they like. I'm currently thinking of ways to make this data usable (we don't want to read 800+ answers by hand).

How can I use text analysis in Python to investigate what pupils write? I was thinking of a way to "tag" any sentence that is written down, like:

I don't like being in school. [wellbeing][negative]
I have way too much homework. [homework][much]
I think there should be more interesting projects. [projects][more]

Are there any usable approaches to obtain that? Does it make sense to use an existing tokenizer?

Thanks for your help!

Selfsealing asked 9/12, 2012 at 10:26

800 answers are not going to give you enough to do NLP with. You're better off reading the answers manually. You could train a naive Bayes classifier on 600 of them and check it on the other 200, but that's only going to get you 2 dimensions; while it's possible to do this for more tags, you're going to need a lot more entries. – Etom 9/12, 2012 at 15:51
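As a rough illustration of the 600/200 idea, here is a minimal multinomial naive Bayes classifier in plain Python (standard library only). The `NaiveBayes` class, the `tokenize` helper and the `accuracy` function are hypothetical illustrations, not part of any library; with real data you would hand-tag 600 answers, call `fit` on those, and measure `accuracy` on the remaining 200.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per label
        self.word_counts = defaultdict(Counter)      # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)              # log prior
            denom = self.totals[label] + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:                # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of held-out answers whose predicted label matches the hand label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)
```

Each answer gets exactly one label here, which is the commenter's point: every extra tag (or tag combination) needs its own share of hand-labelled examples.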

It would help if you told us whether the box comments are completely freeform or whether there is a set of known topics they will be on. (Or use clustering to answer that, or even just plain old grep, or just eyeball it yourself.) – Spirillum 13/7, 2016 at 22:19
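The "plain old grep" route can be made slightly smarter with a keyword list per tag. Below is a crude rule-based tagger, not a trained model; the `TAG_KEYWORDS` table is a hypothetical example chosen to reproduce the three tags from the question, and a real list would come from reading a sample of the actual answers.

```python
import re
from collections import Counter

# Hypothetical keyword lists, picked to reproduce the tags in the question.
TAG_KEYWORDS = {
    "homework": {"homework", "assignments"},
    "projects": {"project", "projects"},
    "wellbeing": {"like", "happy", "stressed"},
    "negative": {"don't", "not", "never", "hate"},
    "much": {"much", "many", "too"},
    "more": {"more"},
}


def tag_sentence(sentence):
    """Return every tag whose keyword set shares a word with the sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return [tag for tag, keywords in TAG_KEYWORDS.items() if words & keywords]


def tag_counts(answers):
    """How often each tag occurs across all answers."""
    return Counter(tag for answer in answers for tag in tag_sentence(answer))
```

Rules like these misfire easily: "I don't have much homework" is tagged `homework`, `negative` and `much`, although it is a positive remark. They are best used to sort answers for reading, not to replace reading.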

Well, I am just throwing in ideas here, but one approach I can think of is:

1. Cluster the responses first, using a clustering algorithm such as K-means, or do topic modelling with something like LDA (Latent Dirichlet Allocation).
2. Then apply your tagging approach, using text analysis to find the frequent/related keywords in each cluster/topic you get from step 1.

Why would step 1 be a good idea? In my opinion, if you go around tagging sentences arbitrarily during text analysis, you could generate a lot of tags, many of them similar in context. That hurts usability, because you would still have to analyse loads of tags for each sentence.

Clustering/topic modelling can also reduce this context problem to some extent, which in my opinion makes the result more usable.
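The two steps can be sketched in plain Python as follows. This is one possible instance, not the answerer's code: it swaps in spherical k-means (cosine similarity on TF-IDF vectors, farthest-first seeding) for step 1 and plain word counts per cluster for step 2. The `STOPWORDS` set, the number of clusters `k` and all function names are hypothetical; for real use, scikit-learn's `TfidfVectorizer` and `KMeans`, or gensim's `LdaModel` for LDA, are the usual tools.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real one (e.g. NLTK's) is much longer.
STOPWORDS = {"i", "a", "the", "to", "is", "are", "we", "in", "of", "and", "too"}


def tokenize(text):
    """Lowercase, split into words and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def normalise(vec):
    """Scale a sparse vector to unit length (empty vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {w: x / norm for w, x in vec.items()} if norm else vec


def dot(u, v):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(x * v.get(w, 0.0) for w, x in u.items())


def tfidf_vectors(docs):
    """One unit-length {word: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed idf, so a word that occurs in every answer still gets some weight.
        vec = {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors


def kmeans(vectors, k, iterations=20):
    """Step 1, spherical k-means: assign by cosine similarity, re-normalise centroids."""
    # Farthest-first initialisation: deterministic and spreads the seeds out.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    assignment = None
    for _ in range(iterations):
        new = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new == assignment:
            break  # converged
        assignment = new
        for j in range(k):
            summed = Counter()
            for v, a in zip(vectors, assignment):
                if a == j:
                    summed.update(v)
            if summed:  # keep the old centroid if a cluster ends up empty
                centroids[j] = normalise(dict(summed))
    return assignment


def top_keywords(docs, assignment, k, n=3):
    """Step 2: the n most frequent non-stopwords in each cluster."""
    result = []
    for j in range(k):
        counts = Counter(w for d, a in zip(docs, assignment) if a == j
                         for w in tokenize(d))
        result.append([w for w, _ in counts.most_common(n)])
    return result
```

The keywords per cluster then become candidate tags, so you only have to name a handful of clusters instead of inventing tags sentence by sentence.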

Millisecond answered 9/12, 2012 at 10:40

"NLTK Sentiment Analysis" is a good place to start searching. The Natural Language Toolkit (NLTK) is the go-to package for text analysis in Python, but it is not exactly simple to use, because the task itself is complex. The first few search results had some compelling demos, but I didn't look at them in detail.
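NLTK's own sentiment tool (VADER, via `SentimentIntensityAnalyzer` in `nltk.sentiment.vader`, which needs the `vader_lexicon` resource downloaded first) scores text against a large hand-built lexicon. To show the general idea without any downloads, here is a toy lexicon-based scorer in plain Python. It is not VADER's actual algorithm, only the lexicon-plus-negation approach, and the `LEXICON` and `NEGATORS` tables are tiny hypothetical examples.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score.
LEXICON = {"like": 1.0, "love": 2.0, "interesting": 1.0, "good": 1.0,
           "boring": -1.0, "hate": -2.0, "stressed": -1.5, "bad": -1.0}
NEGATORS = {"not", "don't", "never", "no"}


def sentiment(sentence):
    """Sum word scores, flipping the sign of a word that follows a negator."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    for i, word in enumerate(words):
        value = LEXICON.get(word, 0.0)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def sentiment_label(sentence):
    """Map the numeric score to a [positive]/[negative]/[neutral] tag."""
    s = sentiment(sentence)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

This yields the [negative] tag from the question for "I don't like being in school.", but a sentence such as "I have way too much homework." scores as neutral, which shows why a real lexicon and topic tags are both needed.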

Dwaindwaine answered 9/12, 2012 at 12:35

I think this could be the exact right thing for me! Thank you! – Selfsealing 9/12, 2012 at 14:41

I won't quite answer your question, but if I understand correctly, you have a classic survey (with check boxes, ...) with a small text-area question at the end.

So you will have about 800+ answers, but I guess the answers will not be very long: usually a few lines or even a few words. I think manual QDA (qualitative data analysis) software will serve you better than an algorithm that won't be perfect. For instance, you can use the open-source RQDA (an R package) or commercial software such as NVivo.

Thanks

Nahuatlan answered 10/12, 2012 at 13:59

This sounds a lot like AI programming, just because of the way such systems 'tag' questions and responses. Maybe take a look at http://pyaiml.sourceforge.net/ and the Artificial Intelligence Markup Language (AIML). I don't have much experience with it, but you might be able to tweak it to your needs instead of doing it from scratch.

Stralsund answered 9/12, 2012 at 10:40
