This is just a stab in the dark, as I'm not a linguist (although I have written a voice synthesizer), but the metric that could be useful here is the number of phonemes that make up each word, since the phonemes themselves are going to be approximately the same duration regardless of use. There's an International Phonetic Alphabet chart for English dialects, as well as a nice phonology of English.
A good open-source phonetic dictionary is available from the cmudict project, which has about 130k words.
Here's a really quick stab at a lookup program:
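#!/usr/bin/env python3
# Map each upper-case word in cmudict to its phoneme string.
words = {}
# latin-1 accepts any byte, so non-ASCII entries cannot raise a decode error.
with open("cmudict.0.7a", encoding="latin-1") as f:
    for line in f:
        if line.startswith(";;;"):  # skip the dictionary's comment lines
            continue
        parts = line.split(None, 1)  # word, then the rest of the line
        if len(parts) == 2:
            words[parts[0]] = parts[1].strip()

user_input = input("Words: ")
print()
for word in user_input.split():
    try:
        print("%25s %s" % (word, words[word.upper()]))
    except KeyError:
        # The word has no entry in the dictionary.
        print("%25s %s" % (word, "unable to find phonemes for word"))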
When run..
Words: I support hip hop from the underground up
I AY1
support S AH0 P AO1 R T
hip HH IH1 P
hop HH AA1 P
from F R AH1 M
the DH AH0
underground AH1 N D ER0 G R AW2 N D
up AH1 P
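To turn those lookups into the metric mentioned at the top (phonemes per word), something like the following might do. It's only a sketch: `phoneme_count` is a made-up helper name, and the pronunciations are pasted in from the run above rather than read from the dictionary file.

```python
# Hypothetical helper: phoneme_count is not part of cmudict or the program above.
def phoneme_count(pronunciation):
    """Count the space-separated phoneme symbols in one cmudict-style entry."""
    return len(pronunciation.split())

# Pronunciations copied from the run above, in word order.
line = [
    ("I", "AY1"),
    ("support", "S AH0 P AO1 R T"),
    ("hip", "HH IH1 P"),
    ("hop", "HH AA1 P"),
    ("from", "F R AH1 M"),
    ("the", "DH AH0"),
    ("underground", "AH1 N D ER0 G R AW2 N D"),
    ("up", "AH1 P"),
]

counts = [(word, phoneme_count(pron)) for word, pron in line]
for word, n in counts:
    print("%25s %d" % (word, n))

# Sum over the whole line, in case a per-line figure is more useful.
total = sum(n for _, n in counts)
print("%25s %d" % ("total", total))
```

For this line it comes to 30 phonemes in total, with 'underground' contributing 9 of them.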
If you want to get super fancy pants about this, there's always the Python Natural Language Toolkit which may have some useful tidbits for you.
Additionally, some real-world use. To be fair, I fixed 'stylin' to 'styling', but left 'tellin' in to reveal the deficiency with unknown words. You could probably try a lookup for words ending with in' by subbing a g in for the apostrophe and then dropping the NG phoneme from the result.
Yes Y EH1 S
the DH AH0
rhythm R IH1 DH AH0 M
the DH AH0
rebel R EH1 B AH0 L
Without W IH0 TH AW1 T
a AH0
pause P AO1 Z
I'm AY1 M
lowering L OW1 ER0 IH0 NG
my M AY1
level L EH1 V AH0 L
The DH AH0
hard HH AA1 R D
rhymer R AY1 M ER0
where W EH1 R
you Y UW1
never N EH1 V ER0
been B IH1 N
I'm AY1 M
in IH0 N
You Y UW1
want W AA1 N T
styling S T AY1 L IH0 NG
you Y UW1
know N OW1
it's IH1 T S
time T AY1 M
again AH0 G EH1 N
D D IY1
the DH AH0
enemy EH1 N AH0 M IY0
tellin unable to find phonemes for word
you Y UW1
to T UW1
hear HH IY1 R
it IH1 T
They DH EY1
praised P R EY1 Z D
etc...
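The in' fallback suggested above could be sketched roughly like this. Assumptions to note: `lookup` and `sample` are hypothetical names, `sample` is a tiny hand-made stand-in for the full `words` dict built by the lookup program, and instead of dropping the NG phoneme outright this version swaps it for N, which seemed closer to how the clipped word is actually said.

```python
def lookup(word, words):
    """Return the phoneme string for word, or None if nothing is found.

    words maps upper-case words to phoneme strings, like the dict
    built by the lookup program earlier in this answer.
    """
    key = word.upper()
    if key in words:
        return words[key]
    # Fallback for dropped-g spellings such as "stylin" or "stylin'".
    stem = key.rstrip("'")
    if stem.endswith("IN"):
        full = words.get(stem + "G")
        if full is not None and full.endswith(" NG"):
            # Assumption: replace the final NG with N rather than removing it.
            return full[:-len("NG")] + "N"
    return None

# Stand-in dictionary; both entries are taken from the output above.
sample = {
    "STYLING": "S T AY1 L IH0 NG",
    "IN": "IH0 N",
}

for w in ["stylin'", "stylin", "in", "xyzzy"]:
    print("%25s %s" % (w, lookup(w, sample) or "unable to find phonemes for word"))
```

One caveat: any unknown word that happens to end in "in" gets the same treatment, so a word missing from the dictionary could be matched to an unrelated -ing entry. Requiring the apostrophe would be the stricter option.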
If this is something you plan on putting some time into, I'd be interested in helping. I think putting 'World's first rapping IDE' on my resume would be hilarious. And if one exists already, world's first Python-based rapping IDE. :p
ptkbdgw and theth. But I guess it's equally important how those are distributed over the sentence. – Acridine