Continuous speech recognition while singing?

As part of my application I'm looking to add speech recognition, but not really in the traditional sense. I have a bunch of lyrics (divided into verses) that are sung by someone, and the idea is to find what verse is currently being sung so it can be displayed on screen.

I've played around with sphinx and got some basic examples set up and working, but while there seems to be plenty of documentation around on registering spoken text where you can wait for a delay then process the result, I can't find much on the idea of recognising sentences continuously. This is of course before I get to the part where the words are being sung and not spoken!

Has anyone got any experience with this, and if so is there anywhere that would provide a good starting point? Or is what I'm trying to achieve way too ambitious with sphinx and is it never really going to work properly? I'm open to looking at other libraries but they must be free, and sphinx was the most widely talked about one I could dig up.

Josie answered 23/8, 2011 at 13:20 Comment(6)
I guess a large problem would be getting a suitable training set, or bootstrapping from one of limited size. - Seam
Could someone explain the reason for the downvote? - Josie
Google stumbles on this article when I use your question in a search. Not much help when looking for a specific library, I'm afraid, but it might help get you on track if you need to build something yourself. - Forcible
Maybe you need to convert the existing sound data into something easily processable. This is just a vague idea, but something like hashing a section of the raw audio data (like a verse, as you said) and then comparing it with the recording. Once you have matched the first verse, you have a hint at what's coming next: it should mostly be the following verse. You can start showing that, and as soon as you get a small chunk of the next verse, just run a verification on that chunk. Hope this helps. - Musaceous
Hey, I'm very interested in the results of your investigation and would really appreciate it if you could tell me here, or drop me a line by email, whether anything worked out with this project. My email is in the profile. Thanks! - Capsaicin
@Capsaicin Afraid I can't see your email address in your profile, is it set to private? Mine is berry120 AT gmail DOT com. If you want to discuss further I'm more than happy to, but due to the lack of existing work in this area (and time on my end) I pretty much stopped looking at it. I'd still like to revisit it later though, and I'm happy to throw suggestions back and forth on a few bits if you think that'd be helpful to us both. Entirely up to you :-) - Josie

It's perfectly possible to recognize speech as it is being spoken, with only a small delay, and even more so if you know roughly what you expect to hear. This is called a "partial result" and is available in all CMUSphinx decoders through the API. Basically, you can retrieve the current hypothesis while decoding is still in progress.
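Because the lyrics are known in advance, a partial hypothesis only has to be good enough to pick out the right verse, not to be a perfect transcript. Here is a minimal sketch of that idea in Python, assuming the decoder hands you its partial hypothesis as a plain string. The names `tokenize` and `best_verse` are hypothetical, and the word-overlap score is just one simple choice, not something CMUSphinx provides.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def best_verse(partial_hypothesis, verses):
    """Return (index, score) of the verse sharing the most words with the hypothesis.

    The score is the fraction of hypothesis words that also occur in the
    verse (0.0 to 1.0). Returns (None, 0.0) if nothing matches.
    """
    hyp_words = Counter(tokenize(partial_hypothesis))
    total = sum(hyp_words.values())
    if total == 0:
        return None, 0.0

    best_index, best_score = None, 0.0
    for index, verse in enumerate(verses):
        verse_words = Counter(tokenize(verse))
        # Counter intersection keeps the minimum count of each shared word.
        overlap = sum((hyp_words & verse_words).values())
        score = overlap / total
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


if __name__ == "__main__":
    verses = [
        "Amazing grace how sweet the sound that saved a wretch like me",
        "Twas grace that taught my heart to fear and grace my fears relieved",
    ]
    # A deliberately imperfect hypothesis ("hard" instead of "heart").
    print(best_verse("grace that taught my hard to fear", verses))
```

In practice you would probably only switch the displayed verse once the score passes some threshold, and favour the verse that follows the current one, since misrecognised words are to be expected with singing.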

There is one small issue to consider: how to stabilize this result, i.e. how to extract the part of it that will no longer change. The technique for this is called backtracking, and it can be implemented easily.
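To illustrate what "the stable part" of a partial result can mean, here is a small Python sketch. Note that this is not the decoder-level backtracking mentioned above but a simpler stand-in heuristic: treat as stable only the leading words on which the last few partial hypotheses agree. The function name `stable_prefix` and the `window` parameter are assumptions made for the example.

```python
def stable_prefix(partials, window=3):
    """Return the leading words shared by the last `window` partial hypotheses.

    `partials` is a list of hypothesis strings in the order they were
    received. Returns an empty list until `window` partials are available.
    """
    if len(partials) < window:
        return []

    recent = [p.split() for p in partials[-window:]]
    prefix = []
    # zip stops at the shortest hypothesis, so the prefix never overruns it.
    for words in zip(*recent):
        if all(w == words[0] for w in words):
            prefix.append(words[0])
        else:
            break
    return prefix


if __name__ == "__main__":
    partials = [
        "amazing",
        "amazing grace",
        "amazing grace how",
        "amazing grace how sweet",
        "amazing grace how suite the",
    ]
    print(stable_prefix(partials))
```

A larger `window` gives a more trustworthy prefix at the cost of extra delay before words are accepted.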

For singing, provided the music can be filtered out, it's also doable.

Seacoast answered 14/9, 2011 at 10:22 Comment(0)
