C++ API for "Text to Speech" and "Voice to Text"
I would like to know whether there is a good C++ API for voice recognition and text to speech. I have looked at Festival, whose output is so natural that you can hardly tell it is a computer talking, and at voce as well.

Unfortunately, Festival does not seem to support voice recognition (I mean "Voice to Text"), and voce is built in Java, which makes it a mess to use from C++ because of JNI.

The API should support both "Text to Voice" and "Voice to Text", and it should have a good set of examples, at least outside the owner's website. It would be perfect if it could also identify a set of given voices, but that is optional, so no worries.

What I am going to do with the API is this: given a set of voice commands, turn the robot device left, right, etc., and also have it speak to me, saying "Good Morning", "Good Night", and so on. These phrases will be hard-coded in the program.
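Whichever recognition API ends up producing the text, the robot side of this can stay very simple: map the recognized utterance onto a small set of commands. A minimal sketch, independent of any particular speech library (the `Command` enum and `parseCommand` helper are hypothetical names introduced here for illustration):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class Command { TurnLeft, TurnRight, Stop, Unknown };

// Map a recognized utterance to a robot command.
// Matching is case-insensitive and tolerant of surrounding words,
// so "please turn left now" still matches "turn left".
Command parseCommand(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    if (text.find("turn left") != std::string::npos)  return Command::TurnLeft;
    if (text.find("turn right") != std::string::npos) return Command::TurnRight;
    if (text.find("stop") != std::string::npos)       return Command::Stop;
    return Command::Unknown;
}
```

Substring matching like this is forgiving of recognizer noise; for a larger command set you would want a proper grammar, which some of the APIs below (e.g. SAPI) support directly.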

Please help me find a good C++ voice API for this purpose. If you have access to a tutorial or installation guide, please be kind enough to share it as well.

Diplostemonous answered 30/4, 2013 at 9:24 Comment(1)
Microsoft's API is msdn.microsoft.com/en-us/library/ms720151(v=vs.85).aspx — Photovoltaic

If you develop on Windows, you can use the Microsoft Speech API (SAPI), which allows you to perform both voice recognition (ASR) and text-to-speech (TTS).
You can find some examples on this page and a very basic voice-recognition example in this post.
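For the text-to-speech half, a rough sketch of how a SAPI program looks (Windows only, COM error handling abbreviated; this is an assumption-laden minimal example, not the linked documentation's own code):

```cpp
// Minimal SAPI 5 text-to-speech sketch (Windows only).
// Link against sapi.lib and ole32.lib.
#include <sapi.h>
#include <windows.h>

int main() {
    if (FAILED(::CoInitialize(nullptr)))
        return 1;

    ISpVoice* voice = nullptr;
    HRESULT hr = ::CoCreateInstance(CLSID_SpVoice, nullptr, CLSCTX_ALL,
                                    IID_ISpVoice, (void**)&voice);
    if (SUCCEEDED(hr)) {
        voice->Speak(L"Good morning", SPF_DEFAULT, nullptr);  // blocks until spoken
        voice->Release();
    }
    ::CoUninitialize();
    return 0;
}
```

The recognition side is more involved (you set up an `ISpRecognizer` and a grammar of command phrases); the linked MSDN page covers both directions.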

Atwell answered 30/4, 2013 at 14:48 Comment(0)

I found that if I make an audio recording (I used Qt Multimedia for this), it has to be FLAC. Read more here.

I can then upload it to Google, which sends me back some JSON.
I then wrote some C++/Qt code to turn this into a QML plugin. Here is that (alpha) code. Note: make sure that you replace
<YOUR FLAC FILE.flac> with your real FLAC file.

speechrecognition.cpp

#include <QNetworkReply>
#include <QNetworkRequest>
#include <QSslSocket>
#include <QUrl>
#include <QJsonDocument>
#include <QJsonArray>
#include <QJsonObject>
#include "speechrecognition.h"
#include <QFile>
#include <QDebug>
const char* SpeechRecognition::kContentType = "audio/x-flac; rate=8000";
const char* SpeechRecognition::kUrl = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=directions&lang=en";

SpeechRecognition::SpeechRecognition(QObject* parent)
  : QObject(parent)
{
    network_ = new QNetworkAccessManager(this);
    connect(network_, SIGNAL(finished(QNetworkReply*)),
            this, SLOT(replyFinished(QNetworkReply*)));
}

void SpeechRecognition::start(){
    const QUrl url(kUrl);
    QNetworkRequest req(url);
    req.setHeader(QNetworkRequest::ContentTypeHeader, kContentType);
    req.setAttribute(QNetworkRequest::DoNotBufferUploadDataAttribute, false);
    req.setAttribute(QNetworkRequest::CacheLoadControlAttribute,
                     QNetworkRequest::AlwaysNetwork);
    QFile *compressedFile = new QFile("<YOUR FLAC FILE.flac>");
    compressedFile->open(QIODevice::ReadOnly);
    reply_ = network_->post(req, compressedFile);
    // Parent the file to the reply so it is freed when the reply is
    // deleted; otherwise it would leak on every call to start().
    compressedFile->setParent(reply_);
}

void SpeechRecognition::replyFinished(QNetworkReply* reply) {

  Result result = Result_ErrorNetwork;
  Hypotheses hypotheses;

  if (reply->error() != QNetworkReply::NoError) {
    qDebug() << "ERROR \n" << reply->errorString();
  } else {
      qDebug() << "Running ParserResponse for \n" << reply << result;
      ParseResponse(reply, &result, &hypotheses);
  }
  emit Finished(result, hypotheses);
  reply_->deleteLater();
  reply_ = NULL;
}

void SpeechRecognition::ParseResponse(QIODevice* reply, Result* result,
                                      Hypotheses* hypotheses)
{
  const QString response = reply->readAll();
  qDebug() << "The reply:" << response;
  QJsonDocument jsonDoc = QJsonDocument::fromJson(response.toUtf8());
  QVariantMap data = jsonDoc.toVariant().toMap();

  const int status = data.value("status", Result_ErrorNetwork).toInt();
  *result = static_cast<Result>(status);

  if (status != Result_Success)
    return;

  QVariantList list = data.value("hypotheses", QVariantList()).toList();
  foreach (const QVariant& variant, list) {
    QVariantMap map = variant.toMap();

    if (!map.contains("utterance") || !map.contains("confidence"))
      continue;

    Hypothesis hypothesis;
    hypothesis.utterance = map.value("utterance", QString()).toString();
    hypothesis.confidence = map.value("confidence", 0.0).toReal();
    *hypotheses << hypothesis;
    qDebug() << "confidence =" << hypothesis.confidence
             << "utterance =" << hypothesis.utterance;
    setResults(hypothesis.utterance);
  }
}

void SpeechRecognition::setResults(const QString &results)
{
    if (m_results == results)
        return;
    m_results = results;
    emit resultsChanged();
}

QString SpeechRecognition::results() const
{
    return m_results;
}

speechrecognition.h

#ifndef SPEECHRECOGNITION_H
#define SPEECHRECOGNITION_H

#include <QObject>
#include <QList>

class QIODevice;
class QNetworkAccessManager;
class QNetworkReply;
class SpeechRecognition : public QObject {
  Q_OBJECT
    Q_PROPERTY(QString results READ results NOTIFY resultsChanged)

public:
  SpeechRecognition(QObject* parent = 0);
  static const char* kUrl;
  static const char* kContentType;

  struct Hypothesis {
    QString utterance;
    qreal confidence;
  };
  typedef QList<Hypothesis> Hypotheses;

  // This enumeration follows the values described here:
  // http://www.w3.org/2005/Incubator/htmlspeech/2010/10/google-api-draft.html#speech-input-error
  enum Result {
    Result_Success = 0,
    Result_ErrorAborted,
    Result_ErrorAudio,
    Result_ErrorNetwork,
    Result_NoSpeech,
    Result_NoMatch,
    Result_BadGrammar
  };
  Q_INVOKABLE void start();
  void Cancel();
  QString results() const;
  void setResults(const QString &results);

signals:
  void Finished(Result result, const Hypotheses& hypotheses);
  void resultsChanged();

private slots:
  void replyFinished(QNetworkReply* reply);

private:
  void ParseResponse(QIODevice* reply, Result* result, Hypotheses* hypotheses);

private:
  QNetworkAccessManager* network_;
  QNetworkReply* reply_;
  QByteArray buffered_raw_data_;
  int num_samples_recorded_;
  QString m_results;
};

#endif // SPEECHRECOGNITION_H
Glomma answered 6/3, 2014 at 23:0 Comment(0)

For the voice-recognition part, see Georgi Gerganov's git project, which uses OpenAI's Whisper. It works offline (on Apple, Linux, and Windows), loads the model from files, and can convert audio files to text or transcribe audio from the microphone in real time. I quote from the readme.md:

"High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model:

  • Plain C/C++ implementation without dependencies
  • Apple silicon first-class citizen - optimized via ARM NEON, Accelerate framework and Core ML
  • AVX intrinsics support for x86 architectures
  • ..."
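To give an idea of the shape of the library, here is a rough sketch of transcribing a buffer with the whisper.cpp C API. The model path is a placeholder, and the sketch assumes you have built the project, downloaded a ggml model, and have 16 kHz mono float samples in `pcm`:

```cpp
// Sketch of offline transcription with the whisper.cpp C API.
// Requires whisper.cpp built and linked, and a downloaded ggml model;
// "models/ggml-base.en.bin" below is a placeholder path.
#include <string>
#include <vector>
#include "whisper.h"

std::string transcribe(const std::vector<float>& pcm) {
    // Load the model from disk (the only large, slow step).
    whisper_context* ctx = whisper_init_from_file("models/ggml-base.en.bin");
    if (!ctx) return {};

    whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.language = "en";

    // Run the full recognition pipeline on the PCM buffer,
    // then concatenate the text of each decoded segment.
    std::string text;
    if (whisper_full(ctx, params, pcm.data(), (int)pcm.size()) == 0) {
        const int n = whisper_full_n_segments(ctx);
        for (int i = 0; i < n; ++i)
            text += whisper_full_get_segment_text(ctx, i);
    }
    whisper_free(ctx);
    return text;
}
```

The repository's own examples (`main`, `stream`) show the same calls plus microphone capture and audio file decoding.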

For the text-to-speech part, see Ahmad Anis's page on text to speech in C++:

"Text to speech is a common implementation of Machine Learning and indeed a lot of great machine learning applications have been built which uses text to speech. It is a lot easier to do text to speech in C++ just by importing some predefined models and use them."

It works on Windows and on Linux (via Wine). I quote from the git page containing the code:

" It works with Microsoft Sapi and gives you option to output speech in Normal , 2x , -2x "

That means it uses Microsoft Speech API (SAPI) 5.3.

Sakovich answered 9/5, 2023 at 17:39 Comment(0)
P
0

You could theoretically use Twilio if the robot has an internet connection and you are willing to pay for the service. They have libraries and examples for a number of different languages and platforms: http://www.twilio.com/docs/libraries

Also, check out this blog post explaining how to build and control an Arduino-based robot using Twilio: http://www.twilio.com/blog/2012/06/build-a-phone-controlled-robot-using-node-js-arduino-rn-xv-wifly-arduinoand-twilio.html

Pictogram answered 2/5, 2013 at 10:14 Comment(0)