Developing speech synthesis and speech recognition with UCMA (Speech / Voice / TTS / ASR / IVR etc)

By Tsuyoshi Matsuzaki on 2011-08-26

Environment: Lync Server 2010, Lync 2010, UCMA 3.0 Runtime, UCMA 3.0 SDK, Microsoft Speech Platform – Server Runtime Languages (ja-JP) 10.2, Visual Studio 2010

Getting started with UCMA 3.0 application development

Environment setup / Back to Back Call
Speech processing

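Hello.

To everyone who attended my session at "Tech Party 2011 in Hokuriku" (hokuriku.NET): I apologize for arriving late and then running over time. I had almost no time to explain the UCMA speech processing shown in the demo, so I am writing it up here, source code included. (UCMA environment setup is covered in the earlier article listed above, written quite a while ago, so I am treating this post as a continuation of that series.)
As I explained in the session, with Enterprise Voice and similar features these capabilities can also be used from ordinary telephones.

As usual, detailed exception handling is omitted below, so take care of it in real development. (In particular, as I mentioned in the session, a real application has to cope with multiple simultaneous users, so it is a good idea to create an instance per Conversation or McuSession and do the processing there.)

As described in the previous post, configure the required environment and register this UCMA application and its endpoint in advance with the New-CsTrustedApplication, Enable-CsTopology, and New-CsTrustedApplicationEndpoint commands. (Here the application name is "NotificationSample", the application ID is "urn:application:notificationsample", and the endpoint's SIP address is "NotificationSampleEndpoint@example.jp".)
Also run Visual Studio with administrator privileges and add the required references. (These preparations are omitted here; see the previous post for details.)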

Play / Record

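First, as the most basic kind of audio processing, let's look at playing and recording sound with the Player and Recorder classes.

Let's go straight to a simple notification sample. Write the following code. (Much of it is the same as in the previous post and needs no further explanation; focus on the parts that are new here, namely the Player and WmaFileSource handling.)
This application endpoint places an AudioVideoCall to demouser1 every minute. (Like a persistent sales caller, it keeps calling every minute, forever.) When the audio call with demouser1 is established (connected), it plays a wma file and feeds the playback into this conversation (the AudioVideoCall) through the AttachFlow method. (In other words, when demouser1 answers, the audio starts playing. It is that simple.)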

. . .
using System.Threading;
using Microsoft.Rtc.Collaboration;
using Microsoft.Rtc.Collaboration.AudioVideo;
. . .

class Program
{
    static void Main(string[] args)
    {
        Program prog = new Program();
        prog.Run();
    }

    private CollaborationPlatform rtcPlatform;
    private ApplicationEndpoint appEndpoint;
    private Conversation currentConversation;
    private AudioVideoCall currentCall;
    private Player myAudioPlayer;
    private WmaFileSource soundSrc;
    private Timer timer;

    public void Run()
    {
        // Start platform and establish endpoint
        ProvisionedApplicationPlatformSettings settings =
            new ProvisionedApplicationPlatformSettings(
                "NotificationSample", "urn:application:notificationsample");
        rtcPlatform = new CollaborationPlatform(settings);
        rtcPlatform.RegisterForApplicationEndpointSettings(
            this.ApplicationEndpointSettingsDiscovered);
        rtcPlatform.BeginStartup(this.PlatformStartupCompleted, null);

        // Terminate endpoint and shutdown platform
        Console.WriteLine("Press enter to shutdown ...");
        Console.ReadLine();
        appEndpoint.BeginTerminate(this.EndpointTerminateCompleted, null);
    }

    public void ApplicationEndpointSettingsDiscovered(object sender,
        ApplicationEndpointSettingsDiscoveredEventArgs e)
    {
        Console.WriteLine("Called applicationEndpointSettingsDiscovered ...");
        ApplicationEndpointSettings settings = e.ApplicationEndpointSettings;
        settings.UseRegistration = true;
        settings.Presence.Description = "This is Notification sample.";
        appEndpoint = new ApplicationEndpoint(rtcPlatform, settings);
        appEndpoint.BeginEstablish(this.EndpointEstablishCompleted, null);
    }

    public void PlatformStartupCompleted(IAsyncResult res)
    {
        rtcPlatform.EndStartup(res);
        Console.WriteLine("Completed platform startup ...");
    }

    public void EndpointEstablishCompleted(IAsyncResult res)
    {
        appEndpoint.EndEstablish(res);
        Console.WriteLine("Completed endpoint establish ...");

        // Setup Audio
        myAudioPlayer = new Player();
        myAudioPlayer.SetMode(PlayerMode.Manual);
        soundSrc = new WmaFileSource(@"C:\Demo\test.wma");
        myAudioPlayer.SetSource(soundSrc);
        soundSrc.BeginPrepareSource(MediaSourceOpenMode.Buffered,
            PrepareSourceCompleted,
            null);

        // Set Notification Timer
        TimerCallback callback = new TimerCallback(MyNotificationCallback);
        timer = new Timer(callback, null, 0, 60000);
    }

    public void EndpointTerminateCompleted(IAsyncResult res)
    {
        appEndpoint.EndTerminate(res);
        Console.WriteLine("Completed endpoint terminate ...");
        rtcPlatform.BeginShutdown(this.PlatformShutdownCompleted, null);
    }

    public void PlatformShutdownCompleted(IAsyncResult res)
    {
        rtcPlatform.EndShutdown(res);
        Console.WriteLine("Completed platform shutdown ...");
    }

    public void PrepareSourceCompleted(IAsyncResult res)
    {
        soundSrc.EndPrepareSource(res);
        Console.WriteLine("Completed sound source prepare ...");
    }

    public void MyNotificationCallback(object o)
    {
        Console.WriteLine("Notification fired !");

        // create AudioVideo conversation !
        currentConversation = new Conversation(appEndpoint);
        currentCall = new AudioVideoCall(currentConversation);
        currentCall.StateChanged +=
            new EventHandler<CallStateChangedEventArgs>(currentCall_StateChanged);
        currentCall.BeginEstablish(@"sip:demouser1@example.jp",
            new CallEstablishOptions(),
            this.ConversationEstablishCompleted,
            null);
    }

    public void ConversationEstablishCompleted(IAsyncResult res)
    {
        currentCall.EndEstablish(res);
        Console.WriteLine("Conversation Established !");

        // Attach music to current audio call and start music !
        myAudioPlayer.AttachFlow(currentCall.Flow);
        myAudioPlayer.Start();
    }

    public void currentCall_StateChanged(object sender, CallStateChangedEventArgs e)
    {
        if (e.State == CallState.Terminated)
        {
            myAudioPlayer.Stop();
            myAudioPlayer.DetachFlow(currentCall.Flow);
        }
    }
}

The code above is a notification sample, but as shown in the seminar, the same Player can also be used to play music while a call is on hold, as follows.

. . .
    // Set Hold
    currentCall.Flow.BeginHold(HoldType.RemoteEndpointMusicOnHold,
        AudioVideoFlowHoldCompleted, null);

    // Attach music to current audio call and start music !
    myAudioPlayer.AttachFlow(currentCall.Flow);
    myAudioPlayer.Start();
}

private void AudioVideoFlowHoldCompleted(IAsyncResult result)
{
    currentCall.Flow.EndHold(result);

    // Some kind of process . . .
    Thread.Sleep(10000);

    // Release Hold
    currentCall.Flow.BeginRetrieve(AudioVideoFlowRetrieveCompleted, null);
}

private void AudioVideoFlowRetrieveCompleted(IAsyncResult result)
{
    currentCall.Flow.EndRetrieve(result);

    // stop music
    myAudioPlayer.Stop();
    myAudioPlayer.DetachFlow(currentCall.Flow);

    // transfer call
    currentCall.BeginTransfer(@"sip:demouser1@example.jp",
        AudioVideoCallTransferCompleted, null);
}

The samples above simply leave the audio playing. To run other processing when playback finishes, use the StateChanged event of the Player instance, as follows.

myAudioPlayer.StateChanged += this.OnPlayerStateChanged;
. . .

public void OnPlayerStateChanged(object sender, PlayerStateChangedEventArgs args)
{
    if (args.State == PlayerState.Stopped &&
        args.TransitionReason == PlayerStateTransitionReason.PlayCompleted)
    {
        . . .
    }
}

In particular, to loop the music, as for hold music, restart the player when playback completes:

public void OnPlayerStateChanged(object sender, PlayerStateChangedEventArgs args)
{
    if (args.State == PlayerState.Stopped &&
        args.TransitionReason == PlayerStateTransitionReason.PlayCompleted)
        ((Player)sender).Start();
}

In real development the other party may be offline, so also evaluate the result of each asynchronous operation (check for errors and exceptions when calling the End methods).
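The check on the asynchronous result could look roughly like this. It is only a sketch: the helper CallResultChecker.TryEndEstablish is a hypothetical name, not part of UCMA, and it assumes that a failed call (for example, the callee being offline) surfaces as a RealTimeException (Microsoft.Rtc.Signaling) thrown from EndEstablish. It needs the same UCMA 3.0 SDK references as the sample above.

```csharp
using System;
using Microsoft.Rtc.Collaboration.AudioVideo;
using Microsoft.Rtc.Signaling;

// Hypothetical helper (not part of UCMA): completes an outgoing call and
// reports failure instead of letting the exception escape the callback.
static class CallResultChecker
{
    public static bool TryEndEstablish(AudioVideoCall call, IAsyncResult res)
    {
        try
        {
            call.EndEstablish(res);
            return true;
        }
        catch (RealTimeException ex)
        {
            // Assumed to cover cases such as the callee being offline
            // or declining the call.
            Console.WriteLine("Call could not be established: {0}", ex.Message);
            return false;
        }
        catch (InvalidOperationException ex)
        {
            // For example, the call was terminated in the meantime.
            Console.WriteLine("Call is in an invalid state: {0}", ex.Message);
            return false;
        }
    }
}
```

In ConversationEstablishCompleted, the direct call to currentCall.EndEstablish(res) would then become `if (!CallResultChecker.TryEndEstablish(currentCall, res)) return;`, so that the Player is attached only when the call is actually connected.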

Converting between speech and text (ASR / TTS)

As shown in the demo, you can also use the SpeechSynthesizer and SpeechRecognitionEngine classes to handle speech synthesis from text (Text to Speech, TTS) and speech recognition (Automatic Speech Recognition, ASR).
This time, let's look at a speech synthesis (SpeechSynthesizer) sample.

First, to use ASR / TTS in Japanese, download and install both the Japanese Speech Recognition Engine and the Japanese Text-to-Speech Engine from the page below. Also add a reference to %programfiles%\Microsoft Speech Platform SDK\Assembly\Microsoft.Speech.dll in your Visual Studio project.

[Download Center] Microsoft Speech Platform – Server Runtime Languages (Version 10.2)

http://www.microsoft.com/download/en/details.aspx?id=21924

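Note: The core Speech engine itself is installed together with the UCMA environment (see the previous post). (Pay attention to the installed engine version, though: the version installed on Lync Server and the version installed on the UCMA server may differ.)

Now modify the notification sample above as follows. (Only the speech synthesis parts have changed.)
The message 「これは、テストです」 ("This is a test") is delivered once a minute in the Japanese voice Haruka, available from the Download Center (above).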

. . .
using Microsoft.Speech.Synthesis;
using Microsoft.Speech.AudioFormat;
. . .

static void Main(string[] args)
{
    Program prog = new Program();
    prog.Run();
}

private CollaborationPlatform rtcPlatform;
private ApplicationEndpoint appEndpoint;
private Conversation currentConversation;
private AudioVideoCall currentCall;
private Timer timer;
. . .

public void Run()
{
    // Start platform and establish endpoint
    ProvisionedApplicationPlatformSettings settings =
        new ProvisionedApplicationPlatformSettings(
            "NotificationSample", "urn:application:notificationsample");
    rtcPlatform = new CollaborationPlatform(settings);
    rtcPlatform.RegisterForApplicationEndpointSettings(
        this.ApplicationEndpointSettingsDiscovered);
    rtcPlatform.BeginStartup(this.PlatformStartupCompleted, null);

    // Terminate endpoint and shutdown platform
    Console.WriteLine("Press enter to shutdown ...");
    Console.ReadLine();
    appEndpoint.BeginTerminate(this.EndpointTerminateCompleted, null);
}

public void ApplicationEndpointSettingsDiscovered(object sender,
    ApplicationEndpointSettingsDiscoveredEventArgs e)
{
    Console.WriteLine("Called applicationEndpointSettingsDiscovered ...");
    ApplicationEndpointSettings settings = e.ApplicationEndpointSettings;
    settings.UseRegistration = true;
    settings.Presence.Description = "This is Notification sample.";
    appEndpoint = new ApplicationEndpoint(rtcPlatform, settings);
    appEndpoint.BeginEstablish(this.EndpointEstablishCompleted, null);
}

public void PlatformStartupCompleted(IAsyncResult res)
{
    rtcPlatform.EndStartup(res);
    Console.WriteLine("Completed platform startup ...");
}

public void EndpointEstablishCompleted(IAsyncResult res)
{
    appEndpoint.EndEstablish(res);
    Console.WriteLine("Completed endpoint establish ...");

    // Set Notification Timer
    TimerCallback callback = new TimerCallback(MyNotificationCallback);
    timer = new Timer(callback, null, 0, 60000);
}

public void EndpointTerminateCompleted(IAsyncResult res)
{
    appEndpoint.EndTerminate(res);
    Console.WriteLine("Completed endpoint terminate ...");
    rtcPlatform.BeginShutdown(this.PlatformShutdownCompleted, null);
}

public void PlatformShutdownCompleted(IAsyncResult res)
{
    rtcPlatform.EndShutdown(res);
    Console.WriteLine("Completed platform shutdown ...");
}

public void MyNotificationCallback(object o)
{
    Console.WriteLine("Notification fired !");

    // create AudioVideo conversation !
    currentConversation = new Conversation(appEndpoint);
    currentCall = new AudioVideoCall(currentConversation);
    currentCall.BeginEstablish(@"sip:demouser1@example.jp",
        new CallEstablishOptions(),
        this.ConversationEstablishCompleted,
        null);
}

public void ConversationEstablishCompleted(IAsyncResult res)
{
    currentCall.EndEstablish(res);
    Console.WriteLine("Conversation Established !");

    // Wait AudioVideo Flow activated
    while (currentCall.Flow.State != MediaFlowState.Active)
        Thread.Sleep(1000);

    // Create a speech synthesis connector and attach it to an AudioVideoFlow
    SpeechSynthesisConnector speechSynthesisConnector = new SpeechSynthesisConnector();
    speechSynthesisConnector.AttachFlow(currentCall.Flow);

    // Create a speech synthesis and set connector to it
    SpeechSynthesizer speechSynthesis = new SpeechSynthesizer();
    SpeechAudioFormatInfo audioformat = new SpeechAudioFormatInfo(16000,
        AudioBitsPerSample.Sixteen,
        Microsoft.Speech.AudioFormat.AudioChannel.Mono);
    speechSynthesis.SetOutputToAudioStream(speechSynthesisConnector, audioformat);

    // Start connector
    speechSynthesisConnector.Start();

    // Start streaming from speech synthesis
    speechSynthesis.Speak(new Prompt("これは、テストです。"));

    // Detach speech synthesis connector from flow
    speechSynthesisConnector.DetachFlow();
}
. . .

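Note: To switch the language engine in use (English, Japanese, and so on), you normally call the SelectVoice method. It appears to be chosen automatically based on the content, however, so the call is omitted above.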
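If you prefer to pick the voice explicitly rather than rely on automatic selection, something along these lines should work. This is a minimal sketch: VoiceSelector and TrySelectVoice are hypothetical names introduced here, and the code assumes the Microsoft.Speech.dll reference described above. It chooses the first enabled voice installed for a culture instead of hard-coding a voice name.

```csharp
using System;
using System.Globalization;
using Microsoft.Speech.Synthesis;

// Hypothetical helper (not part of the Speech Platform SDK).
static class VoiceSelector
{
    // Selects the first enabled installed voice for the given culture,
    // for example "ja-JP". Returns false when no such voice is installed.
    public static bool TrySelectVoice(SpeechSynthesizer synth, string cultureName)
    {
        CultureInfo culture = new CultureInfo(cultureName);
        foreach (InstalledVoice voice in synth.GetInstalledVoices(culture))
        {
            if (voice.Enabled)
            {
                synth.SelectVoice(voice.VoiceInfo.Name);
                return true;
            }
        }
        return false;
    }
}
```

In the sample above it would be called as `VoiceSelector.TrySelectVoice(speechSynthesis, "ja-JP")` before Speak. A false return value is a quick way to detect that the Japanese runtime language has not been installed on the UCMA server.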

Note: As shown above, SpeechSynthesisConnector requires the State of the Flow to be Active, so the code polls it once a second in a while loop. The usual approach for this kind of wait is an AutoResetEvent / ManualResetEvent, but the timing at which the Flow's state change to Active is raised is not uniform, so take care in real code.
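One way to use a ManualResetEvent while still coping with the timing problem mentioned in the note is to subscribe first and then re-check the state. The following is a sketch under stated assumptions, not the author's code: FlowWaiter is a hypothetical helper, and it assumes that AudioVideoFlow exposes a StateChanged event carrying MediaFlowStateChangedEventArgs, as used in the UCMA SDK quick start samples.

```csharp
using System;
using System.Threading;
using Microsoft.Rtc.Collaboration;
using Microsoft.Rtc.Collaboration.AudioVideo;

// Hypothetical helper (not part of UCMA).
static class FlowWaiter
{
    // Blocks until the flow is Active or the timeout elapses.
    // Returns true if the flow became (or already was) Active.
    public static bool WaitForActive(AudioVideoFlow flow, TimeSpan timeout)
    {
        ManualResetEvent active = new ManualResetEvent(false);
        EventHandler<MediaFlowStateChangedEventArgs> handler =
            delegate(object sender, MediaFlowStateChangedEventArgs e)
            {
                if (e.State == MediaFlowState.Active)
                    active.Set();
            };

        flow.StateChanged += handler;
        try
        {
            // Re-check after subscribing: the flow may have become Active
            // before the handler was attached, in which case no further
            // event would arrive.
            if (flow.State == MediaFlowState.Active)
                return true;
            return active.WaitOne(timeout);
        }
        finally
        {
            flow.StateChanged -= handler;
        }
    }
}
```

The polling loop in the sample could then be replaced with `FlowWaiter.WaitForActive(currentCall.Flow, TimeSpan.FromSeconds(30))`, where the 30-second timeout is an arbitrary illustrative value. Unlike the loop, this gives up when the flow never becomes Active.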

Note: As described in an earlier post, Lync 2010 supports 26 languages.

This section showed a speech synthesis sample using SpeechSynthesizer. For speech recognition, a simple sample ships with the UCMA SDK under %programfiles%\Microsoft UCMA 3.0\SDK\Core\Sample Applications\QuickStarts\AudioVideoCall\SpeechRecognitionConnector, so take a look at it.

IVR (Interactive Voice Response)

Next, let's build an Interactive Voice Response (IVR) application, the kind used for voice guidance.

You could assemble this from scratch with the SpeechSynthesizer and SpeechRecognitionEngine classes described above, but that is a lot of work. To make such IVR applications easy to define, UCMA (Lync endpoint development) supports VoiceXML.

Note: For a simple IVR workflow, such as routing by push button, you do not need to write any code. You can build it with the Response Group settings in the Lync Server Control Panel.

The application is built in the following steps.

Define the dialog in VoiceXML (a .vxml file)
Load the .vxml file with the Run method (or the RunAsync method) of Microsoft.Rtc.Collaboration.AudioVideo.VoiceXml.Browser
Take the user's choice from the dialog result and carry out the next step

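The earlier sample was a notification. This time, in true IVR fashion, let's build a sample application that receives an AudioVideoCall and processes it.

Here too, obtain the Japanese ASR and TTS engines from the Download Center (above) and install them on the UCMA server beforehand. (They are used internally.)

First, create the .vxml file below. (Here it is placed at C:\Demo\test.vxml.) One caveat, especially if the file contains Japanese: save it with UTF-8 encoding, for example from Notepad.
You can probably guess what the code (XML) below does just by reading it (for those who attended Tech Party, it is exactly the source shown in the demo), but here is a brief explanation. This VoiceXML first asks the caller to choose the type of inquiry from 技術 (technical), セールス (sales), ライセンス (license), and その他 (other), and then answers according to the chosen category, for example 「ライセンスが選択されました」 ("License was selected"). If the user says nothing for 10 seconds, it answers 「音声を聞き取ることができませんでした」 ("I could not hear you"), and if the user names a category other than those four, it answers 「問い合わせの種別を認識できません」 ("I could not recognize the inquiry type").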

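Note: As those who attended Tech Party heard, Haruka speaks with a slight accent depending on what she says. She can also speak English, but her pronunciation is not good. So rather than having one voice speak both Japanese and English, as in airport announcements, it is better to start with multilingual guidance such as "Press 1 for Japanese, press 2 for English" and then switch to a separate VoiceXML per language. (For English dialogs, change the VoiceXML definitions to English and have them handled by an English engine such as Helen.) Also, the VoiceXML does not have to come from a file: content read from a stream can be processed as well, so the VoiceXML can be changed dynamically.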
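As an illustration of changing the VoiceXML dynamically, the document can be generated in code and written to a stream. This is only a sketch: VxmlBuilder and its methods are hypothetical names, the generated document is a reduced version of the dialog (one prompt, one grammar, one filled block), and how the resulting stream is handed to the VoiceXML Browser is deliberately left out because that call is not shown in this post.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: builds a one-field VoiceXML document in memory.
static class VxmlBuilder
{
    static readonly XNamespace Vxml = "http://www.w3.org/2001/vxml";

    // Builds a dialog that asks 'question' and accepts one of 'choices'.
    public static XDocument Build(string fieldName, string question, string[] choices)
    {
        XElement oneOf = new XElement(Vxml + "one-of");
        foreach (string choice in choices)
            oneOf.Add(new XElement(Vxml + "item", choice));

        XElement grammar = new XElement(Vxml + "grammar",
            new XAttribute("mode", "voice"),
            new XAttribute("type", "application/srgs+xml"),
            new XAttribute("root", fieldName),
            new XElement(Vxml + "rule",
                new XAttribute("id", fieldName),
                new XAttribute("scope", "public"),
                oneOf));

        XElement field = new XElement(Vxml + "field",
            new XAttribute("name", fieldName),
            new XElement(Vxml + "prompt", question),
            grammar,
            new XElement(Vxml + "filled",
                new XElement(Vxml + "exit", new XAttribute("namelist", fieldName))));

        XElement root = new XElement(Vxml + "vxml",
            new XAttribute("version", "2.0"),
            new XAttribute(XNamespace.Xml + "lang", "ja-JP"),
            new XElement(Vxml + "form", field));

        return new XDocument(new XDeclaration("1.0", "utf-8", null), root);
    }

    // Serializes the document as UTF-8 (no BOM) into a rewound MemoryStream.
    public static MemoryStream ToUtf8Stream(XDocument doc)
    {
        MemoryStream stream = new MemoryStream();
        XmlWriterSettings settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false),
            Indent = true
        };
        // Disposing the XmlWriter flushes it but leaves the stream open.
        using (XmlWriter writer = XmlWriter.Create(stream, settings))
        {
            doc.Save(writer);
        }
        stream.Position = 0;
        return stream;
    }
}
```

For example, `VxmlBuilder.Build("QuestionCategory", "お問い合わせの内容を選択してください", new[] { "技術", "セールス" })` produces a grammar with only two choices, which could be varied per caller or per time of day.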

Note: For details on VoiceXML (VXML), refer to the VoiceXML documentation.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.0"
  xmlns="http://www.w3.org/2001/vxml"
  xml:lang="ja-JP"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml21/vxml.xsd" >
  <form>
    <field name="QuestionCategory">
      <prompt bargein="true" bargeintype="speech" timeout="10s">
        お問い合わせの内容を、技術、セールス、ライセンス、その他 の中から選択してください
      </prompt>
      <grammar mode="voice" type="application/srgs+xml" root="QuestionCategory">
        <rule id="QuestionCategory" scope="public">
          <one-of>
            <item>技術</item>
            <item>セールス</item>
            <item>ライセンス</item>
            <item>その他</item>
          </one-of>
        </rule>
      </grammar>
      <nomatch>
        <prompt bargein="true" bargeintype="speech" timeout="3s">
          問い合わせの種別を認識できません
        </prompt>
        <exit/>
      </nomatch>
      <noinput>
        <prompt bargein="true" bargeintype="speech" timeout="3s">
          音声を聞き取ることができませんでした
        </prompt>
        <exit/>
      </noinput>
      <filled>
        <prompt>
          <value expr="QuestionCategory$.utterance"/>が選択されました
        </prompt>
        <exit namelist="QuestionCategory QuestionCategory$.utterance"/>
      </filled>
    </field>
  </form>
</vxml>

With the VoiceXML written, the next step is to build the UCMA application in Visual Studio.
This time, add the following references to the Visual Studio project.

%programfiles%\Microsoft Speech Platform SDK\Assembly\Microsoft.Speech.dll
%programfiles%\Microsoft Speech Platform SDK\Assembly\Microsoft.Speech.VoiceXml.dll
%programfiles%\Microsoft UCMA 3.0\SDK\Core\Bin\Microsoft.Rtc.Collaboration.AudioVideo.VoiceXml.dll

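Next, write the code that loads the VoiceXML above. Once again most of the code is unchanged, so read only the parts that differ: the incoming call registration, AudioVideoReceived, and the Browser handling. (If you understand the steps described above, nothing here is particularly difficult.)
Note that this time the UCMA application does not notify the user. Instead, it receives an AudioVideoCall from the user and processes it.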

. . .
using Microsoft.Rtc.Collaboration.AudioVideo;
using Microsoft.Rtc.Collaboration.AudioVideo.VoiceXml;
using Microsoft.Speech.VoiceXml.Common;
. . .

static void Main(string[] args)
{
    Program prog = new Program();
    prog.Run();
}

private CollaborationPlatform rtcPlatform;
private ApplicationEndpoint appEndpoint;
private AudioVideoCall currentCall;
private Browser vxmlBrowser;

public void Run()
{
    // Start platform and establish endpoint
    ProvisionedApplicationPlatformSettings settings =
        new ProvisionedApplicationPlatformSettings(
            "IVRSample", "urn:application:ivrsample");
    rtcPlatform = new CollaborationPlatform(settings);
    rtcPlatform.RegisterForApplicationEndpointSettings(
        this.ApplicationEndpointSettingsDiscovered);
    rtcPlatform.BeginStartup(this.PlatformStartupCompleted, null);

    // Terminate endpoint and shutdown platform
    Console.WriteLine("Press enter to shutdown ...");
    Console.ReadLine();
    appEndpoint.BeginTerminate(this.EndpointTerminateCompleted, null);
}

public void ApplicationEndpointSettingsDiscovered(object sender,
    ApplicationEndpointSettingsDiscoveredEventArgs e)
{
    Console.WriteLine("Called applicationEndpointSettingsDiscovered ...");
    ApplicationEndpointSettings settings = e.ApplicationEndpointSettings;
    settings.UseRegistration = true;
    settings.Presence.Description = "This is IVR sample.";
    appEndpoint = new ApplicationEndpoint(rtcPlatform, settings);
    appEndpoint.RegisterForIncomingCall<AudioVideoCall>(AudioVideoReceived);
    appEndpoint.BeginEstablish(this.EndpointEstablishCompleted, null);
}

public void PlatformStartupCompleted(IAsyncResult res)
{
    rtcPlatform.EndStartup(res);
    Console.WriteLine("Completed platform startup ...");
}

public void EndpointEstablishCompleted(IAsyncResult res)
{
    appEndpoint.EndEstablish(res);
    Console.WriteLine("Completed endpoint establish ...");
}

public void EndpointTerminateCompleted(IAsyncResult res)
{
    appEndpoint.EndTerminate(res);
    Console.WriteLine("Completed endpoint terminate ...");
    appEndpoint.UnregisterForIncomingCall<AudioVideoCall>(AudioVideoReceived);
    rtcPlatform.BeginShutdown(this.PlatformShutdownCompleted, null);
}

public void PlatformShutdownCompleted(IAsyncResult res)
{
    rtcPlatform.EndShutdown(res);
    Console.WriteLine("Completed platform shutdown ...");
}

public void AudioVideoReceived(object sender,
    CallReceivedEventArgs<AudioVideoCall> e)
{
    currentCall = e.Call;

    // Accept this call
    currentCall.EndAccept(currentCall.BeginAccept(null, null));

    // Wait AudioVideo Flow activated
    while (currentCall.Flow.State != MediaFlowState.Active)
        Thread.Sleep(1000);

    // Create VoiceXML Browser and initialize
    vxmlBrowser = new Browser();
    vxmlBrowser.SessionCompleted +=
        new EventHandler<Microsoft.Speech.VoiceXml.Common.SessionCompletedEventArgs>(
            vxmlBrowser_SessionCompleted);

    // Execute IVR !
    vxmlBrowser.SetAudioVideoCall(currentCall);
    VoiceXmlResult vr = vxmlBrowser.Run(new Uri(@"c:\Demo\test.vxml"), null);

    // Get results and output
    if (vr != null && vr.Namelist != null)
    {
        foreach (string key in vr.Namelist.Keys)
            Console.WriteLine("Key:{0} Value:{1}",
                key,
                vr.Namelist[key].ToString());
    }

    // Terminate VoiceXml Browser
    vxmlBrowser.Dispose();
}

public void vxmlBrowser_SessionCompleted(object sender,
    Microsoft.Speech.VoiceXml.Common.SessionCompletedEventArgs e)
{
    Console.WriteLine("vxmlBrowser_SessionCompleted");
}
. . .

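When you call this UCMA application, the dialog with it proceeds as follows.

UCMA Application: 「お問い合わせの内容を、技術、セールス、ライセンス、その他の中から選択してください」 (Please choose the type of inquiry: technical, sales, license, or other)

Lync user: 「ライセンス」 (license)

UCMA Application: 「ライセンスが選択されました」 (License was selected)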

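After the dialog completes, the console shows the following output.
Here the selection is simply written to the console, but a real application can act on it in many ways, such as connecting (transferring) the caller to the appropriate operator.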
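Routing on the selected category could be sketched as a simple lookup. Everything specific here is an assumption made for illustration: the OperatorRouting class is hypothetical, and so are the operator SIP addresses, which do not come from the original sample. Only the category strings match the grammar in the VoiceXML above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps a recognized category to an operator address.
static class OperatorRouting
{
    // Hypothetical SIP addresses, for illustration only.
    static readonly Dictionary<string, string> Operators =
        new Dictionary<string, string>
        {
            { "技術", "sip:tech-desk@example.jp" },
            { "セールス", "sip:sales-desk@example.jp" },
            { "ライセンス", "sip:license-desk@example.jp" }
        };

    const string DefaultOperator = "sip:general-desk@example.jp";

    // Returns the operator for the category, or the general desk for
    // "その他", unknown values, and null (for example after noinput).
    public static string Resolve(string category)
    {
        string uri;
        if (category != null && Operators.TryGetValue(category, out uri))
            return uri;
        return DefaultOperator;
    }
}
```

After Run returns, the value of vr.Namelist["QuestionCategory"] (converted with ToString, as in the console output loop) would be passed to OperatorRouting.Resolve, and the resulting address given to currentCall.BeginTransfer in the same way as in the hold sample earlier in this post.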

Press enter to shutdown ...
Completed platform startup ...
Called applicationEndpointSettingsDiscovered ...
Completed endpoint establish ...
Key:QuestionCategory Value:ライセンス
Key:QuestionCategory$.utterance Value:ライセンス
vxmlBrowser_SessionCompleted

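Saying that something is "not possible with Lync Online" may give the impression that it can only be used on an intranet, but the techniques introduced here can of course be reached from ordinary telephones through Enterprise Voice. (When using voice, it is a good idea to tie things together by, for example, having the caller enter a phone number.) As I explained in the seminar, you can also combine UCMA with a custom web application on the Internet via Edge Server, without using the Lync client. (On the receiving call center side, you can also combine this with CWE, which was shown in the first half of the demo.) Another conceivable use is to publish part of the functionality (TTS processing, for example) as a web service, as Google and Bing do, and integrate with it. In addition, if Federation is configured with Lync Online, connections between Lync Online and an on-premises Lync environment like this one are also possible.
With Lync Server, you can offer advanced services widely even in Internet scenarios and, as shown in the demo, add voice (telephone) services to services running on Windows Azure and elsewhere.

Note: On Windows Phone 8, speech recognition through Voice Command and speech synthesis are available. (Added 2012/12/04.)

I will present this content and the demos again on 09/10 at Wankuma in Tokyo.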
