Translate speech - Virtual IoT Device
In this part of the lesson, you will write code to translate speech when converting to text using the speech service, then translate text using the Translator service before generating a spoken response.
Use the speech service to translate speech
The speech service can not only convert speech into text in the same language, but also translate the output into other languages.
Task - use the speech service to translate speech
- Open the `smart-timer` project in VS Code, and ensure the virtual environment is loaded in the terminal.
- Add the following import statements below the existing imports:

```python
from azure.cognitiveservices import speech
from azure.cognitiveservices.speech.translation import SpeechTranslationConfig, TranslationRecognizer
import requests
```

This imports classes used to translate speech, and a `requests` library that will be used to make a call to the Translator service later in this lesson.
- Your smart timer will have 2 languages set: the language of the server that was used to train LUIS (the same language is also used to build the messages spoken to the user), and the language spoken by the user. Update the `language` variable to be the language that will be spoken by the user, and add a new variable called `server_language` for the language used to train LUIS:

```python
language = '<user language>'
server_language = '<server language>'
```

Replace `<user language>` with the locale name for the language you will be speaking in, for example `fr-FR` for French, or `zh-HK` for Cantonese.

Replace `<server language>` with the locale name for the language used to train LUIS.

You can find a list of the supported languages and their locale names in the Language and voice support documentation on Microsoft docs.
💁 If you don't speak multiple languages, you can use a service like Bing Translate or Google Translate to translate from your preferred language into a language of your choice. These services can then play audio of the translated text. Be aware that the speech recognizer will ignore some audio output from your device, so you may need to use an additional device to play the translated text.
For example, if you train LUIS in English, but want to use French as the user language, you can translate sentences like "set a 2 minute and 27 second timer" from English into French using Bing Translate, then use the Listen translation button to speak the translation into your microphone.
- Replace the `recognizer_config` and `recognizer` declarations with the following:

```python
translation_config = SpeechTranslationConfig(subscription=speech_api_key,
                                             region=location,
                                             speech_recognition_language=language,
                                             target_languages=(language, server_language))

recognizer = TranslationRecognizer(translation_config=translation_config)
```

This creates a translation config that recognizes speech in the user language and creates translations in both the user and server languages. It then uses this config to create a translation recognizer: a speech recognizer that can translate the output of speech recognition into multiple languages.
💁 The original language needs to be specified in `target_languages`, otherwise you won't get any translations.
- Update the `recognized` function, replacing the entire contents of the function with the following:

```python
    if args.result.reason == speech.ResultReason.TranslatedSpeech:
        language_match = next(l for l in args.result.translations if server_language.lower().startswith(l.lower()))
        text = args.result.translations[language_match]

        if len(text) > 0:
            print(f'Translated text: {text}')

            message = Message(json.dumps({ 'speech': text }))
            device_client.send_message(message)
```

This code checks whether the recognized event was fired because speech was translated (this event can fire at other times, such as when speech is recognized but not translated). If the speech was translated, it finds the translation in the `args.result.translations` dictionary that matches the server language.

The `args.result.translations` dictionary is keyed off the language part of the locale setting, not the whole setting. For example, if you request a translation into `fr-FR` for French, the dictionary will contain an entry for `fr`, not `fr-FR`.

The translated text is then sent to the IoT Hub.
- Run this code to test the translations. Ensure your function app is running, and request a timer in the user language, either by speaking that language yourself or by using a translation app.

```output
(.venv) ➜ smart-timer python app.py
Connecting
Connected
Translated text: Set a timer of 2 minutes and 27 seconds.
```
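The lookup in the `recognized` function can be tried out without a microphone or any Azure resources. The following is a minimal sketch, assuming a plain dictionary is a fair stand-in for `args.result.translations`; the helper name `find_server_translation` and the sample values are hypothetical and not part of the Speech SDK.

```python
def find_server_translation(translations, server_language):
    """Return the translation matching the server locale, or None if there is none.

    Hypothetical helper: it mirrors the next(...) expression in the recognized
    function, but returns None instead of raising StopIteration when nothing matches.
    """
    for key, text in translations.items():
        # Keys hold only the language part ('en'), so check the start of the locale ('en-US')
        if server_language.lower().startswith(key.lower()):
            return text
    return None


# Plain dict standing in for args.result.translations (sample values are made up)
translations = {
    'fr': '<French text>',
    'en': 'Set a timer of 2 minutes and 27 seconds.',
}

print(find_server_translation(translations, 'en-US'))
print(find_server_translation(translations, 'de-DE'))
```

Returning `None` lets the caller skip sending a message, rather than crashing, if the server language was accidentally left out of `target_languages`.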
Translate text using the Translator service
The speech service doesn't support translating text before converting it back to speech. Instead, you can use the Translator service to translate the text. This service has a REST API you can use to translate text.
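To see roughly what the request looks like, the following minimal sketch assembles the same URL, headers, and body that the task below sends with `requests`, using only the standard library and without making a network call. The helper name `build_translate_request`, the `westus2` region, and the sample locales are assumptions for illustration; `requests` builds its query string in a similar way, appending the encoded parameters to the existing `api-version` query.

```python
import json
from urllib.parse import urlencode

# Base URL used in this lesson; the api-version is already in the query string
TRANSLATOR_URL = 'https://api.cognitive.microsofttranslator.com/translate?api-version=3.0'


def build_translate_request(text, api_key, region, from_language, to_language):
    """Return (url, headers, body) for a Translator translate call (hypothetical helper)."""
    params = {'from': from_language, 'to': to_language}
    # The base URL already has a query string, so the extra parameters follow an '&'
    url = f'{TRANSLATOR_URL}&{urlencode(params)}'

    headers = {
        'Ocp-Apim-Subscription-Key': api_key,
        # The region goes in a header because the URL is not location specific
        'Ocp-Apim-Subscription-Region': region,
        'Content-type': 'application/json',
    }

    # The body is a JSON array, so several blocks of text could be sent in one call
    body = json.dumps([{'text': text}])
    return url, headers, body


url, headers, body = build_translate_request(
    '2 minute 27 second timer started.', '<key>', 'westus2', 'en-US', 'fr-FR')
print(url)
print(body)
```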
Task - use the translator resource to translate text
- Add the translator API key below the `speech_api_key` declaration:

```python
translator_api_key = '<key>'
```

Replace `<key>` with the API key for your translator service resource.
- Above the `say` function, define a `translate_text` function that will translate text from the server language to the user language:

```python
def translate_text(text):
```
- Inside this function, define the URL and headers for the REST API call:

```python
    url = 'https://api.cognitive.microsofttranslator.com/translate?api-version=3.0'

    headers = {
        'Ocp-Apim-Subscription-Key': translator_api_key,
        'Ocp-Apim-Subscription-Region': location,
        'Content-type': 'application/json'
    }
```

The URL for this API is not location specific; instead, the location is passed in as a header. The API key is used directly, so unlike the speech service, there is no need to get an access token from the token issuer API.
- Below this, define the parameters and body for the call:

```python
    params = {
        'from': server_language,
        'to': language
    }

    body = [{
        'text' : text
    }]
```

The `params` dictionary defines the parameters to pass to the API call, passing the `from` and `to` languages. This call will translate text in the `from` language into the `to` language.

The `body` contains the text to translate. This is an array, as multiple blocks of text can be translated in the same call.
- Make the call to the REST API, and get the response:

```python
    response = requests.post(url, headers=headers, params=params, json=body)
```

The response that comes back is a JSON array with one item for each block of text passed in the body. Each item has a `translations` array containing one translation for each target language.

```json
[
    {
        "translations": [
            {
                "text": "Chronométrant votre minuterie de 2 minutes 27 secondes.",
                "to": "fr"
            }
        ]
    }
]
```
- Return the `text` property from the first translation of the first item in the array:

```python
    return response.json()[0]['translations'][0]['text']
```
- Update the `say` function to translate the text to say before the SSML is generated:

```python
    print('Original:', text)
    text = translate_text(text)
    print('Translated:', text)
```

This code also prints the original and translated versions of the text to the console.
- Run your code. Ensure your function app is running, and request a timer in the user language, either by speaking that language yourself or by using a translation app.

```output
(.venv) ➜ smart-timer python app.py
Connecting
Connected
Translated text: Set a timer of 2 minutes and 27 seconds.
Original: 2 minute 27 second timer started.
Translated: 2 minute 27 seconde minute a commencé.
Original: Times up on your 2 minute 27 second timer.
Translated: Chronométrant votre minuterie de 2 minutes 27 secondes.
```
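`translate_text` only reads the first item of the response because it sends a single block of text. If you batch several blocks into one `body`, the response has one item per block, each with its own `translations` array. The following minimal sketch reads them all, using a hypothetical `extract_translations` helper, the sample response shown earlier, and a made-up second item; it assumes a single `to` language, so each item holds exactly one translation.

```python
import json


def extract_translations(response_items):
    """Return the translated text for each block of text sent in the request body.

    Hypothetical helper: assumes a single target language, so each item's
    'translations' array holds exactly one entry.
    """
    return [item['translations'][0]['text'] for item in response_items]


# Response shaped like the Translator output in this lesson; the second item is made up
sample_response = json.loads('''
[
    {"translations": [{"text": "Chronométrant votre minuterie de 2 minutes 27 secondes.", "to": "fr"}]},
    {"translations": [{"text": "<second translated block>", "to": "fr"}]}
]
''')

for translated in extract_translations(sample_response):
    print(translated)
```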