{"id":73228,"date":"2025-07-17T20:13:20","date_gmt":"2025-07-17T14:43:20","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=73228"},"modified":"2025-07-30T13:09:05","modified_gmt":"2025-07-30T07:39:05","slug":"custom-voice-recognition-in-roku-app-using-google-speech-to-text-library","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/custom-voice-recognition-in-roku-app-using-google-speech-to-text-library\/","title":{"rendered":"Custom voice recognition in Roku app using Google Speech to text library"},"content":{"rendered":"<p>Roku provides a library that supports basic voice commands, such as \u201cplay\u201d, \u201cpause\u201d, \u201cnext\u201d, and \u201cfast forward\u201d. It also provides a keyboard integrated with voice search. However, we cannot integrate custom voice commands directly from Roku\u2019s voice library. This article covers how to integrate custom voice commands into a Roku channel using the \u201croMicrophone\u201d component and a third-party voice recognition service, such as the Google Speech-to-Text API.<\/p>\n<p>To use the voice recognition service, the Roku device must be running Roku OS 7.6 or later.<\/p>\n<p>&nbsp;<\/p>\n<h1>Prerequisites<\/h1>\n<ol>\n<li>Knowledge of Roku app development using BrightScript\/SceneGraph.<\/li>\n<li>Google Cloud account with billing enabled.<\/li>\n<li>Roku device with developer mode enabled and running OS version 7.6 or later.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h1>Integration Steps<\/h1>\n<p>Custom voice command integration involves two main steps.<\/p>\n<ol>\n<li>Google Cloud Speech-to-Text API setup.<\/li>\n<li>Roku channel code setup.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h2>Setting up the Google Speech-to-Text API<\/h2>\n<ol>\n<li>First, create a <a href=\"https:\/\/console.cloud.google.com\/\">Google Cloud account<\/a>. 
Sign in to the Google Cloud console and enable billing.<\/li>\n<li>Create a new project and give it a name of your choice.<\/li>\n<li>Open \u201cAPIs and Services\u201d and select the \u201cCloud Speech-to-Text\u201d API.<\/li>\n<li>Click the \u201cEnable\u201d button to turn on the Speech-to-Text API.<\/li>\n<li>Create an <a href=\"https:\/\/console.cloud.google.com\/apis\/credentials?project=_\">API key<\/a>.<\/li>\n<\/ol>\n<div id=\"attachment_73226\" style=\"width: 635px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-73226\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-73226 size-large\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_1-1024x594.png\" alt=\"custom voice command in roku\" width=\"625\" height=\"363\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_1-1024x594.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_1-300x174.png 300w, \/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_1-768x446.png 768w, \/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_1-1536x891.png 1536w, \/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_1-624x362.png 624w, \/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_1.png 1672w\" sizes=\"(max-width: 625px) 100vw, 625px\" \/><p id=\"caption-attachment-73226\" class=\"wp-caption-text\">Google Cloud Console<\/p><\/div>\n<div id=\"attachment_73227\" style=\"width: 635px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-73227\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-73227 size-large\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_2-1024x567.png\" alt=\"Cloud Speech to text\" width=\"625\" height=\"346\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_2-1024x567.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_2-300x166.png 300w, \/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_2-768x426.png 768w, 
\/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_2-624x346.png 624w, \/blog\/wp-ttn-blog\/uploads\/2025\/07\/voice_blog_2.png 1377w\" sizes=\"(max-width: 625px) 100vw, 625px\" \/><p id=\"caption-attachment-73227\" class=\"wp-caption-text\">Cloud Speech to Text<\/p><\/div>\n<h2>Code Setup<\/h2>\n<ol>\n<li>Create a custom component (responsible for listening and returning the text) named <strong><span style=\"color: #993366;\">VoiceRecognizer.xml<\/span><\/strong>.<\/li>\n<li>Create a &#8220;roMicrophone&#8221; object to access the Roku remote&#8217;s mic.\n<ul style=\"list-style-type: square;\">\n<li>\n<pre>microphone = CreateObject(\"roMicrophone\")<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<li>To receive microphone events, first set a message port with the <span style=\"color: #800000;\">SetMessagePort<\/span>(port) function.\n<ul style=\"list-style-type: square;\">\n<li>\n<pre>port = CreateObject(\"roMessagePort\")\r\nmicrophone.SetMessagePort(port)<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<li>After that, start recording from the Roku remote&#8217;s microphone with the <span style=\"color: #800000;\">StartRecording<\/span>() function.\n<ul style=\"list-style-type: square;\">\n<li>\n<pre>microphone.StartRecording()<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<li>Next, create a &#8220;roByteArray&#8221; object to store the audio data as bytes.\n<ul style=\"list-style-type: square;\">\n<li>\n<pre>audioBytes = CreateObject(\"roByteArray\")<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<li>Capture the microphone events to get the audio data.\n<ul style=\"list-style-type: square;\">\n<li>\n<pre>while true\r\n     micEvent = wait(0, port)\r\n     if micEvent.IsRecordingInfo() ' raised with each chunk of audio after StartRecording() is called\r\n          info = micEvent.GetInfo()\r\n          audioBytes.Append(info.sample_data)\r\n     else\r\n          exit while\r\n     end if\r\nend 
while<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<li>Now that the audio data is in the byte array, convert it to a Base64 string.\n<ul style=\"list-style-type: square;\">\n<li>\n<pre>audioData = audioBytes.ToBase64String()<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<li>Create the request body and headers for calling the API.\n<ul style=\"list-style-type: square;\">\n<li><strong>headers<\/strong>\n<ul style=\"list-style-type: square;\">\n<li>\n<pre>{\r\n\"X-Goog-Api-Key\": Your_API_Key,\r\n\"Content-Type\": \"application\/json; charset=utf-8\"\r\n}<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<li><strong>body <\/strong>\n<ul style=\"list-style-type: square;\">\n<li>\n<pre>{\r\n\"audio\": {\r\n     \"content\": audioData\r\n},\r\n\"config\": {\r\n     \"enableAutomaticPunctuation\": true,\r\n     \"encoding\": \"LINEAR16\",\r\n     \"languageCode\": \"en-US\",\r\n     \"sampleRateHertz\": 16000\r\n}\r\n}<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Now create a roUrlTransfer object and send this data to the Google Speech-to-Text service through the <a href=\"https:\/\/speech.googleapis.com\/v1p1beta1\/speech:recognize\">API<\/a> endpoint below.\n<ul>\n<li><strong>API request<\/strong>\n<pre>API Url: https:\/\/speech.googleapis.com\/v1p1beta1\/speech:recognize\r\n\r\nHeaders: {\r\n\"X-Goog-Api-Key\": Your_API_Key,\r\n\"Content-Type\": \"application\/json; charset=utf-8\"\r\n}\r\n\r\nBody: body ' the request body from step 8<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<li>Create an observer function that observes the response of the API call. 
In our case, the observer function is <span style=\"color: #800000;\">onGetTextResponse<\/span>().\n<ul style=\"list-style-type: square;\">\n<li>\n<pre>function onGetTextResponse()\r\n     results = m.top.textResponse.response.results\r\n     text = \"\"\r\n     for each result in results\r\n          for each alternative in result.alternatives\r\n               ' each alternative carries the recognized text in its \"transcript\" field\r\n               text += alternative.transcript\r\n          end for\r\n     end for\r\n     m.speechText = text\r\nend function<\/pre>\n<\/li>\n<\/ul>\n<\/li>\n<li>m.speechText now holds the transcript of the voice command spoken into the Roku remote&#8217;s microphone.<\/li>\n<li>Using m.speechText, we can trigger actions in the app. For example, if the user says &#8220;Buy this Order&#8221; and we check for that text, we can trigger the onClick (buttonSelected) handler of the &#8220;Buy Now&#8221; button.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h1>Conclusion<\/h1>\n<p>Custom voice recognition makes an app easier to use: users can perform actions by voice, which enables hands-free navigation and improves accessibility. While Roku&#8217;s voice library doesn&#8217;t support custom commands beyond the basic ones, this hybrid approach, combining Roku&#8217;s microphone input with cloud-based transcription, bridges that gap effectively.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Roku provides a library that supports basic voice commands, such as \u201cplay\u201d, \u201cpause\u201d, \u201cnext\u201d, and \u201cfast forward\u201d. It also provides a keyboard integrated with voice search. However, we cannot integrate custom voice commands directly from Roku\u2019s voice library. 
Through this article, we will cover how to integrate custom voice commands into a Roku channel using [&hellip;]<\/p>\n","protected":false},"author":1526,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":35},"categories":[3477],"tags":[3474,3629],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/73228"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1526"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=73228"}],"version-history":[{"count":12,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/73228\/revisions"}],"predecessor-version":[{"id":73730,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/73228\/revisions\/73730"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=73228"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=73228"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=73228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}