Web Speech API

Tags:
  • JavaScript
  • Web Speech API
  • Web Development

Introduction

I've always been fascinated by voice recognition and text-to-speech technologies. Back in 2016, I built an app that could recognize speech and perform actions based on the spoken commands. You can still find the old video on YouTube. Since then I have been experimenting with different technologies and APIs to achieve similar results, but for all platforms and devices instead of just Windows. My favorite platform by far is the web, and that's why I'm going to show you how to use the Web Speech API to create a simple speech recognition input field.

What is the Web Speech API?

The Web Speech API is a JavaScript API that lets you incorporate voice data into web apps. It is a powerful tool for building voice-controlled applications, speech recognition, and text-to-speech synthesis. While on its own it may not be as powerful as some of the more advanced speech recognition systems, it is a great starting point for developers who want to experiment with voice data in their web applications.

There are many ways to achieve this, but in this article we will explore the browser's built-in Web Speech API. The speech recognition part of the API is supported in most Chromium-based browsers (such as Chrome and Edge) and in Safari, while Firefox support is still limited. We will also cover handling the case where the browser does not support the Web Speech API.

Getting started

We're going to build something that looks like the image shown below: you click the microphone button, speak into your microphone, and the text you speak is displayed in the input field. Obviously, you can extend this to do whatever you want with the spoken text. There are no frameworks or libraries involved, just plain HTML and JavaScript.
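For instance, acting on the transcript is just string handling. As a hypothetical sketch (the command names and the matchCommand helper are my own, not part of the API), you could map a few spoken phrases to actions:

```javascript
// Hypothetical command map: phrase -> action name. Matching is
// deliberately naive (lowercase substring check) to keep the sketch short.
const commands = {
  clear: 'clear-input',
  submit: 'submit-form',
};

// Returns the action for the first command phrase found in the
// transcript, or null when nothing matches.
function matchCommand(transcript) {
  const text = transcript.toLowerCase();
  for (const phrase of Object.keys(commands)) {
    if (text.includes(phrase)) return commands[phrase];
  }
  return null;
}
```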

Speech recognition input

Using the Web Speech API

First of all, let's check if the browser supports the Web Speech API.

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

If the SpeechRecognition object is not available, the browser does not support the Web Speech API. This can happen when the user is on an older browser or when the API is not available in the current environment. You can check which browsers support the API on caniuse.com.
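Rather than failing silently, you can degrade gracefully. A minimal sketch (the getSpeechRecognition helper name is my own) that picks up the constructor, prefixed or not, and lets you disable the microphone button when it is missing:

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor if the
// given window-like object exposes it, with or without the webkit
// prefix, or null when the API is missing.
function getSpeechRecognition(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// Usage sketch in the page:
//   const Ctor = getSpeechRecognition(window);
//   if (!Ctor) {
//     micBtn.disabled = true;
//     micBtn.title = 'Speech recognition is not supported in this browser';
//   }
```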

Assuming we have a button placed somewhere in the HTML with the ID mic-btn and an input field with the ID speech-input, we can start listening for speech input.

// Grab the elements defined in the HTML
const micBtn = document.getElementById('mic-btn');
const speechInput = document.getElementById('speech-input');

// Initialize the recognition object
const recognition = new SpeechRecognition();

// Configure the recognition object
recognition.continuous = false; // stop automatically after one phrase
recognition.interimResults = false; // only deliver final results

// Add an event listener to the microphone button
micBtn.addEventListener('click', () => {
  recognition.start();
  console.log('Listening...');
});

// Handle the speech recognition result
recognition.onresult = (event) => {
  const transcript = event.results[0][0].transcript;
  speechInput.value = transcript;
};

// Handle speech recognition errors
recognition.onerror = (event) => {
  console.error('Speech recognition error:', event.error);
};

// Handle the end of speech recognition
recognition.onend = () => {
  console.log('Speech recognition service disconnected');
};

This event-driven API is quite simple to use. When the user clicks the microphone button, the recognition.start() method is called, which starts listening for speech input. When the user stops speaking, the recognition.onresult event fires and the spoken text is displayed in the input field. If there is an error during the speech recognition process, the recognition.onerror event fires. Finally, when the speech recognition service disconnects, the recognition.onend event fires; you can use this event for any cleanup or additional processing.
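If you want the input field to update while the user is still speaking, you can enable interim results. With interimResults set to true, onresult fires repeatedly, and each event carries every result recognized so far. A minimal sketch, assuming the same recognition and speechInput objects as above (the collectTranscript helper is my own, not part of the API):

```javascript
// Joins the transcripts of all results so far. The results argument is
// an array-like of SpeechRecognitionResult objects; each result's first
// alternative holds the transcript text.
function collectTranscript(results) {
  let transcript = '';
  for (let i = 0; i < results.length; i++) {
    transcript += results[i][0].transcript;
  }
  return transcript;
}

// Usage sketch with the recognition object from above:
//   recognition.interimResults = true;
//   recognition.onresult = (event) => {
//     speechInput.value = collectTranscript(event.results);
//   };
```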

Styling the input field

Since I'm a big fan of Tailwind CSS and I will never be a proper CSS developer, I'm going to use Tailwind to style the input field and the microphone button.

<input
  type="text"
  id="speech-input"
  class="pl-10 pr-4 py-2 border border-gray-300 rounded-md shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-blue-500"
  placeholder="Speak something..."
/>
<button id="mic-btn" class="absolute left-4 top-2 text-gray-500 hover:text-gray-700 focus:outline-none">
  <i class="fas fa-microphone"></i>
</button>

As you can see, for icons I'm using the Font Awesome CDN, which is a very fast and easy way to include icons in your project. Obviously, you can use any other icon library, or even inline SVG icons if you prefer.

The final product

Everything put together in an HTML file would look like this:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Speech Recognition Input</title>
    <script src="https://cdn.tailwindcss.com"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.2/js/all.min.js" crossorigin="anonymous"></script>
  </head>
  <body class="flex items-center justify-center h-screen bg-gray-100">
    <div class="relative">
      <input
        type="text"
        id="speech-input"
        class="pl-10 pr-4 py-2 border border-gray-300 rounded-md shadow-sm focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-blue-500"
        placeholder="Speak something..."
      />
      <button id="mic-btn" class="absolute left-4 top-2 text-gray-500 hover:text-gray-700 focus:outline-none">
        <i class="fas fa-microphone"></i>
      </button>
    </div>
    <script>
      // JavaScript for handling speech recognition
      const micBtn = document.getElementById('mic-btn');
      const speechInput = document.getElementById('speech-input');

      // Check for browser support
      const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
      if (!SpeechRecognition) {
        console.log('Speech Recognition API not supported in this browser.');
      } else {
        const recognition = new SpeechRecognition();
        recognition.continuous = false;
        recognition.interimResults = false;

        micBtn.addEventListener('click', () => {
          recognition.start();
          console.log('Listening...');
        });

        recognition.onresult = (event) => {
          const transcript = event.results[0][0].transcript;
          speechInput.value = transcript;
        };

        recognition.onerror = (event) => {
          console.error('Speech recognition error:', event.error);
        };

        recognition.onend = () => {
          console.log('Speech recognition service disconnected');
        };
      }
    </script>
  </body>
</html>