JS Web Speech API

Speech Synthesis and Recognition

The JavaScript Web Speech API enables both speech synthesis and speech recognition in the browser.

Introduction to Web Speech API

The JavaScript Web Speech API lets developers add speech recognition and speech synthesis capabilities to web applications. The API is divided into two parts:

  • SpeechSynthesis: Converts text into speech.
  • SpeechRecognition: Converts spoken language into text.

In this guide, we will explore how each part works with practical examples.

Speech Synthesis with SpeechSynthesis

Speech synthesis, also known as text-to-speech (TTS), allows applications to read out text using a computer-generated voice. The SpeechSynthesis interface is used for this purpose.

Here's a basic example of using speech synthesis. The snippet below is a minimal sketch; the sample text is an arbitrary placeholder:
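
    // Create an utterance containing the text we want spoken
    const utterance = new SpeechSynthesisUtterance('Hello from the Web Speech API!');

    // Queue the utterance; the browser reads it aloud with the default voice
    window.speechSynthesis.speak(utterance);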

In this example, we create a new SpeechSynthesisUtterance instance containing the text we want spoken. Passing it to window.speechSynthesis.speak() queues the utterance and plays it aloud.

Customizing Speech Attributes

You can customize the voice, pitch, and rate of the speech. The snippet below is a sketch; the specific voice, pitch, and rate values are arbitrary:
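
    const utterance = new SpeechSynthesisUtterance('This voice has been customized.');

    // getVoices() returns the voices installed on the user's system;
    // in some browsers the list is empty until the voiceschanged event fires
    const voices = window.speechSynthesis.getVoices();
    utterance.voice = voices[0]; // any entry from the list works

    utterance.pitch = 1.5; // range 0 to 2, default 1
    utterance.rate = 0.9;  // range 0.1 to 10, default 1

    window.speechSynthesis.speak(utterance);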

In this example, we select a voice from the list of available voices and adjust the pitch and rate. Note that which voices are available depends on the user's system and browser.

Speech Recognition with SpeechRecognition

Speech recognition allows users to interact with applications through voice commands. The SpeechRecognition interface is used to capture and process spoken words.

Below is a simple example to get started with speech recognition. This is a sketch; note that many browsers expose the constructor only with a webkit prefix:
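
    // Many browsers expose the constructor only as webkitSpeechRecognition
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    const recognition = new SpeechRecognition();

    // Fires when the recognizer produces a result
    recognition.onresult = (event) => {
      const transcript = event.results[0][0].transcript;
      console.log('Heard:', transcript);
    };

    // Start listening; the browser prompts for microphone permission
    recognition.start();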

In this code, we create an instance of SpeechRecognition and define an onresult event handler to handle the recognized speech. The recognition.start() method begins the speech recognition process.

Handling Speech Recognition Events

The SpeechRecognition interface provides several events that can be used to handle different stages of the recognition process. Some useful events include:

  • onstart: Fired when speech recognition starts.
  • onspeechend: Fired when the user stops speaking.
  • onerror: Fired when an error occurs during recognition.
  • onaudioend: Fired when the recognition audio stream ends.
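
As a sketch, the handlers below simply log each stage; a real application might update the UI instead. This reuses the recognition instance from the previous example:

    recognition.onstart = () => console.log('Recognition started');
    recognition.onspeechend = () => console.log('User stopped speaking');
    recognition.onerror = (event) => console.error('Recognition error:', event.error);
    recognition.onaudioend = () => console.log('Audio capture ended');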

By handling these events, you can provide a more robust user experience in your applications.