Speech STT Node

Convert speech to text.

Overview

This node converts speech audio into text using an external speech-to-text (STT) service. It accepts base64-encoded audio input and sends it to the service's API, which performs the transcription. The node is useful for automating the extraction of spoken content from audio files, such as voice messages, podcasts, or recorded meetings.

Practical examples include:

Transcribing customer support calls for analysis.
Converting voice notes into searchable text.
Automating subtitle generation for videos.

Properties

Name	Meaning
Audio	Base64-encoded audio data to be transcribed
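
The Audio property expects base64 text, not raw bytes, so audio usually has to be encoded before it reaches the node. The following is a minimal sketch of that step in Python. The helper names encode_audio and encode_audio_file are hypothetical and are not part of the node.

```python
import base64


def encode_audio(raw: bytes) -> str:
    """Return the base64 text form of raw audio bytes."""
    # b64encode returns bytes; decode to str so the value can go into JSON.
    return base64.b64encode(raw).decode("ascii")


def encode_audio_file(path: str) -> str:
    """Read an audio file from disk and return its base64 text form."""
    with open(path, "rb") as f:
        return encode_audio(f.read())
```

The resulting string is what would be supplied as the Audio value.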

Output

The node outputs JSON data containing the transcription result returned by the external STT API. Each item in the output corresponds to one input item and includes the full response body from the API under the json field.

If an error occurs during the request and the node is configured to continue on failure, the output will contain an error field with the error message instead.
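
The per-item behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the node's actual source: the function name run_items, the input field name audio, and the placement of the error field inside json are all hypothetical, and transcribe stands in for the call to the external API.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


def run_items(
    items: List[Item],
    transcribe: Callable[[str], Dict[str, Any]],
    continue_on_fail: bool = False,
) -> List[Item]:
    """Produce one output item per input item."""
    results: List[Item] = []
    for item in items:
        try:
            # Assumption: the base64 audio sits under json["audio"].
            response = transcribe(item["json"]["audio"])
            # The full API response body goes under the json field.
            results.append({"json": response})
        except Exception as exc:
            if not continue_on_fail:
                # Without "Continue On Fail", the error stops execution.
                raise
            # Assumption: the error message is nested under json.
            results.append({"json": {"error": str(exc)}})
    return results
```

With continue_on_fail enabled, a failing item yields an error entry while the remaining items are still processed.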

The node does not output binary data.

Dependencies

Requires an API token credential for authentication with the external speech-to-text service.
Requires the API domain URL so that requests are sent to the correct endpoint.
The node makes HTTP POST requests to the /speech/stt endpoint of the configured API domain.
Proper network connectivity to the external API is necessary.
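
To make the request shape concrete, here is a hedged sketch of how such a POST could be assembled with Python's standard library. Only the /speech/stt path and the use of an API token come from this document. The Bearer authorization scheme, the JSON body, and the audio field name are assumptions, and build_stt_request is a hypothetical helper. Check your provider's documentation for the real contract.

```python
import json
import urllib.request


def build_stt_request(domain: str, token: str, audio_b64: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /speech/stt endpoint."""
    # Avoid a double slash when the configured domain ends with "/".
    url = domain.rstrip("/") + "/speech/stt"
    # Assumption: the API accepts a JSON body with an "audio" field.
    body = json.dumps({"audio": audio_b64}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Assumption: the token is sent as a Bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to urllib.request.urlopen to perform the call. Building it separately makes the URL and headers easy to inspect when debugging a misconfigured domain or token.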

Troubleshooting

Request errors: If the API request fails (e.g., due to invalid credentials, network issues, or malformed audio data), the node logs the error and throws an application error unless "Continue On Fail" is enabled.
Invalid audio data: Ensure the audio input is correctly base64 encoded; otherwise, the API may reject the request.
Authentication failures: Verify that the provided API token is valid and has the required permissions.
API domain misconfiguration: Confirm the API domain URL is correct and reachable.
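
For the invalid audio data case, a quick local check can rule out malformed base64 before a request is sent. The sketch below uses Python's standard library. The function name is_valid_base64 is hypothetical, and the check only confirms that the text decodes as base64, not that the decoded bytes are audio the API accepts.

```python
import base64
import binascii


def is_valid_base64(text: str) -> bool:
    """Return True if text is non-empty, strictly valid base64."""
    if not text:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False
```

A value that still carries a prefix such as data:audio/wav;base64, fails this check, because the prefix contains characters outside the base64 alphabet.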

Links and References

Refer to your speech-to-text service provider's API documentation for details on accepted audio formats, request limits, and authentication methods.