Overview
The AssemblyAI Transcriber node transcribes audio files into text using the AssemblyAI API. It supports transcription from either a publicly accessible URL or binary audio data passed from a previous node. This node is useful for automating the conversion of spoken content into text, enabling applications such as meeting transcription, podcast captioning, voice note processing, and accessibility improvements.
Typical use cases include:
- Transcribing recorded interviews or meetings to generate searchable text.
- Converting podcasts or webinars into text for content repurposing.
- Processing customer support calls for quality assurance or analytics.
- Extracting text from voice memos or dictations.
Properties
| Name | Meaning |
|---|---|
| Source | Choose where the audio comes from: either a public URL or binary data from a previous node. |
| Audio URL | The publicly accessible URL of the audio file to transcribe (required if Source is URL). |
| Input Binary Field | The name of the binary property containing the audio data (required if Source is Binary Data). |
| Speech Model | Select the speech model tier: "Best" for highest accuracy or "Nano" for lower cost but less accurate transcription. |
| Output Format | Choose output format: "Full Response" returns the complete AssemblyAI response object; "Transcript Text Only" outputs just the transcript text. |
| Transcript Field Name | The JSON field name to store the transcript text when "Transcript Text Only" output format is selected. |
| Additional Options | Collection of optional parameters to customize transcription behavior, including: language code, automatic language detection with confidence threshold, speaker diarization settings, punctuation, formatting, multichannel transcription, word boosting, profanity filtering, PII redaction (with options for audio redaction and quality), disfluencies inclusion, audio segment start/end times, speech threshold, and webhook configuration for asynchronous notifications. |
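To make the mapping concrete, the sketch below shows how a few of these properties could translate into an AssemblyAI transcript request body. The snake_case field names (`speech_model`, `language_code`, `speaker_labels`, `webhook_url`, etc.) follow the public AssemblyAI API; the helper function itself and its camelCase option names are hypothetical illustrations, not the node's actual source.

```javascript
// Illustrative sketch: mapping node properties onto an AssemblyAI
// transcript request body. Field names follow the AssemblyAI API;
// the helper and its option names are hypothetical.
function buildTranscriptParams(audio, speechModel, options = {}) {
  const params = { audio, speech_model: speechModel };

  if (options.languageCode) params.language_code = options.languageCode;
  if (options.languageDetection) params.language_detection = true;
  if (options.speakerLabels) {
    params.speaker_labels = true;
    if (options.speakersExpected) params.speakers_expected = options.speakersExpected;
  }
  if (options.filterProfanity) params.filter_profanity = true;
  if (options.webhookUrl) params.webhook_url = options.webhookUrl;

  return params;
}

const params = buildTranscriptParams('https://example.com/meeting.mp3', 'best', {
  speakerLabels: true,
  speakersExpected: 2,
});
```

Only the options the user actually sets are added to the request, so defaults stay with the API rather than being duplicated in the node.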
Output
The node outputs an array of items corresponding to each input item processed. Each output item contains:
If Output Format is set to Full Response:
- assemblyAiTranscription: The full transcription response object returned by AssemblyAI, including fields such as the transcript text, confidence scores, audio duration, and status.
- assemblyAiMetadata: Metadata about the transcription request, such as the speech model used, the options applied, the transcription duration in milliseconds, the source type (URL or binary), and, for binary sources, file info (name, MIME type, size).
- Original input JSON fields are preserved.
If Output Format is set to Transcript Text Only:
- A single JSON field (customizable name) containing only the transcript text.
- A nested metadata object with details such as transcription duration, speech model, audio duration, confidence, source type, and file info if applicable.
If "Continue On Fail" is enabled, the node outputs an error message for each item that fails instead of stopping the workflow.
The node does not output binary data itself but can process binary audio inputs.
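The branching between the two output formats can be sketched as a small shaping function. The `assemblyAiTranscription` and `assemblyAiMetadata` field names come from the node's documented output above; the function, its parameters, and the `fullResponse`/`textOnly` format values are assumed names for illustration only.

```javascript
// Hypothetical sketch of the output shaping described above; not the
// node's actual source. 'fullResponse' and 'textOnly' are assumed values.
function shapeOutput(inputJson, response, metadata, outputFormat, transcriptField = 'transcript') {
  if (outputFormat === 'fullResponse') {
    // Full Response: attach the whole API response plus request metadata,
    // preserving the original input JSON fields.
    return { ...inputJson, assemblyAiTranscription: response, assemblyAiMetadata: metadata };
  }
  // Transcript Text Only: one customizable field plus a nested metadata object.
  return { ...inputJson, [transcriptField]: response.text, metadata };
}

const item = shapeOutput(
  { fileId: 42 },
  { text: 'Hello world', confidence: 0.97 },
  { speechModel: 'best', sourceType: 'url' },
  'textOnly'
);
```

Spreading `inputJson` first is what preserves the original fields in both formats, while the computed-property syntax implements the customizable Transcript Field Name.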
Dependencies
- Requires an active AssemblyAI API key credential configured in n8n.
- Uses the official AssemblyAI SDK for JavaScript to interact with the API.
- Network access to AssemblyAI endpoints and to any provided audio URLs is necessary.
- Optional webhook URLs can be configured for asynchronous transcription completion notifications.
Troubleshooting
- Missing Audio URL or Binary Data: If the source is URL but no valid URL is provided, the node throws an error indicating the missing audio URL. Ensure the URL is publicly accessible.
- Invalid Binary Property: When using binary data, the specified binary property must exist and contain valid audio data. Otherwise, an error will occur.
- API Errors: If AssemblyAI returns an error status, the node surfaces the error message. Common causes include invalid API keys, unsupported audio formats, or exceeding usage limits.
- Unsupported Operation: The node currently supports only the "Transcribe Audio" operation. Selecting other operations results in an error.
- Webhook Configuration: If using webhooks, ensure the URL is reachable and properly secured if authentication headers are set.
- Speech Model Selection: Using the "Nano" model may result in less accurate transcripts; choose based on your accuracy vs cost needs.
- Language Detection: Enabling automatic language detection requires setting an appropriate confidence threshold to avoid misclassification.
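For reference, automatic language detection in the AssemblyAI request body is controlled by fields like the following (the node's Additional Options UI may label them differently, so treat the exact names as an assumption):

```json
{
  "language_detection": true,
  "language_confidence_threshold": 0.7
}
```

A higher threshold reduces the risk of transcribing audio in a misdetected language, at the cost of rejecting more borderline inputs.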