Overview
This node integrates with an AI-powered podcast-style audio generation service to create spoken audio content from various input types. It supports generating monologue or dialogue audio tracks based on direct text, URLs, topics, scripts, or uploaded files. The node is useful for automating the creation of podcasts, audiobooks, voiceovers, or any audio narration where natural-sounding speech synthesis is desired.
Common scenarios include:
- Converting blog posts or website content into audio.
- Generating scripted or research-based audio narrations.
- Producing conversational dialogues with two distinct voices.
- Creating audio from uploaded text files or base64 encoded content.
Practical examples:
- A content creator inputs a topic and requests a medium-length expert-level monologue in English (US) to generate a podcast episode automatically.
- A marketing team provides a pre-written script and selects dialogue mode with host and co-host voices to produce a conversational ad spot.
Properties
| Name | Meaning |
|---|---|
| Input Type | Method of providing content: File upload, Bring-your-own-script, Direct text input, Research-based generation by topic, or Website URL to scrape. |
| Mode | Audio style: Monologue (single voice narration) or Dialogue (two-voice conversation). |
| Text Content | The raw text to convert to audio (used when Input Type is "text"). |
| Website URL | URL of the website to scrape and convert to audio (used when Input Type is "url"). |
| Topic | Subject to research and generate content about (used when Input Type is "topic"). |
| Script Content | Pre-written script content provided by the user (used when Input Type is "script"). |
| File URLs or Base64 | One or more file URLs (https:// or gs://) or base64 encoded file content, one per line (used when Input Type is "file"). |
| Voice Name or ID | Selected voice for monologue mode; loaded dynamically based on chosen language. |
| Host Voice Name or ID | Selected host voice for dialogue mode; loaded dynamically based on chosen language. |
| Co-Host Voice Name or ID | Selected co-host voice for dialogue mode; loaded dynamically based on chosen language. |
| Content Level | Complexity level of generated content: Beginner, Intermediate, or Expert (not applicable for script input). |
| Content Length | Desired length of generated content: Short, Medium, or Long (not applicable for script input). |
| Language | Language and locale for audio generation (e.g., English US, French France, Japanese Japan). |
| Emphasis | Focus area or emphasis for content generation (optional, not applicable for script input). |
| Questions To Be Answered | Include Q&A segments in the generated content (optional, not applicable for script input). |
| Additional Instructions | Custom instructions to guide content generation (optional, not applicable for script input). |
| Additional Fields | Collection of optional fields: Co-Host Voice Instructions (dialogue mode); Host Voice Instructions (dialogue mode); Read Mode (boolean, monologue only, for direct text reading); Voice Instructions (monologue mode). |
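
The File URLs or Base64 property accepts one entry per line, where each entry is either an https:// or gs:// URL or base64 encoded content. A minimal sketch of how such input could be split and classified follows. The function name `parseFileEntries` and the `FileEntry` shape are hypothetical, and the node's actual parsing may differ.

```typescript
// Hypothetical shape for one parsed line; not taken from the node's source.
type FileEntry =
  | { kind: "url"; value: string }
  | { kind: "base64"; value: string };

// Split the multi-line field, drop blank lines, and classify each entry.
// Lines starting with https:// or gs:// are treated as URLs; anything else
// is assumed (not verified) to be base64 encoded file content.
function parseFileEntries(raw: string): FileEntry[] {
  return raw
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line): FileEntry =>
      /^(https|gs):\/\//i.test(line)
        ? { kind: "url", value: line }
        : { kind: "base64", value: line },
    );
}
```

This sketch does not check that the non-URL lines are valid base64; a stricter version could reject lines containing characters outside the base64 alphabet.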
Output
The node outputs JSON data representing the response from the audio generation API. This typically includes:
- Task identifiers for tracking audio generation status.
- Metadata about the generated audio such as URLs, duration, and voice details.
- Status information for asynchronous operations like task completion.
- Lists of available voices organized by language.
Audio is not embedded in the output as binary data. When a response includes generated audio, it is referenced by URLs or URIs that point to the audio files.
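As an illustration of working with this output in a downstream step, the sketch below pulls an audio reference out of a result item. The field names (`taskId`, `status`, `audioUrl`, `durationSeconds`) are assumptions made for this example, since the actual response schema is not documented here; inspect the node's real output before relying on any of them.

```typescript
// Hypothetical response shape: every field name here is an assumption,
// chosen only to mirror the kinds of data listed above.
interface AudioTaskResult {
  taskId?: string;
  status?: string;
  audioUrl?: string;
  durationSeconds?: number;
}

// Return the audio reference if the item carries one, or undefined if the
// task has not produced audio yet (for example, while it is still running).
function getAudioUrl(item: AudioTaskResult): string | undefined {
  const url = item.audioUrl;
  if (typeof url !== "string" || url.length === 0) return undefined;
  // Accept only URL/URI references, never inline binary content.
  return /^(https?|gs):\/\//i.test(url) ? url : undefined;
}
```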
Dependencies
- Requires an API key credential for the external AI podcast/audio generation service.
- The node uses authenticated HTTP requests to the service's REST API endpoints.
- Dynamic loading of available voices depends on the selected language.
- Proper configuration of the API base URL and authentication token in n8n credentials is necessary.
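To make the credential requirements concrete, here is a minimal sketch of how a base URL and API key might be combined into request options. The `ServiceCredentials` shape, the `buildRequest` helper, and the bearer-token header are all assumptions for illustration; the service may expect a different header or authentication scheme.

```typescript
// Hypothetical credential shape; the real credential fields may be named differently.
interface ServiceCredentials {
  baseUrl: string;
  apiKey: string;
}

interface RequestOptions {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: unknown;
}

// Build request options for a path, joining base URL and path with exactly one slash.
function buildRequest(
  creds: ServiceCredentials,
  method: "GET" | "POST",
  path: string,
  body?: unknown,
): RequestOptions {
  const base = creds.baseUrl.replace(/\/+$/, "");
  const suffix = path.replace(/^\/+/, "");
  return {
    method,
    url: `${base}/${suffix}`,
    headers: {
      // Assumption: the service accepts the API key as a bearer token.
      Authorization: `Bearer ${creds.apiKey}`,
      "Content-Type": "application/json",
    },
    body,
  };
}
```

Normalizing the slashes avoids a common misconfiguration where a trailing slash in the base URL produces a double slash in the final request URL.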
Troubleshooting
- Voice Mixing Error: In dialogue mode, if the selected host and co-host voices come from different providers that cannot be mixed, the node throws an error asking you to select compatible voices. Solution: Choose voices from the same provider, or voices that support mixing.
- Task Timeout: Waiting for audio generation completion may time out if the process takes longer than the specified timeout. Increase the timeout or check the service status.
- Invalid Input Data: Missing required parameters for the chosen input type (e.g., no text content for "text" input) will cause errors. Ensure all required fields are filled.
- Authentication Failures: Errors related to API credentials indicate misconfiguration or expired tokens. Verify and update the API key credential.
- API Request Failures: Network issues or service downtime can cause request failures. Check connectivity and service status.
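Two of the issues above, invalid input data and voice mixing, can be caught before any request is sent. The sketch below shows one way to run such a pre-flight check. The `NodeParams` and `Voice` shapes and the `validateParams` function are hypothetical, and the rule that voices from different providers are always incompatible is a deliberately conservative assumption, since some providers may support mixing.

```typescript
type InputType = "text" | "url" | "topic" | "script" | "file";
type Mode = "monologue" | "dialogue";

// Hypothetical voice and parameter shapes, for illustration only.
interface Voice {
  id: string;
  provider: string;
}

interface NodeParams {
  inputType: InputType;
  mode: Mode;
  content: string; // value of the field that matches inputType
  hostVoice?: Voice;
  coHostVoice?: Voice;
}

// Return a list of human-readable problems; an empty list means the
// parameters passed these checks.
function validateParams(p: NodeParams): string[] {
  const problems: string[] = [];
  if (p.content.trim().length === 0) {
    problems.push(`Missing content for input type "${p.inputType}".`);
  }
  if (p.mode === "dialogue") {
    if (!p.hostVoice || !p.coHostVoice) {
      problems.push("Dialogue mode needs both a host and a co-host voice.");
    } else if (p.hostVoice.provider !== p.coHostVoice.provider) {
      // Conservative assumption: different providers cannot be mixed.
      problems.push("Host and co-host voices come from different providers.");
    }
  }
  return problems;
}
```

Collecting all problems in a list, rather than throwing on the first one, lets a workflow report every misconfigured field in a single pass.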