n8n-nodes-clean-email
Turn raw email into LLM-ready text. 73% fewer tokens, zero config.
An n8n community node that strips quoted replies, signatures, disclaimers, and HTML from email — returning clean text your AI agent can actually think with.
The problem
A 10-email support thread is ~3,000 tokens. Your agent only needs ~400. The rest is quoted replies, signatures, legal disclaimers, and HTML tags. Every token costs money and burns context window.
What this node does
Drag it between your email trigger and your AI node. It outputs:
| Field | Description |
|---|---|
| clean_text | Just the new content: no quotes, no signature, no HTML |
| raw_text | Original email (for fallback) |
| original_tokens | Token count before cleaning (cl100k_base) |
| clean_tokens | Token count after cleaning |
| savings_pct | Percentage of tokens saved |
| confidence | high or low: how certain the parser is |
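For typed downstream code, the output fields can be modeled roughly as follows. This is a sketch inferred from the field table only: the interface name `CleanEmailResult` and the helper `savingsPct` are hypothetical, and the rounding rule is an assumption that happens to match the example in this README (158 tokens down to 5, reported as 97%).

```typescript
// Hypothetical shape of the node's output, inferred from the field table.
interface CleanEmailResult {
  clean_text: string;
  raw_text: string;
  original_tokens: number;
  clean_tokens: number;
  savings_pct: number;
  confidence: "high" | "low";
}

// Assumed formula: share of tokens removed, rounded to a whole percent.
function savingsPct(originalTokens: number, cleanTokens: number): number {
  if (originalTokens <= 0) return 0; // avoid dividing by zero on empty input
  return Math.round((1 - cleanTokens / originalTokens) * 100);
}

console.log(savingsPct(158, 5));
```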
What it strips
Quoted replies — Gmail ("On DATE, NAME wrote:"), Outlook (From/Sent/To headers), Apple Mail, Yahoo, Thunderbird, nested quotes, forwarded messages
Signatures — Standard -- separator, mobile signatures in 14 languages (English, French, German, Spanish, Italian, Portuguese, Dutch, Japanese, Chinese, and more), common closings (Best regards, Thanks, Cheers, Cordialement, Mit freundlichen Grussen)
Helpdesk separators — Zendesk ("##- Please type your reply above this line"), Freshdesk ("--- Reply above this line ---"), Intercom reply markers
Notification footers — GitHub ("Reply to this email directly or view it on GitHub"), unsubscribe links, mailing list footers, copyright notices
Legal disclaimers — "CONFIDENTIAL", "DISCLAIMER", "This email and any attachments...", Exchange Online disclaimers
HTML — Strips all tags, converts block elements to newlines, removes <blockquote> content, decodes entities
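To give a feel for how marker-based stripping can work, here is a minimal sketch covering three of the patterns listed above (a Gmail-style reply header, the standard signature separator, and the Zendesk marker) plus `>`-quoted lines. It is not the package's implementation: the function name `stripQuotedTail` and the regular expressions are assumptions for illustration, and real mail needs many more cases (wrapped headers, inline replies, localized phrases).

```typescript
// Illustrative only: a tiny subset of the patterns listed above.
// The package's real rules are broader and may work differently.
const QUOTE_HEADER = /^On .+ wrote:$/; // Gmail-style reply header on one line
const SIGNATURE_SEPARATOR = /^-- ?$/; // standard "-- " separator
const ZENDESK_MARKER = /^##- Please type your reply above this line/;

function stripQuotedTail(text: string): string {
  const kept: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Everything from the first marker onward is treated as old content.
    if (
      QUOTE_HEADER.test(trimmed) ||
      SIGNATURE_SEPARATOR.test(trimmed) ||
      ZENDESK_MARKER.test(trimmed) ||
      trimmed.startsWith(">")
    ) {
      break;
    }
    kept.push(line);
  }
  return kept.join("\n").trim();
}
```

Cutting at the first marker keeps the sketch short, but it would also drop new text written below a quoted block, which is one reason a production parser needs more than this.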
Install
In your n8n instance:
- Go to Settings > Community Nodes
- Enter n8n-nodes-clean-email
- Click Install
Or via CLI:
npm install n8n-nodes-clean-email
Usage
- Add any email trigger (Gmail, Outlook, IMAP, webhook)
- Add the Clean Email for LLM node
- Set Email Text to {{ $json.text }} (or {{ $json.body }}, {{ $json.snippet }})
- Connect to your AI node (OpenAI, Claude, Ollama, etc.)
The default expression {{ $json.text || $json.body || $json.snippet || "" }} auto-detects common email field names.
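The default expression is an ordinary JavaScript `||` chain, so it behaves like the function below. The `EmailItem` type and the name `pickEmailText` are hypothetical and model only the three field names the expression mentions.

```typescript
// The item shape is an assumption: only the three field names from the
// default expression are modeled here.
type EmailItem = { text?: string; body?: string; snippet?: string };

function pickEmailText(json: EmailItem): string {
  // Same semantics as the expression: || skips undefined, null, and "".
  return json.text || json.body || json.snippet || "";
}

console.log(pickEmailText({ body: "Hello from the body field" }));
```

Because `||` treats an empty string as missing, an item with an empty `text` still falls through to `body` or `snippet`; a `??` chain would stop at the empty string instead.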
Example
Input (raw email, 158 tokens):
Thanks, that works!
On Mon, Mar 23, 2026 at 3:15 PM agent@company.com wrote:
> The invoice total is $4,200. Here's the breakdown:
> - Design: $2,000
> - Development: $2,200
>
> On Mon, Mar 23, 2026 at 2:45 PM Sarah Johnson wrote:
>> Can you check the invoice for Project Atlas?
>>
>> --
>> Sarah Johnson
>> Operations Manager, Acme Corp
>> Phone: (555) 123-4567
>> CONFIDENTIALITY NOTICE: This email and any attachments...
Output (5 tokens, 97% savings):
{
"clean_text": "Thanks, that works!",
"original_tokens": 158,
"clean_tokens": 5,
"savings_pct": 97,
"confidence": "high"
}
Token counting
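Uses OpenAI's cl100k_base tokenizer (via js-tiktoken), the same encoding used by GPT-4 and GPT-3.5 Turbo. Token counts are exact for those models. Other models, including Claude and models served through Ollama, use different tokenizers, so treat the counts as estimates there.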
Standalone use
The parsing engine works outside n8n too:
import { cleanEmailForLlm } from 'n8n-nodes-clean-email';
const result = cleanEmailForLlm(rawEmailText);
console.log(result.clean_text); // cleaned content
console.log(result.savings_pct); // e.g. 73
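Since the result carries both raw_text and a confidence value, one possible way to consume it is to fall back to the original email when the parser is unsure. The policy below is an assumption, not documented behavior of the package; `CleanResult` and `textForPrompt` are hypothetical names, and only the fields used here are typed.

```typescript
// Only the fields used here are typed; names come from the output table.
type CleanResult = {
  clean_text: string;
  raw_text: string;
  confidence: "high" | "low";
};

// Hypothetical policy: trust the cleaned text only when the parser is
// confident and something is left after cleaning.
function textForPrompt(result: CleanResult): string {
  const cleaned = result.clean_text.trim();
  return result.confidence === "high" && cleaned.length > 0
    ? cleaned
    : result.raw_text;
}
```

In practice this would be called on the value returned by cleanEmailForLlm before building the prompt.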
License
MIT