clean-email

n8n node that cleans email for AI agents. Strips quoted replies, signatures, disclaimers, and HTML — returns LLM-ready text with token counts. Drag, drop, 73% fewer tokens.


n8n-nodes-clean-email

Turn raw email into LLM-ready text. 73% fewer tokens, zero config.

An n8n community node that strips quoted replies, signatures, disclaimers, and HTML from email — returning clean text your AI agent can actually think with.

The problem

A 10-email support thread is ~3,000 tokens. Your agent only needs ~400. The rest is quoted replies, signatures, legal disclaimers, and HTML tags. Every token costs money and burns context window.

What this node does

Drag it between your email trigger and your AI node. It outputs:

| Field | Description |
| --- | --- |
| clean_text | Just the new content: no quotes, no signature, no HTML |
| raw_text | Original email (for fallback) |
| original_tokens | Token count before cleaning (cl100k_base) |
| clean_tokens | Token count after cleaning |
| savings_pct | Percentage of tokens saved |
| confidence | high or low: how certain the parser is |
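
One way to use the raw_text and confidence fields together is to fall back to the original email whenever the parser is unsure. The sketch below is illustrative only: the type name `CleanEmailResult` and the helper `textForAgent` are hypothetical and not part of the package. Only the field names come from the table above.

```typescript
// Hypothetical type name; the field names are taken from the output table.
interface CleanEmailResult {
  clean_text: string;
  raw_text: string;
  original_tokens: number;
  clean_tokens: number;
  savings_pct: number;
  confidence: "high" | "low";
}

// Prefer the cleaned text, but fall back to the raw email when the parser
// reports low confidence or the cleaned text came back empty.
function textForAgent(result: CleanEmailResult): string {
  const cleaned = result.clean_text.trim();
  if (result.confidence === "low" || cleaned.length === 0) {
    return result.raw_text;
  }
  return cleaned;
}
```

Inside n8n the same check could live in an IF node or a Code node placed between this node and the AI node.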

What it strips

Quoted replies — Gmail ("On DATE, NAME wrote:"), Outlook (From/Sent/To headers), Apple Mail, Yahoo, Thunderbird, nested quotes, forwarded messages

Signatures — Standard -- separator, mobile signatures in 14 languages (English, French, German, Spanish, Italian, Portuguese, Dutch, Japanese, Chinese, and more), common closings (Best regards, Thanks, Cheers, Cordialement, Mit freundlichen Grüßen)

Helpdesk separators — Zendesk ("##- Please type your reply above this line"), Freshdesk ("--- Reply above this line ---"), Intercom reply markers

Notification footers — GitHub ("Reply to this email directly or view it on GitHub"), unsubscribe links, mailing list footers, copyright notices

Legal disclaimers — "CONFIDENTIAL", "DISCLAIMER", "This email and any attachments...", Exchange Online disclaimers

HTML — Strips all tags, converts block elements to newlines, removes <blockquote> content, decodes entities
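
To give a feel for how this kind of stripping works, here is a deliberately minimal sketch covering only two of the cases above: a Gmail-style "On DATE, NAME wrote:" header and the standard -- signature separator. It is not the node's actual implementation. The function name and both regular expressions are assumptions made for illustration, and the real parser handles far more clients and languages.

```typescript
// Illustrative patterns only (assumptions, not the node's real rules).
const GMAIL_REPLY_HEADER = /^On .+ wrote:\s*$/;
const SIGNATURE_SEPARATOR = /^-- ?$/;

function stripQuotedAndSignature(body: string): string {
  const kept: string[] = [];
  for (const line of body.split(/\r?\n/)) {
    // Everything from the first reply header or signature separator onward is dropped.
    if (GMAIL_REPLY_HEADER.test(line) || SIGNATURE_SEPARATOR.test(line)) break;
    // Skip quoted lines that appear before any header.
    if (/^\s*>/.test(line)) continue;
    kept.push(line);
  }
  return kept.join("\n").trim();
}
```

A line-by-line scan like this is easy to reason about, but it only sees plain text. HTML email would need its tags and <blockquote> content removed first.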

Install

In your n8n instance:

  1. Go to Settings > Community Nodes
  2. Enter n8n-nodes-clean-email
  3. Click Install

Or via CLI (on a self-hosted instance, run this from your n8n custom nodes directory, typically ~/.n8n/nodes):

npm install n8n-nodes-clean-email

Usage

  1. Add any email trigger (Gmail, Outlook, IMAP, webhook)
  2. Add the Clean Email for LLM node
  3. Set Email Text to {{ $json.text }} (or {{ $json.body }}, {{ $json.snippet }})
  4. Connect to your AI node (OpenAI, Claude, Ollama, etc.)

The default expression {{ $json.text || $json.body || $json.snippet || "" }} auto-detects common email field names.
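
The fallback chain in that expression behaves like the following TypeScript sketch. The type `EmailJson` and the function `pickEmailText` are hypothetical names used for illustration. Only the three field names come from the expression itself.

```typescript
// Hypothetical shape of an incoming item; real trigger payloads carry many more fields.
type EmailJson = { text?: string; body?: string; snippet?: string };

// Mirrors {{ $json.text || $json.body || $json.snippet || "" }}.
// Note that `||` skips empty strings as well as missing fields.
function pickEmailText(json: EmailJson): string {
  return json.text || json.body || json.snippet || "";
}
```

Because `||` treats an empty string as missing, an item whose text field is present but empty still falls through to body or snippet.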

Example

Input (raw email, 158 tokens):

Thanks, that works!

On Mon, Mar 23, 2026 at 3:15 PM agent@company.com wrote:
> The invoice total is $4,200. Here's the breakdown:
> - Design: $2,000
> - Development: $2,200
>
> On Mon, Mar 23, 2026 at 2:45 PM Sarah Johnson wrote:
>> Can you check the invoice for Project Atlas?
>>
>> --
>> Sarah Johnson
>> Operations Manager, Acme Corp
>> Phone: (555) 123-4567
>> CONFIDENTIALITY NOTICE: This email and any attachments...

Output (5 tokens, 97% savings):

{
  "clean_text": "Thanks, that works!",
  "original_tokens": 158,
  "clean_tokens": 5,
  "savings_pct": 97,
  "confidence": "high"
}
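
The savings figure in this example is consistent with a simple rounded ratio of tokens removed. The helper below is a sketch under that assumption. The function name `savingsPct` is hypothetical, and the node's exact rounding rule is not documented here.

```typescript
// Assumption: savings_pct is the share of tokens removed, rounded to the nearest integer.
function savingsPct(originalTokens: number, cleanTokens: number): number {
  if (originalTokens <= 0) return 0; // avoid dividing by zero on empty input
  return Math.round((1 - cleanTokens / originalTokens) * 100);
}

const exampleSavings = savingsPct(158, 5); // (1 - 5/158) * 100 = 96.8..., which rounds to 97
```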

Token counting

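Uses OpenAI's cl100k_base tokenizer (via js-tiktoken), the encoding used by GPT-4 and GPT-3.5. Token counts are exact for those models. Models with a different tokenizer, such as Claude or models served through Ollama, will report somewhat different counts, so treat the numbers as close estimates there.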

Standalone use

The parsing engine works outside n8n too:

import { cleanEmailForLlm } from 'n8n-nodes-clean-email';

const result = cleanEmailForLlm(rawEmailText);
console.log(result.clean_text);    // cleaned content
console.log(result.savings_pct);   // e.g. 73

License

MIT
