clean-email

n8n node that cleans email for AI agents. Strips quoted replies, signatures, disclaimers, and HTML, and returns LLM-ready text with token counts. Drag and drop; 73% fewer tokens on average.

Package Information

Downloads: 0 weekly / 23 monthly

Latest Version: 0.1.0

Available Nodes

Clean Email for LLM

Turn raw email into LLM-ready text. Strips quoted replies (Gmail, Outlook, Apple Mail, Yahoo), signatures (14 languages), helpdesk separators (Zendesk, Freshdesk, Intercom), disclaimers, and HTML. Returns clean text with token count and savings percentage. 73% average token reduction.

Documentation

n8n-nodes-clean-email

Turn raw email into LLM-ready text. 73% fewer tokens, zero config.

An n8n community node that strips quoted replies, signatures, disclaimers, and HTML from email — returning clean text your AI agent can actually think with.

The problem

A 10-email support thread can run to ~3,000 tokens, while the new content your agent needs is often closer to ~400. The rest is quoted replies, signatures, legal disclaimers, and HTML tags. Every extra token costs money and burns context window.

What this node does

Drag it between your email trigger and your AI node. It outputs:

| Field | Description |
| --- | --- |
| `clean_text` | Just the new content: no quotes, no signature, no HTML |
| `raw_text` | Original email (for fallback) |
| `original_tokens` | Token count before cleaning (cl100k_base) |
| `clean_tokens` | Token count after cleaning |
| `savings_pct` | Percentage of tokens saved |
| `confidence` | `high` or `low`: how certain the parser is |
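
One way to consume these fields downstream is sketched below. The `CleanEmailResult` type and the `textForPrompt` helper are hypothetical names introduced for illustration, and the fallback policy (use `raw_text` whenever `confidence` is `low` or the cleaned text is empty) is an assumption, not behavior of the node itself.

```typescript
// Hypothetical type mirroring the output fields listed above.
interface CleanEmailResult {
  clean_text: string;
  raw_text: string;
  original_tokens: number;
  clean_tokens: number;
  savings_pct: number;
  confidence: "high" | "low";
}

// Assumed policy (not part of the node): trust clean_text only when the
// parser reports high confidence and something is left after cleaning.
function textForPrompt(result: CleanEmailResult): string {
  const cleaned = result.clean_text.trim();
  if (result.confidence === "high" && cleaned.length > 0) {
    return cleaned;
  }
  // Otherwise fall back to the original email text.
  return result.raw_text;
}
```

In an n8n workflow the same decision could be made with an IF node on `confidence`; the function form is only meant to make the fallback explicit.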

What it strips

Quoted replies: Gmail ("On DATE, NAME wrote:"), Outlook (From/Sent/To headers), Apple Mail, Yahoo, Thunderbird, nested quotes, forwarded messages

Signatures: Standard -- separator, mobile signatures in 14 languages (English, French, German, Spanish, Italian, Portuguese, Dutch, Japanese, Chinese, and more), common closings (Best regards, Thanks, Cheers, Cordialement, Mit freundlichen Grüßen)

Helpdesk separators: Zendesk ("##- Please type your reply above this line"), Freshdesk ("--- Reply above this line ---"), Intercom reply markers

Notification footers: GitHub ("Reply to this email directly or view it on GitHub"), unsubscribe links, mailing list footers, copyright notices

Legal disclaimers: "CONFIDENTIAL", "DISCLAIMER", "This email and any attachments...", Exchange Online disclaimers

HTML: Strips all tags, converts block elements to newlines, removes <blockquote> content, decodes entities
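
To make the categories above concrete, here is a deliberately simplified sketch of two of the steps: HTML-to-text conversion and cutting at a quote header, signature separator, or helpdesk marker. It is not the package's implementation. The function names, the `CUT_MARKERS` list, and the regular expressions are assumptions chosen for illustration, and they cover only a few of the patterns listed.

```typescript
// Convert HTML to plain text: drop <blockquote> content, turn <br> and
// block-level closing tags into newlines, strip remaining tags, then
// decode a handful of common entities (&amp; last to avoid double decoding).
// Simplification: nested <blockquote> elements are not handled correctly.
function htmlToText(html: string): string {
  return html
    .replace(/<blockquote[\s\S]*?<\/blockquote>/gi, "")
    .replace(/<br\s*\/?>|<\/(p|div|li|tr|h[1-6])>/gi, "\n")
    .replace(/<[^>]+>/g, "")
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Illustrative markers: everything from the first matching line onward is
// treated as quoted history, signature, or helpdesk boilerplate.
const CUT_MARKERS: RegExp[] = [
  /^On .+ wrote:$/,                              // Gmail-style quote header
  /^--$/,                                        // signature separator (after trimming)
  /^##- Please type your reply above this line/, // Zendesk marker
  /^--- Reply above this line ---/,              // Freshdesk marker
];

function stripQuotedAndSignature(text: string): string {
  const kept: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (CUT_MARKERS.some((re) => re.test(trimmed))) break; // stop at first marker
    if (/^\s*>/.test(line)) continue;                      // skip ">"-quoted lines
    kept.push(line);
  }
  return kept.join("\n").trim();
}
```

Applied to plain text, `htmlToText` would also remove anything that looks like a tag, such as an address written as `<name@example.com>`, which is one reason a real parser needs to detect whether the input is HTML first.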

Install

In your n8n instance:

1. Go to Settings > Community Nodes
2. Enter n8n-nodes-clean-email
3. Click Install

Or via CLI:

npm install n8n-nodes-clean-email

Usage

1. Add any email trigger (Gmail, Outlook, IMAP, webhook)
2. Add the Clean Email for LLM node
3. Set Email Text to {{ $json.text }} (or {{ $json.body }}, {{ $json.snippet }})
4. Connect to your AI node (OpenAI, Claude, Ollama, etc.)

The default expression {{ $json.text || $json.body || $json.snippet || "" }} auto-detects common email field names.
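
The fallback chain in that expression behaves like the following sketch. `EmailItem` and `pickEmailText` are hypothetical names used only to illustrate the `||` semantics: the first field that is present and non-empty wins, and an empty string is returned when none is.

```typescript
// Hypothetical shape of an incoming item; real triggers may use other field names.
interface EmailItem {
  text?: string;
  body?: string;
  snippet?: string;
}

// Same order as the default expression. Because || skips falsy values,
// an empty "text" field falls through to "body", then "snippet".
function pickEmailText(json: EmailItem): string {
  return json.text || json.body || json.snippet || "";
}
```

If a trigger stores the body under a different key, the expression in the Email Text field has to be changed by hand, since this chain only checks the three names shown.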

Example

Input (raw email, 158 tokens):

Thanks, that works!

On Mon, Mar 23, 2026 at 3:15 PM agent@company.com wrote:
> The invoice total is $4,200. Here's the breakdown:
> - Design: $2,000
> - Development: $2,200
>
> On Mon, Mar 23, 2026 at 2:45 PM Sarah Johnson wrote:
>> Can you check the invoice for Project Atlas?
>>
>> --
>> Sarah Johnson
>> Operations Manager, Acme Corp
>> Phone: (555) 123-4567
>> CONFIDENTIALITY NOTICE: This email and any attachments...

Output (5 tokens, 97% savings):

{
  "clean_text": "Thanks, that works!",
  "original_tokens": 158,
  "clean_tokens": 5,
  "savings_pct": 97,
  "confidence": "high"
}
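
The savings figure in this example is consistent with a simple ratio of removed tokens to original tokens. The sketch below assumes rounding to the nearest integer; the helper name `savingsPct` and the rounding rule are assumptions, not confirmed details of the node.

```typescript
// Assumed formula: share of tokens removed, rounded to the nearest integer.
function savingsPct(originalTokens: number, cleanTokens: number): number {
  if (originalTokens <= 0) return 0; // avoid division by zero on empty input
  return Math.round(((originalTokens - cleanTokens) / originalTokens) * 100);
}

// (158 - 5) / 158 = 0.968..., which rounds to 97.
console.log(savingsPct(158, 5)); // 97
```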

Token counting

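Uses OpenAI's cl100k_base tokenizer (via js-tiktoken), the encoding used by GPT-4 and GPT-3.5 Turbo. Token counts are exact for those models. Models with a different tokenizer, such as GPT-4o or Claude, will count the same text differently, so treat the numbers as estimates there.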

Standalone use

The parsing engine works outside n8n too:

import { cleanEmailForLlm } from 'n8n-nodes-clean-email';

const result = cleanEmailForLlm(rawEmailText);
console.log(result.clean_text);    // cleaned content
console.log(result.savings_pct);   // e.g. 73

License

MIT
