
PowerPoint to Text

Extract text from PowerPoint documents (.pptx) as { slideNumber: text }

Overview

This node extracts text content from PowerPoint presentation files (.pptx) provided as binary input. It reads each slide's XML data inside the .pptx archive and parses out all textual elements, returning a JSON object where each key is a slide number and its value is the concatenated text found on that slide.
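To illustrate the parsing step, the sketch below pulls the text out of a single slide's XML. It is written in Python with the standard library's xml.etree.ElementTree, not the node's own JavaScript and fast-xml-parser code. It assumes slide text lives in DrawingML <a:t> runs grouped into <a:p> paragraphs, which is the standard Office Open XML layout. The name slide_text and the choice to join paragraphs with a space are assumptions; the node's actual separator is not documented.

```python
import xml.etree.ElementTree as ET

# DrawingML namespace used for text in Office Open XML slides (<a:p>, <a:r>, <a:t>).
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def slide_text(slide_xml: bytes, para_sep: str = " ") -> str:
    """Return one slide's text: runs joined per paragraph, paragraphs joined by para_sep."""
    root = ET.fromstring(slide_xml)
    paragraphs = []
    for para in root.iter(f"{{{A_NS}}}p"):
        # Runs in one paragraph are fragments of the same line, so join them directly.
        line = "".join(t.text or "" for t in para.iter(f"{{{A_NS}}}t"))
        if line:
            paragraphs.append(line)
    return para_sep.join(paragraphs)


# Minimal slide: a title, plus a body paragraph whose word is split across two runs.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
       xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
  <p:cSld><p:spTree>
    <p:sp><p:txBody><a:p><a:r><a:t>Quarterly </a:t></a:r><a:r><a:t>Review</a:t></a:r></a:p></p:txBody></p:sp>
    <p:sp><p:txBody><a:p><a:r><a:t>Reve</a:t></a:r><a:r><a:t>nue up</a:t></a:r></a:p></p:txBody></p:sp>
  </p:spTree></p:cSld>
</p:sld>"""

print(slide_text(SAMPLE))  # Quarterly Review Revenue up
```

Runs inside one paragraph are concatenated without a separator because PowerPoint can split a single word into several runs when formatting changes partway through it.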

Common scenarios include:

  • Automating extraction of slide content for indexing, search, or summarization.
  • Converting presentations into plain text for accessibility or further processing.
  • Integrating with workflows that require slide text analysis without manual copy-pasting.

Example: Given a .pptx file with 3 slides, the node outputs a JSON object like { "slides": { "1": "Slide 1 text...", "2": "Slide 2 text...", "3": "Slide 3 text..." } }.

Properties

Name             Meaning
Binary Property  Name of the binary property containing the .pptx file to extract text from (default: "data")

Output

The node outputs an array of items, each containing a json field with a slides object. This slides object maps slide numbers (as strings) to their extracted text content.

Example output structure:

{
  "slides": {
    "1": "Text content of slide 1",
    "2": "Text content of slide 2",
    "3": "Text content of slide 3"
  }
}
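The following is a short, illustrative Python sketch (not the node's code) of how a per-slide mapping could be shaped into the output above. build_output_item is a hypothetical name. The sketch reflects two details of the documented format: keys are strings, as JSON object keys always are, and it orders them numerically so that "10" comes after "2", which a plain string sort would not do.

```python
import json


def build_output_item(texts_by_slide: dict[int, str]) -> dict:
    """Shape {slide number: text} into the documented {"slides": {...}} object."""
    # Sort on the integer first, then stringify, so "10" lands after "2".
    return {"slides": {str(n): texts_by_slide[n] for n in sorted(texts_by_slide)}}


item = build_output_item({10: "Appendix", 2: "Agenda", 1: "Title"})
print(json.dumps(item))
# {"slides": {"1": "Title", "2": "Agenda", "10": "Appendix"}}
```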

No binary output is produced by this node.

Dependencies

  • Requires the .pptx file to be provided as binary input under the specified binary property.
  • Uses the jszip library to unzip the .pptx archive.
  • Uses the fast-xml-parser library to parse slide XML files.
  • No external API keys or services are needed.
  • Make sure an upstream node (for example, one that reads or downloads the file) passes the binary data through to this node.
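The unzip step can be sketched with Python's zipfile module as a stand-in for jszip. Inside a .pptx, each slide is stored as its own part at ppt/slides/slide<N>.xml, and the sketch below collects those parts keyed by the number in the file name. slide_parts is a hypothetical helper. Taking slide numbers from file names is an assumption about how the node numbers slides: PowerPoint records the display order in ppt/presentation.xml, and the file numbering usually matches it but is not guaranteed to.

```python
import io
import re
import zipfile

# Slide parts in a .pptx live at ppt/slides/slide<N>.xml (rels files live under ppt/slides/_rels/).
SLIDE_PATH = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def slide_parts(pptx_bytes: bytes) -> dict[int, bytes]:
    """Return {slide number from file name: raw slide XML}, in numeric order."""
    parts: dict[int, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as zf:
        for name in zf.namelist():
            match = SLIDE_PATH.match(name)
            if match:
                parts[int(match.group(1))] = zf.read(name)
    # Sort numerically so slide10 does not come before slide2.
    return dict(sorted(parts.items()))


# Build a tiny stand-in archive in memory to exercise the function.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ppt/slides/slide2.xml", "<sld>two</sld>")
    zf.writestr("ppt/slides/slide10.xml", "<sld>ten</sld>")
    zf.writestr("ppt/slides/_rels/slide2.xml.rels", "<Relationships/>")
    zf.writestr("ppt/presentation.xml", "<presentation/>")

print(list(slide_parts(buf.getvalue())))  # [2, 10]
```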

Troubleshooting

  • Error: Binary data "X" not found.
    This occurs when the input item has no binary property with the configured name. Check that the Binary Property value matches exactly (including case) the name of the binary property that holds the .pptx file.

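  • Empty or missing slide text output:
    The node relies on the standard PowerPoint XML structure. If the .pptx file is corrupted or uses non-standard formatting, extraction may fail or return empty text. Legacy .ppt files are not ZIP archives and cannot be read, and text that exists only inside images is not extracted.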

  • Large files or many slides may increase execution time:
    Large presentations take longer to process and use more memory. If performance suffers, split the presentation into smaller files or reduce the number of slides.
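When one of the problems above shows up, a quick pre-flight check outside the node can help tell which one applies. The Python sketch below looks up the binary property, confirms that the bytes form a ZIP archive with the expected PowerPoint parts, and reports the slide count and total uncompressed size as a rough indicator of processing cost. The item layout ({"binary": {name: {"data": base64}}}) and the function name preflight are assumptions made for illustration. The error text only mirrors the message quoted above.

```python
import base64
import io
import zipfile


def preflight(item: dict, prop: str = "data") -> dict:
    """Check one workflow item for the failure modes above (hypothetical item layout)."""
    binary = item.get("binary") or {}
    if prop not in binary:
        # Exact dict-key lookup, so "Data" and "data" are different properties.
        return {"problems": [f'Binary data "{prop}" not found. Available: {sorted(binary)}']}

    raw = base64.b64decode(binary[prop]["data"])
    if not zipfile.is_zipfile(io.BytesIO(raw)):
        # Legacy .ppt files and truncated downloads end up here.
        return {"problems": ["not a ZIP archive (legacy .ppt or corrupted file?)"]}

    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        infos = zf.infolist()
        bad_member = zf.testzip()  # first member failing its CRC check, or None

    names = {info.filename for info in infos}
    slides = [n for n in names if n.startswith("ppt/slides/slide") and n.endswith(".xml")]
    problems = []
    if "ppt/presentation.xml" not in names:
        problems.append("missing ppt/presentation.xml")
    if not slides:
        problems.append("no slide parts under ppt/slides/")
    if bad_member is not None:
        problems.append(f"corrupted member: {bad_member}")
    return {
        "problems": problems,
        "slide_count": len(slides),
        # Total uncompressed size: a rough proxy for the memory and time extraction needs.
        "uncompressed_bytes": sum(info.file_size for info in infos),
    }


# Stand-in archive stored under "File" (capital F) to reproduce the name-mismatch case.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ppt/presentation.xml", "<p:presentation/>")
    zf.writestr("ppt/slides/slide1.xml", "<p:sld/>")
item = {"binary": {"File": {"data": base64.b64encode(buf.getvalue()).decode()}}}

print(preflight(item))          # reports that "data" was not found
print(preflight(item, "File"))  # one slide, no problems
```

Note that testzip() decompresses every member to verify checksums, so on very large files the check itself takes noticeable time.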
