HTML Table Parse

Parse HTML tables into JSON format

Overview

This node parses HTML content to extract tables and convert them into JSON format. It is useful when you have raw HTML data containing one or more <table> elements and want to transform these tables into structured JSON objects for further processing or analysis.

Common scenarios include:

  • Extracting tabular data from web-scraped HTML pages.
  • Converting email or report HTML content with tables into JSON for automation workflows.
  • Normalizing multiple HTML tables into a consistent JSON structure for integration with other systems.

For example, if you receive an HTML string with several tables, this node can parse each table, optionally use the first row as column headers, clean the cell data, and output the tables either as separate JSON items or combined in various formats.

Properties

  • HTML Source: The HTML string containing the tables to parse.
  • Options: Collection of options to customize parsing behavior:
      - Use First Row as Header: Whether the first row of each table should be used as the header (column names). Defaults to true.
      - Output Format: How parsed tables are structured in the output. Options:
          - Separate Tables: Each table becomes a separate output item.
          - Single Output: All tables combined into a single output item.
          - List Format: Convert tables to a list of objects using column headers as keys.
          - Raw Array: Return just the array of arrays without additional properties.
      - Clean Data: Whether to trim and normalize whitespace in table cell data. Defaults to true.
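
The two boolean options can be pictured as small transformations on rows that have already been extracted from a table. The TypeScript sketch below is illustrative only: the names ParseOptions, defaultOptions, cleanCell, and rowsToObjects are hypothetical and not taken from the node's source. The exact whitespace normalization, the default output format, and the fallback key for blank header cells are all assumptions.

```typescript
// Hypothetical option shape mirroring the properties above (names assumed).
type OutputFormat = "separate" | "single" | "list" | "raw";

interface ParseOptions {
  useFirstRowAsHeader: boolean;
  outputFormat: OutputFormat;
  cleanData: boolean;
}

// The two boolean defaults come from the documentation. The default output
// format is not documented, so "separate" is only a placeholder.
const defaultOptions: ParseOptions = {
  useFirstRowAsHeader: true,
  outputFormat: "separate",
  cleanData: true,
};

// Collapse every run of whitespace to a single space, then trim the ends.
function cleanCell(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Treat the first row as column names and map the remaining rows to objects.
function rowsToObjects(rows: string[][]): Record<string, string>[] {
  const [header, ...body] = rows;
  if (header === undefined) return [];
  return body.map((row) => {
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      // Fallback key for blank header cells is an assumption of this sketch.
      const key = name !== "" ? name : `column${i + 1}`;
      // Short rows are padded with empty strings (also an assumption).
      record[key] = row[i] ?? "";
    });
    return record;
  });
}
```

With headers enabled, a table whose first row is Name, Age would yield one object per remaining row with the keys Name and Age; with headers disabled, rows would stay as plain arrays of strings.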

Output

The node outputs JSON data representing the parsed tables according to the selected output format:

  • Separate Tables: Each table is output as a separate item with json containing:
    {
      "tableIndex": <index_of_table>,
      "tableData": <array_of_rows_or_objects>
    }
    
  • Single Output: A single item with json containing all tables under the key tables:
    {
      "tables": [<table1_data>, <table2_data>, ...]
    }
    
  • List Format: Outputs each row of each table as a separate item with json as an object mapping column headers to cell values. If "Use First Row as Header" is disabled, it outputs an error message instead.
  • Raw Array: Outputs only the raw array of rows (arrays of strings) for the first table as a single item.
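
The four output formats above can be sketched as a single shaping function over tables that have already been parsed into rows. This is a rough illustration under stated assumptions, not the node's actual code: shapeOutput, toObjects, and the type names are hypothetical, tableIndex is assumed to be zero-based, the error text is invented for the example, and how the node wraps the raw array inside an item is not specified, so the sketch passes it through unchanged.

```typescript
type Row = string[];
type Table = Row[];
type Item = { json: unknown };
type TableFormat = "separate" | "single" | "list" | "raw";

// Map rows to objects keyed by the first row (used when headers are enabled).
function toObjects(table: Table): Record<string, string>[] {
  const [header, ...body] = table;
  if (header === undefined) return [];
  return body.map((row) => {
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      record[name] = row[i] ?? "";
    });
    return record;
  });
}

function shapeOutput(tables: Table[], format: TableFormat, useHeader: boolean): Item[] {
  // Per-table payload: objects when headers are used, plain rows otherwise.
  const payload = (t: Table) => (useHeader ? toObjects(t) : t);

  switch (format) {
    case "separate":
      // One item per table; zero-based tableIndex is an assumption.
      return tables.map((t, i) => ({ json: { tableIndex: i, tableData: payload(t) } }));
    case "single":
      // One item holding every table under the "tables" key.
      return [{ json: { tables: tables.map((t) => payload(t)) } }];
    case "list": {
      if (!useHeader) {
        // The documentation says an error message is output; wording assumed.
        return [{ json: { error: "List Format requires Use First Row as Header" } }];
      }
      // One item per data row across all tables.
      const items: Item[] = [];
      for (const t of tables) {
        for (const obj of toObjects(t)) items.push({ json: obj });
      }
      return items;
    }
    case "raw":
      // Only the first table's rows, as a single item.
      return tables.length > 0 ? [{ json: tables[0] }] : [];
  }
}
```

Keeping the format decision in one switch makes it easy to see that only List Format depends on the header option being enabled, which matches the error case described above.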

This node outputs JSON only; it does not produce binary data.

Dependencies

  • Uses the cheerio library to parse and traverse the HTML DOM.
  • No external API or service dependencies.
  • Requires the HTML source string input to contain valid HTML with <table> elements.

Troubleshooting

  • Empty or no tables found: Ensure the HTML source contains valid <table> tags.
  • Incorrect headers or missing keys in List Format: Make sure "Use First Row as Header" is enabled; otherwise, the node will return an error in the output.
  • Malformed HTML: Parsing may fail or produce unexpected results if the HTML is not well-formed.
  • Large HTML inputs: Performance might degrade with very large HTML strings or many tables.
  • Error messages: If "Continue On Fail" is enabled, errors raised during parsing are caught and returned as JSON with an error field; otherwise, the node throws an exception.
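
The "Continue On Fail" behavior in the last item can be sketched as a wrapper around the parsing step. This is a minimal illustration, assuming a hypothetical parse callback and a plain boolean flag; parseWithFallback and ErrorAwareItem are invented names, and the node's real error handling may attach more context than a single error field.

```typescript
type ErrorAwareItem = { json: Record<string, unknown> };

// Run `parse` on one HTML input. On failure, either return an item carrying
// the error message (continueOnFail = true) or rethrow (continueOnFail = false).
function parseWithFallback(
  html: string,
  parse: (html: string) => ErrorAwareItem[],
  continueOnFail: boolean
): ErrorAwareItem[] {
  try {
    return parse(html);
  } catch (err) {
    if (!continueOnFail) throw err;
    const message = err instanceof Error ? err.message : String(err);
    return [{ json: { error: message } }];
  }
}
```

Downstream nodes can then check each item for an error field before using its data, instead of the whole workflow stopping at the first unparseable input.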
