
transpose_parsed_tables(table_index, query)

The transpose_parsed_tables function takes the layout information parsed by the extract_layout function and returns a table from the document as a ClickHouse table.

Syntax

SELECT * FROM transpose_parsed_tables(table_index => 0, query)

Parameters

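| Parameter   | Type   | Optional | Description                                | Sample Value                            |
|-------------|--------|----------|--------------------------------------------|-----------------------------------------|
| table_index | Int    | Yes      | Index of the table to extract. Defaults to 0. | table_index => 0                        |
| query       | String | No       | Query over the parsed document table.      | (select * from pdf_blocks_billionaires) |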

Usage Example

Let's continue with the Wikipedia page example we've been following. Given that the page's layout information has already been extracted:

select * from transpose_parsed_tables(table_index => 0, (select * from pdf_blocks_billionaires))

The output is the first table from the document, returned as a ClickHouse table.

| No. | Name                    | Net worth (USD) | Age | Nationality                       | Primary source(s) of wealth       |
|-----|-------------------------|-----------------|-----|-----------------------------------|-----------------------------------|
| | | | | | |
| 1 | Bernard Amault & family | $233 billion | 75 | France | LVMH |
| 2 - | Elon Musk | $195 billion | 52 | South Africa Canada United States | Tesla, SpaceX, Twitter (Currently |
| 3 | Jeff Bezos | $194 billion | 60 | United States | Amazon |
| 4 A | Mark Zuckerberg | $177 billion | 39 | United States | Meta Platforms |
| 5 | Larry Ellison | $141 billion | 79 | United States | Oracle Corporation |
| 6 | Warren Buffett | $133 billion | 93 | United States | Berkshire Hathaway |
| 7 | Bill Gates | $128 billion | 68 | United States | Microsoft |
| 8 A | Steve Ballmer | $121 billion | 68 | United States | Microsoft |
| 9 | Mukesh Ambani | $116 billion | 65 | India | Reliance Industries |
| 10 | Larry Page | $114 billion | 51 | United States | Google |
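
Since table_index => 0 returns the first table, the index appears to be zero-based. The sketch below assumes the parsed document contains at least a second table and requests it by changing only the index. How the function behaves when the index is out of range is not covered here, so treat that as unknown.

```sql
-- Sketch: extract the second table of the same parsed document.
-- Assumes pdf_blocks_billionaires holds the extract_layout output
-- and that the document has more than one table.
select *
from transpose_parsed_tables(
    table_index => 1,
    (select * from pdf_blocks_billionaires)
)
```

Because the result is returned as a regular table, it should be possible to filter or order it like any other query result, although column names and types depend on what was parsed from the document.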