extract_layout(path)
The extract_layout() function enables structured data extraction with layout information from a document.
Syntax
SELECT * FROM extract_layout(path => 'file_url')
Parameters
Parameter | Type | Optional | Description | Possible Values | Sample Value |
---|---|---|---|---|---|
path | String | No | Path or URL of the file to extract layout from | Any valid file URL | 'https://example.pdf' |
type | String | Yes | Type of file | Raw, PDF, Image | 'pdf' |
page_range | Array(Int) | Yes | PDF files only: the range of pages to process | Array of start and end page numbers | [1, 10] |
parallelism | Int | Yes | PDF files only: the number of pages to process in parallel | 2, 4, 5 | 2 |
Usage
Extracting layout information from a PDF
SELECT * FROM extract_layout(
path => 's3://sample-onlineboutique-codefiles/onlineboutique-codefiles/just-deserts-spring-obooko-small.pdf',
type => 'pdf'
);
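Extracting layout information from a page range of a PDF
The optional PDF parameters listed in the parameters table can presumably be combined with the call above. The following is a minimal sketch, not a verified query: it assumes that page_range accepts a two-element array literal and parallelism an integer, as the sample values in the table suggest. Whether the page range is inclusive or one-based is not stated here, so check the results against the source document.

```sql
-- Sketch: process only a range of pages from the sample PDF, two pages at a time.
-- The values [1, 10] and 2 are taken from the sample values in the parameters table.
SELECT * FROM extract_layout(
path => 's3://sample-onlineboutique-codefiles/onlineboutique-codefiles/just-deserts-spring-obooko-small.pdf',
type => 'pdf',
page_range => [1, 10],
parallelism => 2
);
```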
Extracting layout information from an image
SELECT * FROM extract_layout(
path => 'https://langdb-sample-data.s3.ap-southeast-1.amazonaws.com/Screenshot+from+2024-08-09+09-49-18.png',
type => 'image'
);