extract_layout(path)

The extract_layout() function extracts structured data, together with its layout information, from a document.

Syntax

SELECT * FROM extract_layout(path => 'file_url')

Parameters

| Parameter | Type | Optional | Description | Possible Values | Sample Value |
| --- | --- | --- | --- | --- | --- |
| path | String | No | The file path to extract text from | Any valid file URL | 'https://example.pdf' |
| type | String | Yes | Type of file | Raw, PDF, Image | 'pdf' |
| page_range | Array(Int) | Yes | Extra parameter for the PDF file type: the range of page numbers to process | Array of start and end page numbers | [1, 10] |
| parallelism | Int | Yes | Extra parameter for the PDF file type: number of pages to process in parallel | 2, 4, 5 | 2 |

Usage

Extracting Layout information from a PDF

SELECT * FROM extract_layout(
path => 's3://sample-onlineboutique-codefiles/onlineboutique-codefiles/just-deserts-spring-obooko-small.pdf',
type => 'pdf'
);
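
The optional PDF parameters can be added to the same call. The following is a minimal sketch rather than a verified query: it assumes that page_range and parallelism use the same named-argument syntax as path and type, and that an array is written as in the table's sample value ([1, 10]). The page range and parallelism values are illustrative only.

```sql
-- Sketch: limit extraction to pages 1 through 10 of the PDF
-- and process pages in parallel.
-- Assumption: page_range and parallelism are passed with the same
-- name => value syntax as path and type.
SELECT * FROM extract_layout(
path => 's3://sample-onlineboutique-codefiles/onlineboutique-codefiles/just-deserts-spring-obooko-small.pdf',
type => 'pdf',
page_range => [1, 10],
parallelism => 2
);
```

Restricting the page range can help when only part of a long document is needed, since pages outside the range should not have to be processed.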

Extracting Layout information from an Image

SELECT * FROM extract_layout(
path => 'https://langdb-sample-data.s3.ap-southeast-1.amazonaws.com/Screenshot+from+2024-08-09+09-49-18.png',
type => 'image'
);