PDFParser
This guide gives a quick overview of the WRITER PDFParser document loader.
WRITER’s PDF Parser converts PDF documents into other formats such as text or Markdown. This is particularly useful when you need to extract and process text content from PDF files for further analysis or integration into your workflow. The langchain-writer package exposes WRITER’s PDF Parser as a LangChain document parser.
Overview
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
PDFParser | langchain-writer | ❌ | ❌ | ❌ |
Setup
The PDFParser is available in the langchain-writer package, which can be installed from PyPI with pip (pip install -U langchain-writer).
Credentials
Sign up for WRITER AI Studio to generate an API key (you can follow this Quickstart). Then, set the WRITER_API_KEY environment variable. Optionally, to trace runs with LangSmith, also set the LANGSMITH_TRACING and LANGSMITH_API_KEY environment variables.
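As an illustration of the credential setup described above, the following sketch sets the variables from Python instead of the shell. The helper name `configure_environment` is hypothetical and not part of langchain-writer; only the three environment variable names come from this page.

```python
import os
from typing import Optional


def configure_environment(
    writer_api_key: str, langsmith_api_key: Optional[str] = None
) -> None:
    """Set the environment variables named in the Credentials section.

    Hypothetical helper for illustration; it is not provided by langchain-writer.
    """
    # Required: the API key generated in WRITER AI Studio.
    os.environ["WRITER_API_KEY"] = writer_api_key

    # Optional: turn on LangSmith tracing only when a LangSmith key is supplied.
    if langsmith_api_key is not None:
        os.environ["LANGSMITH_TRACING"] = "true"
        os.environ["LANGSMITH_API_KEY"] = langsmith_api_key
```

In an interactive session, one option is to pass the key via `getpass.getpass("Enter your WRITER API key: ")` so that it never appears in the source file or shell history.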
Instantiation
Next, create an instance of the WRITER PDF Parser with the desired output format.
Usage
There are two ways to use the PDF Parser: synchronously or asynchronously. In either case, the PDF Parser returns a list of Document objects, each containing the parsed content of a page from the PDF file.
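Both usage modes start from a parser instance created with the desired output format. The sketch below shows what that instantiation step might look like. The import path `langchain_writer.pdf_parser` and the keyword name `format` are assumptions, so confirm them against the API reference. The wrapper `build_pdf_parser` is a hypothetical helper, and the call needs the langchain-writer package installed and WRITER_API_KEY set.

```python
def build_pdf_parser(output_format: str = "markdown"):
    """Create a WRITER PDF parser for the given output format.

    Hypothetical wrapper; the page mentions text and Markdown as target formats.
    """
    # Assumed import path; imported lazily so this module loads even when
    # langchain-writer is not installed.
    from langchain_writer.pdf_parser import PDFParser

    # Assumed keyword name for the output format ("text" or "markdown").
    return PDFParser(format=output_format)
```

Calling `build_pdf_parser("text")` would, under the same assumptions, request plain-text output instead of Markdown.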
Synchronous usage
To invoke the PDF Parser synchronously, create a Blob object referencing the PDF file you want to parse and pass it to the parse method.
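The synchronous call described above can be sketched as follows. `Blob.from_path` comes from langchain-core, while the helper names `load_pdf_blob` and `parse_pdf_pages` are hypothetical. The parser is assumed to be an already instantiated PDFParser.

```python
from typing import Any, List


def load_pdf_blob(pdf_path: str) -> Any:
    """Wrap a local PDF file in a LangChain Blob (requires langchain-core)."""
    # Imported lazily so the parsing helper below has no hard dependency.
    from langchain_core.documents.base import Blob

    return Blob.from_path(pdf_path)


def parse_pdf_pages(parser: Any, blob: Any) -> List[str]:
    """Parse the blob synchronously and return the text of each Document."""
    documents = parser.parse(blob)
    # Each Document carries its parsed content in `page_content`.
    return [doc.page_content for doc in documents]
```

A typical call would be `parse_pdf_pages(parser, load_pdf_blob("example.pdf"))`, where `example.pdf` stands in for your own file.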
Asynchronous usage
To invoke the PDF Parser asynchronously, create a Blob object referencing the PDF file you want to parse and pass it to the aparse method.
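One reason to prefer the asynchronous path is that several PDFs can be parsed concurrently. The sketch below awaits aparse for each blob with `asyncio.gather`. The helper name `parse_many_pdfs` is hypothetical, and the parser is assumed to be an already instantiated PDFParser.

```python
import asyncio
from typing import Any, List, Sequence


async def parse_many_pdfs(parser: Any, blobs: Sequence[Any]) -> List[List[str]]:
    """Parse several blobs concurrently and return each file's Document texts."""
    # gather() schedules every aparse() call at once and preserves input order.
    results = await asyncio.gather(*(parser.aparse(blob) for blob in blobs))
    return [[doc.page_content for doc in documents] for documents in results]
```

From synchronous code, this can be driven with `asyncio.run(parse_many_pdfs(parser, blobs))`. Inside a notebook or another coroutine, `await parse_many_pdfs(parser, blobs)` is the usual form.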
API reference
For detailed documentation of all PDFParser features and configurations, head to the API reference.