The langchain-box package provides two methods to index your files from Box: BoxLoader and BoxBlobLoader. BoxLoader allows you to ingest text representations of files that have a text representation in Box. BoxBlobLoader allows you to download the blob for any document or image file for processing with the blob parser of your choice.
This notebook details getting started with both of these. For detailed documentation of all BoxLoader features and configurations, head to the API Reference pages for BoxLoader and BoxBlobLoader.
Overview
The BoxLoader class helps you get your unstructured content from Box in LangChain’s Document format. You can do this with either a List[str] containing Box file IDs, or with a str containing a Box folder ID.
The BoxBlobLoader class helps you get your unstructured content from Box in LangChain’s Blob format. You can do this with a List[str] containing Box file IDs, a str containing a Box folder ID, a search query, or a BoxMetadataQuery.
When getting files from a folder by folder ID, you can also set a boolean flag to tell the loader to include all sub-folders of that folder.
A Box instance can contain petabytes of files, and folders can contain millions of files. Be intentional about which folders you choose to index. We recommend never getting all files from folder 0 recursively; folder ID 0 is your root folder.
BoxLoader will skip files without a text representation, while BoxBlobLoader will return blobs for all document and image files.
Integration details
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
BoxLoader | langchain_box | ✅ | ❌ | ❌ |
BoxBlobLoader | langchain_box | ✅ | ❌ | ❌ |
Loader features
Source | Document Lazy Loading | Async Support |
---|---|---|
BoxLoader | ✅ | ❌ |
BoxBlobLoader | ✅ | ❌ |
Setup
In order to use the Box package, you will need a few things:
- A Box account: if you are not a current Box customer or want to test outside of your production Box instance, you can use a free developer account.
- A Box app: this is configured in the developer console and, for Box AI, must have the Manage AI scope enabled. Here you will also select your authentication method.
- The app must be enabled by the administrator. For free developer accounts, this is whoever signed up for the account.
Credentials
For these examples, we will use token authentication. A token can be used regardless of which authentication method your app is configured for; obtain it in whatever way that method provides. If you want to learn more about how to use other authentication types with langchain-box, visit the Box provider document.
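A minimal sketch of supplying the token, assuming it is either exported in an environment variable or typed in at a prompt. The helper name get_box_developer_token and the variable name BOX_DEVELOPER_TOKEN are assumptions made for illustration; this page does not specify them.

```python
import getpass
import os


def get_box_developer_token(env_var: str = "BOX_DEVELOPER_TOKEN") -> str:
    """Return a Box token from the environment, prompting only if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # getpass hides the token while it is typed or pasted.
        token = getpass.getpass("Enter your Box developer token: ").strip()
    if not token:
        raise ValueError("A Box token is required")
    return token
```

Reading from the environment first keeps the token out of notebooks and shell history, and the prompt is only a fallback for interactive sessions.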
Installation
Install langchain_box, for example with pip install -U langchain_box.
Initialization
Load files
If you wish to load specific files, you must provide a List of file IDs at instantiation time.
This requires 1 piece of information:
- box_file_ids (List[str]): A list of Box file IDs.
BoxLoader
BoxBlobLoader
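A minimal sketch of constructing both loaders from a list of file IDs. The import paths and the box_developer_token keyword are written from my recollection of the langchain-box package and should be checked against the API reference. The helper build_file_loaders and its validation are illustrative additions, not part of the library.

```python
from typing import List


def build_file_loaders(box_developer_token: str, box_file_ids: List[str]):
    """Return (BoxLoader, BoxBlobLoader) for an explicit list of Box file IDs."""
    if not box_file_ids or not all(isinstance(i, str) and i for i in box_file_ids):
        raise ValueError("box_file_ids must be a non-empty list of ID strings")

    # Imported here so the validation above can run even before
    # langchain-box is installed.
    from langchain_box.blob_loaders import BoxBlobLoader
    from langchain_box.document_loaders import BoxLoader

    doc_loader = BoxLoader(
        box_developer_token=box_developer_token,
        box_file_ids=box_file_ids,
    )
    blob_loader = BoxBlobLoader(
        box_developer_token=box_developer_token,
        box_file_ids=box_file_ids,
    )
    return doc_loader, blob_loader
```

Box file IDs are passed as strings even though they look numeric, which is why the sketch rejects integers up front.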
Load from folder
If you wish to load files from a folder, you must provide a str with the Box folder ID at instantiation time.
This requires 1 piece of information:
- box_folder_id (str): A string containing a Box folder ID.
BoxLoader
BoxBlobLoader
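A minimal sketch of constructing both loaders from a folder ID. The keyword recursive for the sub-folder flag is my recollection of the package and is an assumption here, as are the import paths. The guard against recursively indexing folder 0 is an illustrative addition that encodes the recommendation from the overview; the library itself is not claimed to enforce it.

```python
def build_folder_loaders(
    box_developer_token: str,
    box_folder_id: str,
    recursive: bool = False,
):
    """Return (BoxLoader, BoxBlobLoader) for every file in one Box folder."""
    if not isinstance(box_folder_id, str) or not box_folder_id:
        raise ValueError("box_folder_id must be a non-empty string")
    if box_folder_id == "0" and recursive:
        # Folder 0 is the root folder; walking it recursively could
        # touch every file in the Box instance.
        raise ValueError("Refusing to index the root folder recursively")

    # Imported lazily so the checks above run without langchain-box installed.
    from langchain_box.blob_loaders import BoxBlobLoader
    from langchain_box.document_loaders import BoxLoader

    common = dict(
        box_developer_token=box_developer_token,
        box_folder_id=box_folder_id,
        recursive=recursive,  # assumed name of the sub-folder flag
    )
    return BoxLoader(**common), BoxBlobLoader(**common)
```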
Search for files with BoxBlobLoader
If you need to search for files, the BoxBlobLoader offers two methods. First, you can perform a full-text search, with optional search options to narrow down that search.
This requires 1 piece of information:
- query (str): A string containing the search query to perform.
You can also provide a BoxSearchOptions object to narrow down that search:
- box_search_options (BoxSearchOptions)
BoxBlobLoader search
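A minimal sketch of a full-text search with BoxBlobLoader. The query and box_search_options keywords come from the parameter list above. The BoxSearchOptions fields used here (ancestor_folder_ids and k) and the langchain_box.utilities import path are my recollection of the package, so treat them as assumptions to verify. The helper build_search_loader is hypothetical.

```python
from typing import List, Optional


def build_search_loader(
    box_developer_token: str,
    query: str,
    ancestor_folder_ids: Optional[List[str]] = None,
    k: int = 100,
):
    """Return a BoxBlobLoader that yields blobs for files matching `query`."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")

    # Imported lazily so the check above runs without langchain-box installed.
    from langchain_box.blob_loaders import BoxBlobLoader
    from langchain_box.utilities import BoxSearchOptions

    options = BoxSearchOptions(
        ancestor_folder_ids=ancestor_folder_ids,  # assumed: limit search to these folders
        k=k,  # assumed: maximum number of results
    )
    return BoxBlobLoader(
        box_developer_token=box_developer_token,
        query=query.strip(),
        box_search_options=options,
    )
```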
BoxBlobLoader Metadata query
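The second search method is a metadata query, passed as the BoxMetadataQuery mentioned in the overview. The sketch below is written under several assumptions: the field names template_key, query, query_params and ancestor_folder_id, the box_metadata_query keyword, and the import path are my recollection of the package, and the template field total is hypothetical.

```python
from typing import Any, Dict


def metadata_query_args(
    template_key: str, min_total: float, ancestor_folder_id: str
) -> Dict[str, Any]:
    """Keyword arguments for a 'total >= min_total' metadata query."""
    return {
        "template_key": template_key,
        # :value is a placeholder that is filled in from query_params.
        "query": "total >= :value",
        "query_params": {"value": min_total},
        "ancestor_folder_id": ancestor_folder_id,
    }


def build_metadata_loader(box_developer_token: str, args: Dict[str, Any]):
    """Return a BoxBlobLoader driven by a metadata query."""
    # Imported lazily so metadata_query_args works without langchain-box installed.
    from langchain_box.blob_loaders import BoxBlobLoader
    from langchain_box.utilities import BoxMetadataQuery

    return BoxBlobLoader(
        box_developer_token=box_developer_token,
        box_metadata_query=BoxMetadataQuery(**args),
    )
```

Keeping the threshold in query_params rather than formatting it into the query string avoids quoting mistakes when the value changes.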
Load
BoxLoader
BoxBlobLoader
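A minimal sketch of eager loading with each loader, assuming the standard LangChain interfaces: load() on a document loader returns a list of Document objects, and yield_blobs() on a blob loader yields Blob objects. The helper names are hypothetical, and the sketch assumes each blob carries its content in memory so that as_bytes() succeeds.

```python
def load_documents(loader):
    """Load every Document and return (source, text length) pairs."""
    docs = loader.load()
    return [(doc.metadata.get("source"), len(doc.page_content)) for doc in docs]


def read_blobs(blob_loader):
    """Iterate over every Blob and return (source, size in bytes) pairs."""
    return [
        (blob.source, len(blob.as_bytes()))
        for blob in blob_loader.yield_blobs()
    ]
```

Pass a configured BoxLoader to load_documents and a configured BoxBlobLoader to read_blobs. Both calls fetch everything up front, so they are best kept to small file lists or folders.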
Lazy Load
BoxLoader only
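A minimal sketch of paging through documents with lazy_load(), which yields documents one at a time instead of building the full list. The helper iter_pages and the page size are illustrative choices, not part of the library.

```python
from typing import Any, Iterator, List


def iter_pages(loader: Any, page_size: int = 10) -> Iterator[List[Any]]:
    """Group lazily loaded documents into pages of at most `page_size`."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    page: List[Any] = []
    for doc in loader.lazy_load():
        page.append(doc)
        if len(page) == page_size:
            yield page
            page = []
    if page:
        # Emit the final, possibly shorter, page.
        yield page
```

Each page can then be handed to a batched operation, such as an index upsert, without holding every document in memory at once.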
Extra fields
All Box connectors offer the ability to select additional fields from the BoxFileFull object to return as custom LangChain metadata. Each object accepts an optional List[str] called extra_fields containing the JSON keys from the return object, like extra_fields=["shared_link"].
The connector will add this field to the list of fields the integration needs to function and then add the results to the metadata returned in the Document or Blob, like "metadata": {"source": "source", "shared_link": "shared_link"}. If the field is unavailable for that file, it will be returned as an empty string, like "shared_link": "".
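A minimal sketch of requesting an extra field and reading it back. The extra_fields keyword is taken from the text above; the import path, the box_developer_token keyword, and both helper names are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List


def build_loader_with_shared_link(box_developer_token: str, box_file_ids: List[str]):
    """Return a BoxLoader that also requests the shared_link field."""
    # Imported lazily so shared_links below works without langchain-box installed.
    from langchain_box.document_loaders import BoxLoader

    return BoxLoader(
        box_developer_token=box_developer_token,
        box_file_ids=box_file_ids,
        extra_fields=["shared_link"],
    )


def shared_links(docs: Iterable[Any]) -> Dict[str, Any]:
    """Map each document's source to its shared link, skipping files without one."""
    links: Dict[str, Any] = {}
    for doc in docs:
        link = doc.metadata.get("shared_link", "")
        if link:  # an empty string means the field was unavailable
            links[doc.metadata["source"]] = link
    return links
```

Checking for the empty string matters because an unavailable field is still present in the metadata, so a plain key lookup would not tell the two cases apart.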