Compatibility: Only available on Node.js.
Setup
You’ll need to install major version 3 of the node-llama-cpp module to communicate with your local model.
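For example (a sketch of the install command; the @langchain/community and @langchain/core peer packages are assumed here and may differ in your setup):

```bash
npm install -S node-llama-cpp@3 @langchain/community @langchain/core
```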
node-llama-cpp is tuned for running on macOS, with support for the Metal GPU of Apple M-series processors. If you need to turn this off, or need support for the CUDA architecture, refer to the documentation at node-llama-cpp.
For advice on getting and preparing llama3, see the documentation for the LLM version of this module.
A note to LangChain.js contributors: if you want to run the tests associated with this module, you will need to put the path to your local model in the environment variable LLAMA_PATH.
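For example (the model path below is only a placeholder):

```bash
export LLAMA_PATH="/path/to/your/model.gguf"
```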
Usage
Basic use
In this case we pass in a prompt wrapped as a message and expect a response.
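A minimal sketch, assuming the ChatLlamaCpp class exported from @langchain/community/chat_models/llama_cpp; the initialize factory and the model path are assumptions and may differ between package versions:

```typescript
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { HumanMessage } from "@langchain/core/messages";

// Placeholder path to your local GGUF model file.
const llamaPath = "/path/to/your/model.gguf";

// Newer @langchain/community releases expose an async initialize factory;
// older releases constructed the model with `new ChatLlamaCpp({ modelPath })`.
const model = await ChatLlamaCpp.initialize({ modelPath: llamaPath });

const response = await model.invoke([
  new HumanMessage({ content: "My name is John." }),
]);
console.log(response.content);
```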
System messages
We can also provide a system message; note that with the llama_cpp module a system message will cause the creation of a new session.
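For example, reusing the model instance from the sketch above (the prompt wording is arbitrary):

```typescript
import { HumanMessage, SystemMessage } from "@langchain/core/messages";

const response = await model.invoke([
  new SystemMessage(
    "You are a pirate, responses must be very verbose and in pirate dialect."
  ),
  new HumanMessage("Tell me about Llamas?"),
]);
console.log(response.content);
```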
Chains
This module can also be used with chains; note that more complex chains will require a suitably powerful version of llama3, such as the 70B version.
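A sketch of a simple prompt-template chain, again assuming the model instance created earlier (the prompt content is illustrative only):

```typescript
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a helpful assistant that answers concisely."],
  ["human", "What is a good name for a company that makes {product}?"],
]);

// Pipe the prompt into the model, then parse the reply down to a plain string.
const chain = prompt.pipe(model).pipe(new StringOutputParser());

const result = await chain.invoke({ product: "colorful socks" });
console.log(result);
```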
Streaming
We can also stream with Llama CPP; this can be done using a raw ‘single prompt’ string. Using the invoke method, we can also achieve stream generation, and use signal to abort the generation.
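A sketch of both approaches, reusing the model instance from above (the prompts and the five-second timeout are arbitrary choices):

```typescript
// Stream tokens from a raw "single prompt" string.
const stream = await model.stream("Tell me a short story about a happy Llama.");
for await (const chunk of stream) {
  process.stdout.write(chunk.content as string);
}

// Stream-style generation through invoke, with a signal to abort it early.
const controller = new AbortController();
setTimeout(() => controller.abort(), 5000);

try {
  await model.invoke("Tell me a long story about the weather.", {
    signal: controller.signal,
    callbacks: [
      {
        handleLLMNewToken(token: string) {
          process.stdout.write(token);
        },
      },
    ],
  });
} catch (e) {
  console.log("\nGeneration aborted.");
}
```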
Related
- Chat model conceptual guide
- Chat model how-to guides