Neo4j is a graph database management system developed by Neo4j, Inc
.
The data elementsNeo4j
stores are nodes, edges connecting them, and attributes of nodes and edges. Described by its developers as an ACID-compliant transactional database with native graph storage and processing,Neo4j
is available in a non-open-source “community edition” licensed with a modification of the GNU General Public License, with online backup and high availability extensions licensed under a closed-source commercial license. Neo also licensesNeo4j
with these extensions under closed-source commercial terms.
This notebook shows how to use LLMs to provide a natural language interface to a graph database you can query with the Cypher
query language.
Cypher is a declarative graph query language that allows for expressive and efficient data querying in a property graph.
Setting up
You will need to have a runningNeo4j
instance. One option is to create a free Neo4j database instance in their Aura cloud service. You can also run the database locally using the Neo4j Desktop application, or running a docker container.
You can run a local docker container by running the executing the following script:
Seeding the database
Assuming your database is empty, you can populate it using Cypher query language. The following Cypher statement is idempotent, which means the database information will be the same if you run it one or multiple times.Refresh graph schema information
If the schema of database changes, you can refresh the schema information needed to generate Cypher statements.Enhanced schema information
Choosing the enhanced schema version enables the system to automatically scan for example values within the databases and calculate some distribution metrics. For example, if a node property has less than 10 distinct values, we return all possible values in the schema. Otherwise, return only a single example value per node and relationship property.Querying the graph
We can now use the graph cypher QA chain to ask question of the graphLimit the number of results
You can limit the number of results from the Cypher QA Chain using thetop_k
parameter.
The default is 10.
Return intermediate results
You can return intermediate steps from the Cypher QA Chain using thereturn_intermediate_steps
parameter
Return direct results
You can return direct results from the Cypher QA Chain using thereturn_direct
parameter
Add examples in the Cypher generation prompt
You can define the Cypher statement you want the LLM to generate for particular questionsUse separate LLMs for Cypher and answer generation
You can use thecypher_llm
and qa_llm
parameters to define different llms
Ignore specified node and relationship types
You can useinclude_types
or exclude_types
to ignore parts of the graph schema when generating Cypher statements.
Validate generated Cypher statements
You can use thevalidate_cypher
parameter to validate and correct relationship directions in generated Cypher statements
Provide context from database results as tool/function output
You can use theuse_function_response
parameter to pass context from database results to an LLM as a tool/function output. This method improves the response accuracy and relevance of an answer as the LLM follows the provided context more closely.
You will need to use an LLM with native function calling support to use this feature.
function_response_system
to instruct the model on how to generate answers.
Note that qa_prompt
will have no effect when using use_function_response