What is GraphRAG?
GraphRAG enhances traditional retrieval-augmented generation (RAG) by structuring knowledge as a graph. Instead of retrieving only text chunks, it extracts entities and the relationships between them, which enables complex queries that span multiple connected concepts.
- Entity relationships: Understand how concepts connect
- Multi-hop reasoning: Answer questions requiring multiple facts
- Global understanding: Summarize across entire document sets
- Structured retrieval: Query by relationship, not just similarity
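To make multi-hop reasoning concrete, here is a minimal sketch in plain Python with no graph database. It assumes the knowledge graph is just a list of (subject, relation, object) triples, and every entity and relation name is invented for illustration. A breadth-first search chains facts that no single text chunk needs to contain together.

```python
from collections import deque

# Hypothetical example facts: (subject, relation, object) triples
TRIPLES = [
    ("Alice", "WORKS_FOR", "Acme Corp"),
    ("Acme Corp", "SUPPLIES", "Globex"),
    ("Globex", "HEADQUARTERED_IN", "Berlin"),
    ("Bob", "WORKS_FOR", "Globex"),
]


def build_adjacency(triples):
    """Map each entity to its outgoing (relation, neighbor) edges."""
    adj = {}
    for subj, rel, obj in triples:
        adj.setdefault(subj, []).append((rel, obj))
    return adj


def multi_hop_path(adj, start, goal):
    """Breadth-first search for the shortest chain of facts from start to goal.

    Returns the list of triples along the path, or None if goal is unreachable.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


adj = build_adjacency(TRIPLES)
# "Where is the company that Alice's employer supplies headquartered?" needs three hops
for fact in multi_hop_path(adj, "Alice", "Berlin"):
    print(fact)
```

Each printed fact is one hop; an LLM given these three triples can answer a question that plain similarity search would have to piece together from separate passages.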
GraphRAG vs Traditional RAG
Traditional RAG
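Retrieves text chunks similar to the question. Struggles with "What are all the relationships between X and Y?", because the answer is spread across chunks that are not individually similar to the question.
GraphRAG
Traverses a knowledge graph. Excels at relational queries and at summarization queries that span the whole corpus.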
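The "all relationships between X and Y" question maps directly onto path enumeration in a graph. A minimal pure-Python sketch, assuming a small in-memory list of invented (subject, relation, object) triples and a hypothetical helper `all_connections`; a graph database would run the equivalent as a variable-length path query.

```python
# Hypothetical example facts: (subject, relation, object) triples
TRIPLES = [
    ("Alice", "WORKS_FOR", "Acme Corp"),
    ("Alice", "CO_AUTHORED_WITH", "Bob"),
    ("Bob", "WORKS_FOR", "Acme Corp"),
    ("Acme Corp", "PARTNERS_WITH", "Globex"),
]


def all_connections(triples, x, y, max_hops=3):
    """Return every simple path (no entity repeated) linking x and y.

    Edges are followed in both directions, so Bob can reach Alice through
    the triple (Alice, CO_AUTHORED_WITH, Bob). Each path is a list of the
    original triples it uses.
    """
    neighbors = {}
    for s, r, o in triples:
        neighbors.setdefault(s, []).append(((s, r, o), o))
        neighbors.setdefault(o, []).append(((s, r, o), s))

    paths = []

    def dfs(node, visited, path):
        if node == y and path:
            paths.append(list(path))
            return
        if len(path) == max_hops:
            return
        for triple, nxt in neighbors.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append(triple)
                dfs(nxt, visited, path)
                path.pop()
                visited.remove(nxt)

    dfs(x, {x}, [])
    return paths


for path in all_connections(TRIPLES, "Alice", "Acme Corp"):
    print(" | ".join(f"{s} -{r}-> {o}" for s, r, o in path))
```

This prints the direct WORKS_FOR edge and the indirect route through Bob. Capping `max_hops` matters: the number of paths can grow quickly in dense graphs.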
Building a Knowledge Graph
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_community.graphs import Neo4jGraph

# Initialize graph database
graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="password"
)

# Create graph transformer; restricting node and relationship types keeps
# the extracted graph consistent with the query below
llm = ChatOpenAI(model="gpt-4", temperature=0)
transformer = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Person", "Company"],
    allowed_relationships=["WORKS_FOR"],
    node_properties=["role"],  # also extract a `role` property when present
)

# Extract entities and relationships from documents
documents = [...]  # Your documents: a list of LangChain Document objects
graph_documents = transformer.convert_to_graph_documents(documents)

# Store in Neo4j
graph.add_graph_documents(graph_documents)

# Query the graph (nodes created this way store the entity name in `id`)
result = graph.query("""
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE c.id = 'Acme Corp'
RETURN p.id AS name, p.role AS role
""")
Microsoft GraphRAG
# Install Microsoft GraphRAG
pip install graphrag

# Initialize project (creates settings.yaml and a .env for GRAPHRAG_API_KEY);
# place your .txt source files in ./my_project/input
graphrag init --root ./my_project

# Index documents
graphrag index --root ./my_project

# Query (early 0.x releases used python -m graphrag.index / python -m graphrag.query)
graphrag query \
  --root ./my_project \
  --method global \
  --query "What are the main themes in the documents?"
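# Python usage (low-level query API; these module paths follow earlier
# graphrag releases and have moved between versions)
import asyncio

from graphrag.query.llm.oai.chat_openai import ChatOpenAI
from graphrag.query.structured_search.global_search.search import GlobalSearch

# context_builder: a global community context built from the indexer's output
# (community reports and entities); its construction is omitted here
# Global search for high-level summarization
global_search = GlobalSearch(
    llm=ChatOpenAI(model="gpt-4"),
    context_builder=context_builder,
    response_type="multiple paragraphs",
)

# asearch is a coroutine, so run it in an event loop when outside async code
result = asyncio.run(global_search.asearch("What are the key findings?"))
print(result.response)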
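Global search answers corpus-wide questions with a map-reduce over the community summaries produced during indexing: each summary is queried independently (map), and the most relevant partial answers are ranked and combined (reduce). The sketch below shows only that control flow. The community reports are invented, `toy_global_search` is a hypothetical name, and the word-overlap `map_step` is a deliberate stand-in for the per-community LLM call.

```python
import re

# Hypothetical community summaries, standing in for indexer output
COMMUNITY_REPORTS = {
    "c1": "Supply chain delays affected Acme Corp and Globex in 2023.",
    "c2": "Acme Corp expanded research partnerships with universities.",
    "c3": "Office relocation logistics for the Berlin team.",
}


def tokens(text):
    """Lowercase word tokens with punctuation dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def map_step(question, report):
    """Score one community summary against the question.

    Stand-in for the LLM call: word overlap replaces the model's rating.
    """
    return len(tokens(question) & tokens(report)), report


def reduce_step(scored, top_k=2):
    """Keep the top-scoring partial answers (score > 0) and join them."""
    relevant = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return " ".join(text for _, text in relevant[:top_k])


def toy_global_search(question, reports):
    # Map: every community is scored independently, so this step parallelizes
    scored = [map_step(question, report) for report in reports.values()]
    # Reduce: combine the best partial answers into one context
    return reduce_step(scored)


print(toy_global_search("What happened with the Acme Corp supply chain?", COMMUNITY_REPORTS))
```

Because the map step touches every community, global search cost grows with the number of communities, which is why it suits broad thematic questions rather than lookups of a single entity.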
Neo4j + LangChain GraphRAG
from langchain_community.graphs import Neo4jGraph
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_openai import ChatOpenAI

# Connect to Neo4j
graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="password"
)

# Create QA chain that generates Cypher queries
chain = GraphCypherQAChain.from_llm(
    llm=ChatOpenAI(model="gpt-4", temperature=0),
    graph=graph,
    verbose=True,
    return_intermediate_steps=True,
    # Required opt-in: the chain executes LLM-generated Cypher,
    # so connect with a read-only database user
    allow_dangerous_requests=True,
)

# Natural language to graph query
result = chain.invoke({
    "query": "Who are all the people connected to Project Alpha?"
})
print(result["result"])

# The generated Cypher is in the intermediate steps. Depending on your schema it
# may look like: MATCH (p:Person)-[:WORKS_ON]->(proj:Project {name: 'Project Alpha'}) RETURN p
print(result["intermediate_steps"][0]["query"])
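Since this chain runs Cypher written by an LLM, the primary safeguard should be database permissions (a Neo4j user that can only read). As an additional application-side check on Cypher you generate yourself, here is a minimal heuristic sketch; `is_read_only` is a hypothetical helper, not part of LangChain, and its clause list is deliberately conservative (it also rejects read-only `CALL` usage).

```python
import re

# Coarse, illustrative list of Cypher clauses that write data or call procedures
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV|CALL)\b",
    re.IGNORECASE,
)
# Single- or double-quoted string literals (escaped quotes are not handled)
STRING_LITERAL = re.compile(r"'[^']*'" + "|" + r'"[^"]*"')


def is_read_only(cypher: str) -> bool:
    """Heuristic check that a Cypher query contains no write clause.

    String literals are blanked first so that a value such as 'Set Theory'
    is not mistaken for a SET clause.
    """
    stripped = STRING_LITERAL.sub("''", cypher)
    return WRITE_CLAUSES.search(stripped) is None


print(is_read_only("MATCH (b:Book {title: 'Set Theory'}) RETURN b"))  # True
print(is_read_only("MATCH (n) DETACH DELETE n"))  # False
```

A regex cannot parse Cypher fully, so treat this as a tripwire that complements, not replaces, read-only credentials.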
Hybrid Vector + Graph Retrieval
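from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI


class HybridGraphRAG:
    def __init__(self):
        self.vector_store = Chroma(...)  # configure with your embeddings and collection
        self.graph = Neo4jGraph(...)     # configure with your connection details
        self.llm = ChatOpenAI(model="gpt-4", temperature=0)

    def extract_entities(self, query: str) -> list[str]:
        # One simple approach: ask the LLM to list the entities, one per line
        prompt = (
            "List the named entities in the question below, one per line, "
            f"with no other text.\n\nQuestion: {query}"
        )
        text = self.llm.invoke(prompt).content
        return [line.strip() for line in text.splitlines() if line.strip()]

    def retrieve(self, query: str) -> dict:
        # 1. Vector search for relevant chunks
        vector_results = self.vector_store.similarity_search(query, k=3)

        # 2. Extract entities from query
        entities = self.extract_entities(query)

        # 3. Graph traversal for relationships. The entity name is passed as a
        #    query parameter instead of being interpolated into the string, which
        #    avoids Cypher injection. Graphs built with LLMGraphTransformer store
        #    the entity name in the `id` property.
        graph_results = []
        for entity in entities:
            neighbors = self.graph.query(
                """
                MATCH (e {id: $name})-[r]-(connected)
                RETURN e.id AS entity, type(r) AS relation, connected.id AS neighbor
                LIMIT 10
                """,
                params={"name": entity},
            )
            graph_results.extend(neighbors)

        return {
            "text_context": [doc.page_content for doc in vector_results],
            "graph_context": graph_results,
        }

    def answer(self, query: str) -> str:
        context = self.retrieve(query)
        prompt = f"""Use both the text and graph context to answer.
Text Context:
{context['text_context']}
Graph Context (Entity Relationships):
{context['graph_context']}
Question: {query}"""
        return self.llm.invoke(prompt).content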
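Graph rows arrive as dictionaries, and the same relationship is often returned more than once when several query entities share neighbors. A small formatting step turns them into compact, deduplicated lines before they reach the prompt. This sketch assumes each row is a dict with `entity`, `relation`, and `neighbor` keys (an assumed shape; adjust to whatever your `RETURN` clause produces); `format_graph_context` and `build_prompt` are hypothetical helpers.

```python
def format_graph_context(rows):
    """Render graph rows as 'A -[REL]- B' lines, dropping exact duplicates.

    The arrow is undirected because the traversal pattern -[r]- ignores
    relationship direction. First-seen order is preserved.
    """
    seen = set()
    lines = []
    for row in rows:
        line = f"{row['entity']} -[{row['relation']}]- {row['neighbor']}"
        if line not in seen:
            seen.add(line)
            lines.append(line)
    return "\n".join(lines)


def build_prompt(query, text_chunks, graph_rows):
    """Combine text chunks and formatted graph facts into one prompt string."""
    text_block = "\n\n".join(text_chunks) or "(none)"
    graph_block = format_graph_context(graph_rows) or "(none)"
    return (
        "Use both the text and graph context to answer.\n\n"
        f"Text Context:\n{text_block}\n\n"
        f"Graph Context (Entity Relationships):\n{graph_block}\n\n"
        f"Question: {query}"
    )


rows = [
    {"entity": "Project Alpha", "relation": "WORKS_ON", "neighbor": "Alice"},
    {"entity": "Project Alpha", "relation": "WORKS_ON", "neighbor": "Alice"},
    {"entity": "Project Alpha", "relation": "FUNDED_BY", "neighbor": "Acme Corp"},
]
print(build_prompt("Who works on Project Alpha?", ["Alice leads Project Alpha."], rows))
```

Plain lines like these cost fewer tokens than the Python repr of a list of dicts and are easier for the model to cite.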
Use Cases
Enterprise Knowledge
Map organizational relationships, projects, and dependencies.
Research Analysis
Connect research papers, authors, and citations.
Customer 360
Unified view of customer interactions and history.
Compliance
Track regulatory relationships and requirements.
Best Practices
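- Start with a clear schema: Define entity types and relationships upfront
- Validate extractions: LLM entity extraction can miss, duplicate, or mislabel entities, so review results before relying on them
- Combine approaches: Use vector search for unstructured context and graph traversal for relationships
- Index appropriately: Create graph indexes on the properties your common queries filter on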
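The first two practices work well together: define the schema as data, then check every extracted relationship against it before loading. LangChain's `LLMGraphTransformer` can also constrain extraction up front via `allowed_nodes` and `allowed_relationships`. The sketch below is a post-extraction check, assuming a hypothetical (head, head_label, relation, tail, tail_label) triple shape and an invented three-pattern schema; rejected triples are returned for review rather than silently dropped.

```python
# Hypothetical schema: allowed (source label, relationship, target label) patterns
SCHEMA = {
    ("Person", "WORKS_FOR", "Company"),
    ("Person", "WORKS_ON", "Project"),
    ("Company", "FUNDS", "Project"),
}


def validate_triples(triples, schema=SCHEMA):
    """Split extracted triples into schema-conforming ones and rejects.

    Relationship names are normalized to UPPER_SNAKE_CASE before checking,
    since LLM output often varies in casing ('works for' vs 'WORKS_FOR').
    """
    valid, rejected = [], []
    for head, h_label, rel, tail, t_label in triples:
        norm_rel = "_".join(rel.strip().upper().split())
        if (h_label, norm_rel, t_label) in schema:
            valid.append((head, h_label, norm_rel, tail, t_label))
        else:
            rejected.append((head, h_label, rel, tail, t_label))
    return valid, rejected


extracted = [
    ("Alice", "Person", "works for", "Acme Corp", "Company"),
    ("Acme Corp", "Company", "WORKS_FOR", "Alice", "Person"),  # reversed direction
    ("Acme Corp", "Company", "FUNDS", "Project Alpha", "Project"),
]
valid, rejected = validate_triples(extracted)
print(len(valid), "valid;", len(rejected), "rejected:", rejected)
```

The reversed WORKS_FOR edge is a typical extraction error that a direction-aware schema check catches.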