kg-gen is an open-source framework developed by the STAIR Lab that uses large language models to generate knowledge graphs automatically from unstructured text. It transforms plain-text sources such as documents, articles, or conversation transcripts into structured graphs of entities and relationships. Instead of relying on traditional rule-based extraction, kg-gen uses language models to identify entities and the relationships between them, with the aim of producing higher-quality graphs than rule-based methods do. The framework addresses common problems in automatic knowledge graph construction, particularly sparsity and duplicate entities, by applying a clustering and entity-resolution step that merges semantically similar nodes. The resulting graphs are denser, more coherent, and easier to use in downstream tasks such as retrieval-augmented generation, semantic search, and reasoning systems.
Features
- LLM-powered pipeline that converts plain text into structured knowledge graphs
- Automatic extraction of entities and relationships from documents or conversations
- Entity clustering and resolution that merges duplicate nodes and reduces graph sparsity
- Multi-stage processing pipeline for aggregating information across multiple texts
- Distributed as a Python package for easy integration into AI and data workflows
- Benchmark dataset and evaluation framework for measuring knowledge graph quality
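
To make the aggregation and entity-resolution steps concrete, here is a minimal, self-contained Python sketch. It does not use the kg-gen package or its API. The triple representation, the function names (`normalize`, `aggregate`, `resolve_entities`), and the sample data are all hypothetical, and plain string normalization stands in for the LLM-based semantic clustering described above.

```python
import re
from collections import defaultdict

# Assumption: a graph is a collection of (subject, predicate, object) triples.
Triple = tuple[str, str, str]


def normalize(name: str) -> str:
    """Crude stand-in for semantic similarity: lowercase and strip punctuation."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def aggregate(graphs: list[list[Triple]]) -> set[Triple]:
    """Combine triples extracted from several texts into one graph."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def resolve_entities(triples: set[Triple]) -> tuple[set[Triple], dict[str, set[str]]]:
    """Merge entity names that normalize to the same key.

    Returns the rewritten triples and the clusters of surface forms.
    """
    clusters: dict[str, set[str]] = defaultdict(set)
    for subj, _, obj in triples:
        for name in (subj, obj):
            clusters[normalize(name)].add(name)

    # Pick one deterministic representative per cluster:
    # the shortest surface form, with ties broken alphabetically.
    canonical = {
        key: min(names, key=lambda n: (len(n), n))
        for key, names in clusters.items()
    }

    resolved = {
        (canonical[normalize(subj)], pred, canonical[normalize(obj)])
        for subj, pred, obj in triples
    }
    return resolved, dict(clusters)


# Hypothetical triples, as if extracted from two separate documents.
doc_a = [
    ("Marie Curie", "discovered", "Polonium"),
    ("Marie Curie", "won", "Nobel Prize"),
]
doc_b = [
    ("marie curie", "discovered", "polonium"),
    ("Marie-Curie", "born in", "Warsaw"),
]

combined = aggregate([doc_a, doc_b])
resolved, clusters = resolve_entities(combined)

entities_before = {name for s, _, o in combined for name in (s, o)}
entities_after = {name for s, _, o in resolved for name in (s, o)}
print(len(entities_before), "entities before,", len(entities_after), "after")
for triple in sorted(resolved):
    print(triple)
```

In this sketch the three spellings of the same person collapse into one node, so facts that were scattered across near-duplicate nodes end up attached to a single entity. A model-driven resolver could additionally merge forms that string normalization misses, such as synonyms or abbreviations.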