Canopy is an open-source retrieval-augmented generation (RAG) framework developed by Pinecone to simplify building applications that combine large language models with external knowledge sources. The system provides a complete pipeline for transforming raw text into searchable embeddings, storing them in a vector database, and retrieving relevant context for language model responses. It handles many of the complex components a RAG workflow requires, including document chunking, embedding generation, prompt construction, and chat history management. Developers can use Canopy to quickly build chat systems that answer questions from their own data rather than relying solely on the language model's pretrained knowledge. The framework includes a built-in server and a command-line interface for experimenting with RAG pipelines and comparing retrieval-augmented responses against standard LLM responses.
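The pipeline described above can be sketched in miniature. The following is a conceptual illustration only, not Canopy's actual API: a toy bag-of-words "embedding" and cosine similarity stand in for a real embedding model and Pinecone, to show how chunks are embedded, indexed, retrieved, and assembled into a prompt.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# "Knowledge base": each chunk is stored alongside its vector,
# standing in for an index in a vector database such as Pinecone.
chunks = [
    "Canopy is a RAG framework built by Pinecone.",
    "Vector databases store embeddings for similarity search.",
    "Chat history management keeps multi-turn context coherent.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

def build_prompt(query):
    """Assemble retrieved context and the question into an LLM prompt."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What does a vector database store?"))
```

In a real deployment, Canopy replaces each of these stand-ins with production components: an embedding model, a Pinecone index, and a language model that answers from the assembled prompt.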
Features
- End-to-end framework for building retrieval-augmented generation applications
- Automatic document chunking and embedding generation for knowledge bases
- Vector search integration using the Pinecone vector database
- Context retrieval engine that supplies relevant data to language models
- Command-line interface for testing RAG pipelines and comparing outputs
- Built-in server for deploying chat applications powered by custom data