PaSa is an open-source paper search agent built on large language models (LLMs) that automates academic literature retrieval. Instead of translating a query into keywords and returning a flat list of matches, PaSa uses a dual-agent architecture (Crawler + Selector) that iteratively searches, reads, analyzes, and filters academic publications, much as a researcher follows citation networks, expands references, and judges relevance from both metadata and content.

Given a complex scholarly question (for example, “Which works focus on non-stationary reinforcement learning with UCB-based value methods?”), PaSa decomposes the task. The Crawler generates search queries, retrieves candidate papers through search tools and citation expansion, and adds them to a “paper queue.” The Selector then reads each paper's abstract or full text, depending on what is available, and decides whether the paper is relevant.
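The Crawler/Selector workflow described above can be illustrated with a small toy sketch. This is not PaSa's actual code or API: the `Paper` class, the in-memory `CORPUS`, and the `search`, `select`, and `run_agent` functions are all hypothetical stand-ins, and simple keyword matching replaces the LLM calls that the real agents would make. The sketch also assumes that references are expanded for every crawled paper, whether or not the Selector accepts it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Paper:
    title: str
    abstract: str
    references: list = field(default_factory=list)  # titles of cited papers


# Hypothetical in-memory corpus standing in for a real search backend.
CORPUS = {
    p.title: p
    for p in [
        Paper(
            "UCB value iteration for non-stationary RL",
            "non-stationary reinforcement learning with UCB bonuses",
            ["Restarted Q-learning with UCB"],
        ),
        Paper(
            "Restarted Q-learning with UCB",
            "UCB exploration for non-stationary reinforcement learning",
        ),
        Paper("Image classification survey", "convolutional networks for vision"),
    ]
}


def search(query):
    """Toy search tool: return papers whose title contains every query word."""
    words = query.lower().split()
    return [p for p in CORPUS.values() if all(w in p.title.lower() for w in words)]


def select(paper, required_terms):
    """Toy Selector: accept a paper if its abstract mentions every required term."""
    text = paper.abstract.lower()
    return all(term in text for term in required_terms)


def run_agent(queries, required_terms, max_steps=50):
    """Toy Crawler loop: search, enqueue, expand references, and filter."""
    queue, seen, selected = deque(), set(), []

    # Step 1: run each search query and add unseen hits to the paper queue.
    for query in queries:
        for paper in search(query):
            if paper.title not in seen:
                seen.add(paper.title)
                queue.append(paper)

    # Step 2: process the queue, selecting papers and expanding citations.
    steps = 0
    while queue and steps < max_steps:
        paper = queue.popleft()
        steps += 1
        if select(paper, required_terms):
            selected.append(paper.title)
        for ref in paper.references:
            if ref in CORPUS and ref not in seen:
                seen.add(ref)
                queue.append(CORPUS[ref])
    return selected
```

In this toy corpus, `run_agent(["value iteration"], ["non-stationary", "ucb"])` finds the first paper through search and the second only through citation expansion, which is the behavior that separates this design from a single keyword lookup.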

Features

  • Dual-agent architecture (Crawler + Selector) that enables iterative search, citation expansion, and content-based selection rather than simple keyword matching
  • Agents trained with reinforcement learning on synthetic and real query datasets to optimize recall and precision on complex, nuanced academic queries
  • Automatic citation network traversal: starting from the initial hits, the agent expands references to discover relevant works beyond the first set of search results
  • End-to-end pipeline (query → search → paper retrieval → reading and evaluation → filtered results) that minimizes manual intervention
  • Public datasets (AutoScholarQuery for training; RealScholarQuery for evaluation), open-source code, and pretrained models, enabling reproducible research and custom fine-tuning
  • Benchmarks reporting higher recall than standard search engines and naive LLM-based search on real-world academic queries
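
The recall and precision mentioned in the feature list can be computed per query as in the minimal sketch below. The function name and the sample paper identifiers are illustrative assumptions, not part of PaSa's evaluation code, and the project's own benchmark scripts may aggregate these metrics differently.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query from two collections of paper IDs."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)  # relevant papers that the agent actually returned
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the agent returns four papers, two of which are
# among the three papers judged relevant for the query.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

Here precision is 2/4 and recall is 2/3: citation expansion tends to raise recall by surfacing more of the relevant set, while the Selector's filtering is what keeps precision from dropping as the candidate pool grows.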

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Artificial Intelligence Software

Registered

1 day ago