DrQA is an open-domain question answering system that reads large text corpora—famously Wikipedia—to answer natural language questions with extractive spans. It follows a two-stage pipeline: a fast document retriever first narrows down candidate articles, and a neural machine reader then predicts the exact answer span from those passages. The retriever relies on classic IR features (like TF-IDF and n-gram statistics) to remain lightweight and scalable to millions of documents. The reader is a neural model trained on supervised QA data to estimate start and end positions within a paragraph, and it can be adapted to new domains through fine-tuning or distant supervision. The repository includes scripts to build the Wikipedia index, train the reader, and evaluate end-to-end performance. DrQA popularized a practical recipe for combining IR and neural reading, and it remains a strong baseline for open-domain QA research and production prototypes.

Features

  • Scalable TF-IDF–based retriever over large corpora
  • Neural span extractor trained for precise start/end predictions
  • End-to-end pipeline from indexing to answering questions
  • Tools for distant supervision and domain adaptation
  • Reproducible training and evaluation scripts for standard datasets
  • Modular components enabling IR or reader swaps and custom corpora

Project Samples

Project Activity

See All Activity >

License

BSD License

Follow DrQA

DrQA Web Site

Other Useful Business Software
Try Google Cloud Risk-Free With $300 in Credit Icon
Try Google Cloud Risk-Free With $300 in Credit

No hidden charges. No surprise bills. Cancel anytime.

Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
Start Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of DrQA!

Additional Project Details

Operating Systems

Linux, Mac

Programming Language

Python

Related Categories

Python Artificial Intelligence Software

Registered

2025-10-07