DrQA is an open-domain question answering system that reads large text corpora—famously Wikipedia—to answer natural language questions with extractive spans. It follows a two-stage pipeline: a fast document retriever first narrows down candidate articles, and a neural machine reader then predicts the exact answer span from those passages. The retriever relies on classic IR features (like TF-IDF and n-gram statistics) to remain lightweight and scalable to millions of documents. The reader is a neural model trained on supervised QA data to estimate start and end positions within a paragraph, and it can be adapted to new domains through fine-tuning or distant supervision. The repository includes scripts to build the Wikipedia index, train the reader, and evaluate end-to-end performance. DrQA popularized a practical recipe for combining IR and neural reading, and it remains a strong baseline for open-domain QA research and production prototypes.

Features

  • Scalable TF-IDF–based retriever over large corpora
  • Neural span extractor trained for precise start/end predictions
  • End-to-end pipeline from indexing to answering questions
  • Tools for distant supervision and domain adaptation
  • Reproducible training and evaluation scripts for standard datasets
  • Modular components enabling IR or reader swaps and custom corpora

Project Samples

Project Activity

See All Activity >

License

BSD License

Follow DrQA

DrQA Web Site

Other Useful Business Software
AI-powered service management for IT and enterprise teams Icon
AI-powered service management for IT and enterprise teams

Enterprise-grade ITSM, for every business

Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
Try it Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of DrQA!

Additional Project Details

Operating Systems

Linux, Mac

Programming Language

Python

Related Categories

Python Artificial Intelligence Software

Registered

2025-10-07