Command-line toolset that extracts text from files (documents, images, archives) into a SQLite database, with OCR support.
Simple and extensible, implemented as a single shell script.
Features
- Multi-format text extraction from 30+ file types including documents, spreadsheets, presentations, and archives
- OCR (Optical Character Recognition) support for extracting text from images and scanned documents
- Recursive archive processing - automatically extracts and processes files nested within ZIP, TAR, GZIP, and other archive formats
- SQLite database integration - stores all extracted text in a searchable SQLite database for fast queries
- Command-line interface - easy integration into scripts and automated workflows
- Batch processing - process entire directories with a single command
- Line-level granularity - extracts text with line numbers for precise referencing
- Configurable OCR - supports multiple languages and quality settings
- LibreOffice integration - uses headless LibreOffice for reliable document conversion
- Lightweight - shell script implementation with minimal dependencies
- Cross-platform - runs on Linux, macOS, and Windows (via WSL/Cygwin)
- Transaction-safe database updates - uses SQL transactions for data integrity
- Progress tracking - detailed output for monitoring extraction progress
- Error handling - continues processing even if individual files fail
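
The line-level storage, transaction-safe updates, and per-file error handling listed above can be illustrated with a minimal sketch. It is written in Python using the standard `sqlite3` module and is not the tool's own shell implementation. The table name `lines`, its columns, the helper functions `store_lines` and `search`, and the sample file contents are all assumptions made for illustration; the schema the tool actually uses may differ.

```python
import sqlite3


def store_lines(conn, path, text):
    """Insert every line of `text` with its 1-based line number, atomically."""
    rows = [(path, n, line) for n, line in enumerate(text.splitlines(), start=1)]
    # The connection context manager commits on success and rolls back
    # if any statement raises, so a file is stored fully or not at all.
    with conn:
        conn.executemany(
            "INSERT INTO lines (path, line_no, content) VALUES (?, ?, ?)", rows
        )
    return len(rows)


def search(conn, term):
    """Return (path, line_no, content) for each line containing `term`."""
    cur = conn.execute(
        "SELECT path, line_no, content FROM lines "
        "WHERE content LIKE ? ORDER BY path, line_no",
        (f"%{term}%",),
    )
    return cur.fetchall()


# Hypothetical schema: one row per extracted line.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lines ("
    "path TEXT NOT NULL, line_no INTEGER NOT NULL, content TEXT NOT NULL, "
    "PRIMARY KEY (path, line_no))"
)

# Stand-in for text that an extractor has already produced.
extracted = {
    "report.txt": "Quarterly summary\nRevenue grew 4%",
    "notes.txt": "Call vendor\nRevenue forecast due Friday",
}

failed = []
for path, text in extracted.items():
    try:
        store_lines(conn, path, text)
    except sqlite3.Error as exc:
        # Record the failure and keep going with the remaining files.
        failed.append((path, str(exc)))

hits = search(conn, "Revenue")
for path, line_no, content in hits:
    print(f"{path}:{line_no}: {content}")
```

Storing one row per line keeps the query simple: a match can be reported as a file path plus a line number, and a failed insert for one file leaves the rows of every other file untouched.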