UniversalTextExtractor is a command-line toolset for extracting text from files (documents, images, archives) into SQLite, with OCR support.
Simple and extensible, implemented as a single shell script.
Features
- Multi-format text extraction from 30+ file types including documents, spreadsheets, presentations, and archives
- OCR (Optical Character Recognition) support for extracting text from images and scanned documents
- Recursive archive processing - automatically extracts and processes files nested within ZIP, TAR, GZIP, and other archive formats
- SQLite database integration - stores all extracted text in a searchable SQLite database for fast queries
- Command-line interface - easy integration into scripts and automated workflows
- Batch processing - process entire directories with a single command
- Line-level granularity - extracts text with line numbers for precise referencing
- Configurable OCR - supports multiple languages and quality settings
- LibreOffice integration - uses headless LibreOffice for reliable document conversion
- Lightweight - shell script implementation with minimal dependencies
- Cross-platform - runs on Linux, macOS, and Windows (via WSL/Cygwin)
- Transaction-safe database updates - uses SQL transactions for data integrity
- Progress tracking - detailed output for monitoring extraction progress
- Error handling - continues processing even if individual files fail
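Several of the features above (line-level granularity, transaction-safe SQLite updates, and continuing after a failed file) can be illustrated with a short sketch. It is written in Python rather than in the tool's own shell script, and it is not the project's actual implementation: the table name `extracted_lines`, its columns, and the helper functions `open_db`, `store_lines`, `index_files`, and `search` are all hypothetical names chosen for illustration. The `extract` argument stands in for whatever converter or OCR step produces the text.

```python
import sqlite3


def open_db(db_path=":memory:"):
    """Open the database and create the (hypothetical) line table if missing."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_lines ("
        " path TEXT NOT NULL,"
        " line_no INTEGER NOT NULL,"
        " content TEXT NOT NULL,"
        " PRIMARY KEY (path, line_no))"
    )
    return conn


def store_lines(conn, path, text):
    """Store each line of `text` with its 1-based line number in one transaction."""
    rows = [(path, n, line) for n, line in enumerate(text.splitlines(), start=1)]
    with conn:  # commits on success, rolls back if an insert raises
        conn.executemany(
            "INSERT INTO extracted_lines (path, line_no, content) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


def index_files(conn, files, extract):
    """Process every file; collect failures instead of stopping at the first one."""
    failed = []
    for path in files:
        try:
            store_lines(conn, path, extract(path))
        except Exception as exc:  # one bad file must not abort the batch
            failed.append((path, str(exc)))
    return failed


def search(conn, term):
    """Return (path, line_no, content) for every stored line containing `term`."""
    cur = conn.execute(
        "SELECT path, line_no, content FROM extracted_lines"
        " WHERE content LIKE ? ORDER BY path, line_no",
        (f"%{term}%",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Stand-in extractor: real text would come from a converter or OCR step.
    samples = {"a.txt": "hello world\nsecond line", "b.txt": "another hello"}

    def fake_extract(path):
        if path not in samples:
            raise ValueError("unsupported")
        return samples[path]

    conn = open_db()
    print(index_files(conn, ["a.txt", "broken.pdf", "b.txt"], fake_extract))
    print(search(conn, "hello"))
```

Keying rows on `(path, line_no)` is one way to make a search hit point back to an exact line in the original file. Wrapping each file's inserts in a single transaction means a failure partway through leaves no partial rows for that file.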