php-simple-html-dom-parser free download

Tesseract OCR

Open Source OCR Engine

...Tesseract can recognize over 100 languages out-of-the-box, and can be trained to recognize other languages. It supports various output formats, including plain text, HTML, PDF and more. It also has Unicode (UTF-8) support.

Downloads: 3,105 This Week

Last Update: 2025-05-25

Tesseract.js

A pure JavaScript Multilingual OCR

Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. The library supports more than 100 languages, automatic text orientation and script detection, and a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser or on a server with Node.js. It gets words in almost any spoken language out of images. The main Tesseract.js functions (e.g. recognize, detect) take an image parameter, which should be something image-like. ...

Downloads: 14 This Week

Last Update: 2025-04-07

Papermerge

Open Source Document Management System for Digital Archives

Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and...

Downloads: 14 This Week

Last Update: 2025-07-24

DeepDetect

Deep Learning API and Server in C++14 with support for Caffe, PyTorch

...While the Open Source Deep Learning Server is the core element, with REST API, and multi-platform support that allows training & inference everywhere, the Deep Learning Platform allows higher level management for training neural network models and using them as if they were simple code snippets. Ready for applications of image tagging, object detection, segmentation, OCR, Audio, Video, Text classification, CSV for tabular data and time series. Neural network templates for the most effective architectures for GPU, CPU, and Embedded devices. Training in a few hours and with small data thanks to 25+ pre-trained models. ...

Downloads: 0 This Week

Last Update: 2025-07-19

AnyTXT Searcher

A Powerful Desktop Full-Text Search Engine, Just Like Local Google.

...You can quickly find any text in any file on your disk with AnyTXT in about 0.1 second. It works on Windows 11, 10, 8, 7, Vista, XP, 2008, 2012, 2016, 2022... AnyTXT Searcher supports the following file formats: Plain text (txt, cpp, py, html, etc.); Microsoft OneNote (one); Microsoft Word (doc, docx); Microsoft Excel (xls, xlsx); Microsoft PowerPoint (ppt, pptx); PDF; WPS Office (wps, et, dps); EBook (epub, mobi, azw3, fb2 etc.); Mind Map Format (lighten, mmap, mm, xmind etc.); OFD .....

13 Reviews

Downloads: 4,170 This Week

Last Update: 2025-06-19

MyBox

Easy Tools for PDF, Image, File, Network, Data, and Media

javafx-desktop-apps pdf image ocr icc barcode color-palette text bytes markdown html archive compress digest video audio editor converter media https://github.com/Mararsh/MyBox Self-contained packages need neither a Java environment nor installation. Jar packages need Java 16 or higher.

Downloads: 12 This Week

Last Update: 2025-10-02

DocWire SDK

Award-winning modern data processing SDK in C++20

DocWire SDK, a standout C++20, AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...

Downloads: 8 This Week

Last Update: 2025-11-01

Super-PDF-Editor-Lite

World's most comprehensive, powerful, process-based PDF editor

World's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. Includes features like creating PDFs from images, HTML, and text files, and creating a processing log file. Extract Page, Split Page, Rotate Page, Merge Page, Duplicate Page, Move Page, Printing, and Compress Page. Enhance images before OCR for better OCR performance. PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially for the printing industry. ...

3 Reviews

Downloads: 3 This Week

Last Update: 2023-02-02

OpenKM Document Management - DMS

Document Management System and Content Management System

...Thanks to its elegant and intuitive interface, OpenKM transforms complex operations into easy tasks. The most relevant function of OpenKM is the indexing of the most common types of files: text, Office, Office 2007, OpenOffice, PDF, HTML, XML, MP3, JPEG, etc. For a complete feature list, take a look at http://goo.gl/au8cQy

33 Reviews

Downloads: 778 This Week

Last Update: 2022-11-25

LayoutParser

A Unified Toolkit for Deep Learning Based Document Image Analysis

With the help of state-of-the-art deep learning models, Layout Parser enables extracting complicated document structures using only a few lines of code. This method is also more robust and generalizable, as no sophisticated rules are involved in the process. Complete instructions for installing the main Layout Parser library and auxiliary components. Learn how to load DL Layout models and use them for layout detection.

Downloads: 0 This Week

Last Update: 2022-08-04

gImageReader

A graphical frontend to tesseract-ocr

gImageReader is a simple Gtk/Qt front-end to tesseract. Features include:
- Import PDF documents and images from disk, scanning devices, clipboard and screenshots
- Process multiple images and documents in one go
- Manual or automatic recognition area definition
- Recognize to plain text or to hOCR documents
- Recognized text displayed directly next to the image
- Post-process the recognized text, including spellchecking
- Generate PDF documents from hOCR documents

**Note**: This page is only a mirror for the downloads. ...

27 Reviews

Downloads: 313 This Week

Last Update: 2022-01-28

Paperless-ng

A supercharged version of paperless, scan, index and archive docs

Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss.

Downloads: 0 This Week

Last Update: 2022-03-04

Manga Rikai OCR

...At the moment, the engine can capture and translate a single text box, or detect all text boxes in a page or in as many pages as you want. Not only that, you can edit the text, save your progress, and even export your work as an HTML file. Got problems? Join our Discord: https://discord.com/invite/BuNuanw

1 Review

Downloads: 15 This Week

Last Update: 2021-02-23

cuneiformplus

Fork of OCR software cuneiform

Fork of OCR software cuneiform. For the original software, see: https://launchpad.net/cuneiform-linux by Cognitive Technologies and Jussi Pakkanen. For other open source OCR software, see:
* Tesseract by Ray Smith (using the Leptonica image library)
* GOCR
* OCRAD

Downloads: 4 This Week

Last Update: 2020-12-08

FormRead

Free OMR - OCR web software based on JavaScript and PHP

https://formread.org FormRead is a completely free OMR (optical mark recognition) web software for scanning and grading user-filled, multiple choice forms. Create your formats with any of your office or drawing tools, scan them and parameterize their coordinates in an easy way. Once you have parameterized your form, you can print many copies, give them to your students/respondents, scan and recognize them with FormRead, and finally export the data in your preferred formats...

Downloads: 11 This Week

Last Update: 2022-03-04

cbrTekStraktor

An application to automatically extract text from comic books.

...Its prime goal is to perform analysis on the texts of comic books. cbrTekStraktor can however also be used for scanlation or similar purposes. The application also lets you manually define text areas in CBR files. It includes a simple graphical editor for further processing the extracted text. The text extraction is achieved by a combination of statistical and graphical processing operations. It is based on the following 3 major algorithms:
- Binarization of color images (Niblack and other methods)
- Connected components
- K-Means clustering

Apache Tesseract is used to perform Optical Character Recognition on the extracted text. ...

Downloads: 2 This Week

Last Update: 2017-06-14

DoAllWithPDF_servicemenu

KDE servicemenu for PDF

Lets KDE users do many things with a right-click on a PDF file.

Downloads: 0 This Week

Last Update: 2016-05-26

MyOCR

Start Your Own Captcha Solving Business Portal

Captcha Solutions OCR Captcha Solver Reseller Website to Start Your Own Captcha Solving Business Portal

Downloads: 0 This Week

Last Update: 2016-03-16

WebDjVuTextEd

Edit the OCR text layer of DjVu documents in a web browser

WebDjVuTextEd lets you edit the text layer of OCR'ed DjVu documents in a web browser. You can modify the structure (paragraphs, lines, words...), create, delete, and edit text nodes, modify their container box with the mouse, and run a spellchecker. The program does not read DjVu files directly; it requires exported XML text data and images. When used without a web server, you can open and save local files, but cannot take advantage of auto-save and spell checking. Note that current SVN...

Downloads: 0 This Week

Last Update: 2015-11-21

edocias

Electronic Document Index And Search

EDocIAS (Electronic Document Index And Search) is a PHP-based tool for indexing and searching files of various types. Third-party tools (tesseract, xpdf, etc.) can be configured to support any type of file.

Downloads: 0 This Week

Last Update: 2015-07-10

ocr2data

Full OCR stack for document digitization, analysis, and OCR that provides external connections via API, standard document exchange formats, and a database.

Downloads: 0 This Week

Last Update: 2015-08-06

eBookFormatter

Got any emails with obnoxious inline text? Long text stories with bad formatting? Files that an OCR didn't quite translate right? RTF format files and no easy way to read or modify them? Then eBookFormatter is for you!

Downloads: 0 This Week

Last Update: 2013-03-12

Search Results for "php-simple-html-dom-parser"

Showing 22 open source projects for "php-simple-html-dom-parser"
