php-simple-html-dom-parser free download

Tesseract OCR

Open Source OCR Engine

...Tesseract can recognize over 100 languages out of the box and can be trained to recognize other languages. It supports various output formats, including plain text, HTML, PDF, and more. It also has Unicode (UTF-8) support.

Downloads: 3,072 This Week

Last Update: 2025-05-25

Tesseract.js

A pure JavaScript Multilingual OCR

Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. The Tesseract.js library supports more than 100 languages, automatic text orientation and script detection, and a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser or on a server with Node.js. Tesseract.js is a JavaScript library that gets words in almost any spoken language out of images. The main Tesseract.js functions (e.g., recognize, detect) take an image parameter, which can be anything image-like. ...

Downloads: 14 This Week

Last Update: 2025-04-07

Papermerge

Open Source Document Management System for Digital Archives

Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of keeping piles of paper documents all over your desk, office or drawers, you can quickly scan them and configure your scanner to upload directly to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full-text, tag-based and metadata-based search. Papermerge is free and...

Downloads: 12 This Week

Last Update: 2025-07-24

DeepDetect

Deep Learning API and Server in C++14 with support for Caffe, PyTorch

...While the Open Source Deep Learning Server is the core element, with a REST API and multi-platform support that allows training and inference everywhere, the Deep Learning Platform allows higher-level management for training neural network models and using them as if they were simple code snippets. Ready for applications such as image tagging, object detection, segmentation, OCR, audio, video, text classification, CSV for tabular data, and time series. Neural network templates for the most effective architectures for GPU, CPU, and embedded devices. Training in a few hours and with small data thanks to 25+ pre-trained models. ...

Downloads: 0 This Week

Last Update: 2025-07-19

AnyTXT Searcher

A Powerful Desktop Full-Text Search Engine, Just Like a Local Google.

...With AnyTXT, you can find any text in any file on your disk in about 0.1 seconds. It works on Windows 11, 10, 8, 7, Vista, XP, 2008, 2012, 2016, 2022... AnyTXT Searcher supports the following file formats: plain text (txt, cpp, py, html, etc.); Microsoft OneNote (one); Microsoft Word (doc, docx); Microsoft Excel (xls, xlsx); Microsoft PowerPoint (ppt, pptx); PDF; WPS Office (wps, et, dps); eBook (epub, mobi, azw3, fb2, etc.); mind map formats (lighten, mmap, mm, xmind, etc.); OFD ...

13 Reviews

Downloads: 4,137 This Week

Last Update: 2025-06-19

MyBox

Easy Tools for PDF, Images, Files, Networks, Data, and Media

Tags: javafx-desktop-apps, pdf, image, ocr, icc, barcode, color-palette, text, bytes, markdown, html, archive, compress, digest, video, audio, editor, converter, media
Source: https://github.com/Mararsh/MyBox
Self-contained packages need neither a Java environment nor installation. Jar packages need Java 16 or higher.

Downloads: 13 This Week

Last Update: 2025-10-02

AvantFAX

Multiuser PHP/MySQL web interface to HylaFAX, for viewing faxes online, downloading and emailing them as PDFs, and categorizing and archiving all sent and received faxes.

10 Reviews

Downloads: 8 This Week

Last Update: 2025-04-10

DocWire SDK

Award-winning modern data processing SDK in C++20

DocWire SDK, a standout C++20, AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, enabling efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...

Downloads: 6 This Week

Last Update: 2025-11-01

Super-PDF-Editor-Lite

World's most comprehensive, powerful, process-based PDF editor

World's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. Features include creating PDFs from images, HTML, and text files; creating a processing log file; extracting, splitting, rotating, merging, duplicating, moving, printing, and compressing pages; enhancing images before OCR for better OCR results; PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially in the printing industry. ...

3 Reviews

Downloads: 5 This Week

Last Update: 2023-02-02

OpenKM Document Management - DMS

Document Management System and Content Management System

...Thanks to its elegant and intuitive interface, OpenKM transforms complex operations into easy tasks. The most relevant function of OpenKM is the indexing of the most common file types: text, Office, Office 2007, OpenOffice, PDF, HTML, XML, MP3, JPEG, etc. For a complete feature list, see http://goo.gl/au8cQy

33 Reviews

Downloads: 777 This Week

Last Update: 2022-11-25

LayoutParser

A Unified Toolkit for Deep Learning Based Document Image Analysis

With the help of state-of-the-art deep learning models, Layout Parser enables extracting complicated document structures with only a few lines of code. This approach is also more robust and generalizable, since no sophisticated rules are involved in the process. Complete instructions are provided for installing the main Layout Parser library and auxiliary components, and for loading deep learning (DL) layout models and using them for layout detection.

Downloads: 0 This Week

Last Update: 2022-08-04

e-Dokyumento

e-Dokyumento is a web-based Document Management System (DMS)

e-Dokyumento is an open source, web-based Document Management System (DMS) that automates the basic office document workflow (receiving, filing, routing, and approving) through capturing (scanning), digitizing (OCR reading), storing, tagging, and electronically routing and approving (e-signature) electronic documents. Demo: https://e-dokyumento.herokuapp.com/ https://edokyu.seillig.com/ (refer to Readme.md for the...

2 Reviews

Downloads: 11 This Week

Last Update: 2022-05-14

Ozyr

Ozyr is a simple, easy-to-use OCR snipping tool

Ozyr is a simple, easy-to-use OCR snipping tool that gets text from images so you can copy and edit it.
Source code: https://github.com/PETEROLO291/Ozyr
Installer: 117 MB
Program: 524 MB
Version: 1.0

1 Review

Downloads: 0 This Week

Last Update: 2022-04-13

gImageReader

A graphical frontend to tesseract-ocr

gImageReader is a simple Gtk/Qt front-end to tesseract. Features include:
- Import PDF documents and images from disk, scanning devices, clipboard and screenshots
- Process multiple images and documents in one go
- Manual or automatic recognition area definition
- Recognize to plain text or to hOCR documents
- Recognized text displayed directly next to the image
- Post-process the recognized text, including spellchecking
- Generate PDF documents from hOCR documents

Note: This page is only a mirror for the downloads. ...

27 Reviews

Downloads: 301 This Week

Last Update: 2022-01-28

Paperless-ng

A supercharged version of Paperless: scan, index and archive docs

Paperless is a simple Django application running in two parts: a consumer (the part that does the indexing) and a web server (the part that lets you search and download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of search, indexing is tedious, and it’s heavy and prone to damage and loss.

Downloads: 1 This Week

Last Update: 2022-03-04

Manga Rikai OCR

...At the moment, the engine can capture and translate a single text box, or detect all text boxes on a page or across as many pages as you want. Not only that, you can edit the text, save your progress, and even export your work as an HTML file. Got problems? Join our Discord: https://discord.com/invite/BuNuanw

1 Review

Downloads: 17 This Week

Last Update: 2021-02-23

cuneiformplus

Fork of the OCR software Cuneiform

Fork of the OCR software Cuneiform. Original software: https://launchpad.net/cuneiform-linux, by Cognitive Technologies and Jussi Pakkanen.
Other open source OCR software:
* Tesseract by Ray Smith (using the Leptonica image library)
* GOCR
* OCRAD

Downloads: 5 This Week

Last Update: 2020-12-08

FormRead

Free OMR/OCR web software based on JavaScript and PHP

FormRead (https://formread.org) is a completely free OMR (optical mark recognition) web application for scanning and grading user-filled, multiple-choice forms. Create your forms with any office or drawing tool, scan them, and easily parameterize their coordinates. Once you have parameterized your form, you can print as many copies as you need, give them to your students/respondents, scan and recognize them with FormRead, and finally export the data in your preferred formats...

Downloads: 10 This Week

Last Update: 2022-03-04

cbrTekStraktor

An application to automatically extract text from comic books.

...Its prime goal is to analyze the texts of comic books. However, cbrTekStraktor can also be used for scanlation or similar purposes. The application also lets you manually define text areas in CBR files, and includes a simple graphical editor for further processing of the extracted text. Text extraction is achieved by a combination of statistical and graphical processing operations. It is based on three major algorithms:
- Binarization of color images (Niblack and other methods)
- Connected components
- K-means clustering
Tesseract is used to perform optical character recognition on the extracted text. ...
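
The first two steps named in the cbrTekStraktor description can be illustrated with a short sketch. This is not cbrTekStraktor's code; it is a minimal pure-Python illustration, assuming a grayscale image given as a list of rows of 0-255 values. Niblack thresholding marks a pixel as ink when it is darker than T = mean + k * std of its local window, and the connected-components pass then groups ink pixels (8-connectivity) into candidate glyph regions with bounding boxes. The window size of 15 and k = -0.2 are commonly cited defaults assumed here, not values taken from the project.

```python
import math
from collections import deque


def niblack_binarize(gray, window=15, k=-0.2):
    """Return a 0/1 mask where 1 marks dark 'ink' pixels.

    Niblack's local threshold: T(x, y) = mean + k * std over a
    window x window neighbourhood, clipped at the image borders.
    A negative k suits dark text on a light background.
    """
    h, w = len(gray), len(gray[0])
    r = window // 2
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the (border-clipped) neighbourhood around (x, y).
            vals = [gray[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            threshold = mean + k * std
            mask[y][x] = 1 if gray[y][x] < threshold else 0
    return mask


def connected_components(mask):
    """Label 8-connected ink regions with a breadth-first search.

    Returns (labels, boxes): labels is a grid of component ids (0 means
    background) and boxes lists (x0, y0, x1, y1) per component, in
    row-major discovery order.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    boxes = []
    next_label = 1
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x]:
                continue
            labels[y][x] = next_label
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                # Visit all 8 neighbours that are unlabelled ink pixels.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
            next_label += 1
    return labels, boxes
```

A known weakness of plain Niblack, visible in this sketch, is that it flags noise in flat background areas and can miss the interior of thick strokes, which is one reason tools often combine it with "other methods" and a later clustering step such as K-means.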

Downloads: 1 This Week

Last Update: 2017-06-14

HRCloud2

A full-featured home hosted Cloud Drive, Personal Assistant, App Launc

HRCloud2 is a fully featured, home-hosted cloud drive and personal assistant that allows users to create and manage user accounts for friends and family, access files from anywhere, convert files and archives to other formats, perform simple image resizing and editing, stream media, create playlists, search for files, OCR images and PDFs, share files with friends and more! Building on observed industry standards for commercial cloud storage, HRCloud2 protects server permission levels, hashes sensitive data, enforces API security policies, and can even scan itself and its controlled directories with ClamAV for security. ...

Downloads: 0 This Week

Last Update: 2016-12-27

DoAllWithPDF_servicemenu

KDE servicemenu for PDF

Allows KDE users to do many things by right-clicking a PDF file.

Downloads: 0 This Week

Last Update: 2016-05-26

MyOCR

Start Your Own Captcha Solving Business Portal

A captcha-solving OCR reseller website for starting your own captcha-solving business portal.

Downloads: 0 This Week

Last Update: 2016-03-16

WebDjVuTextEd

Edit the OCR text layer of DjVu documents in a web browser

WebDjVuTextEd lets you edit the text layer of OCR'ed DjVu documents in a web browser. You can modify the structure (paragraphs, lines, words...), create, delete and edit text nodes, modify their container boxes with the mouse, and run a spellchecker. The program does not read DjVu files directly; it requires exported XML text data and images. When used without a web server, you can open and save local files, but cannot take advantage of auto-save and spell checking. Note that current SVN...

Downloads: 0 This Week

Last Update: 2015-11-21

phpSANE

Web-Based Frontend for SANE

phpSANE is a web-based frontend for SANE written in HTML/PHP, so you can scan with your web browser. It also supports OCR.

13 Reviews

Downloads: 2 This Week

Last Update: 2013-10-24

Tesseract-gui

Tesseract-GUI is not a front-end for tesseract-ocr. It is just a graphical way to use it, with simple image manipulation through ImageMagick.

2 Reviews

Downloads: 20 This Week

Last Update: 2014-06-29

Search Results for "php-simple-html-dom-parser"

Showing 32 open source projects for "php-simple-html-dom-parser"

Tesseract OCR

Tesseract.js

Papermerge

DeepDetect

AnyTXT Searcher

MyBox

AvantFAX

DocWire SDK

Super-PDF-Editor-Lite

OpenKM Document Management - DMS

LayoutParser

e-Dokyumento

Ozyr

gImageReader

Paperless-ng

Manga Rikai OCR

cuneiformplus

FormRead

cbrTekStraktor

HRCloud2

DoAllWithPDF_servicemenu

MyOCR

WebDjVuTextEd

phpSANE

Tesseract-gui