Search Results for "java ocr extraction text" - Page 2

Showing 73 open source projects for "java ocr extraction text"

  • 1

    pdi-jira

    JIRA plugin for Pentaho Data Integration

    With this PDI plugin you can connect to any JIRA service, including over SSL, and extract JSON data from the results. JQL is used to query the remote JIRA service.
    Downloads: 0 This Week
  • 2

    cbrTekStraktor

    An application to automatically extract text from comic books.

    cbrTekStraktor is an application to automatically extract text from the text bubbles or speech balloons in comic book reader (CBR) files. Its prime goal is to analyze the texts of comic books, but cbrTekStraktor can also be used for scanlation or similar purposes. The application also lets you manually define text areas in CBR files, and it includes a simple graphical editor for further processing of the extracted text. The text extraction is...
    Downloads: 2 This Week
  • 3

    Musaheb

    An Arabic collocation extraction tool

    “Musaheb” is an Arabic collocation extraction tool designed and implemented to overcome the limitations of existing collocation extraction tools. “Musaheb” can extract n-gram collocations up to 5-grams, and it can also extract the collocates of the nodes (the word types whose collocates are being sought) within a window of zero to 15 words. Moreover, it provides eight collocation statistics to calculate the strength of a collocation, and permits the input of...
    Downloads: 0 This Week
  • 4

    Adele

    Adhoc Data Exploration - Live & Easy

    Adele was developed to simplify everyday work with data. Use it as a Swiss Army knife to fill the gap between spreadsheet applications like MS Excel and enterprise servers like SAP ERP. It is not meant to replace specialized tools such as RapidMiner or KNIME; rather, Adele is designed for business people who analyze their data in spreadsheet applications. It makes many technical concepts easier to use, for example realtime OLAP, transformations,...
    Downloads: 0 This Week
  • 5
    Ansj Chinese word segmentation

    Ansj word segmentation

    A genuine Java implementation of ICT, with faster word segmentation than the open-source version of ICT. Features include Chinese word segmentation, name recognition, part-of-speech tagging, and user-defined dictionaries. The segmentation is based on n-Gram+CRF+HMM, reaches about 2 million words per second (tested on a MacBook Air), and can exceed 96% accuracy. At present, it has realized the functions of Chinese word...
    Downloads: 0 This Week
  • 6
    PDF Clown

    General-Purpose PDF Library for Java and .NET

    PDF Clown is a general-purpose Java and .NET library for manipulating PDF files through multiple abstraction layers, rigorously adhering to the PDF 1.7 specification (ISO 32000-1). The project aims to provide universal access to PDF files (creation, reading, editing, rendering...) through an accurate and elegant object-oriented API. * Features: http://pdfclown.org/overview/features/ * Overview: http://pdfclown.org/overview/architecture/ * Website: http://pdfclown.org/ * Blog:...
    Downloads: 0 This Week
  • 7

    Personalized Search Engine

    Personalized Search Engine for Your Files

    MySearchEngine (Personalized Search Engine) is Java software for searching files and folders in an OS file system. It differs from general OS file search engines in that it personalizes the indexing setup: users can choose which directories to index or remove from an existing index, and it can also suggest queries, much like Google's "Did you mean" feature. Customized indexing and query suggestion greatly improve search speed and make the user experience more comfortable. eLibrary...
    Downloads: 0 This Week
  • 8

    eLibrary

    Personalized Search Engine for Commonly Used Files

    eLibrary (electric library) is Java software for searching files and folders in an OS file system. It differs from general OS file search engines in that it personalizes the indexing setup: users can choose which directories to index or remove from an existing index, and it can also suggest queries, much like Google's "Did you mean" feature. Customized indexing and query suggestion greatly improve search speed and make the user experience more comfortable. eLibrary can also extract...
    Downloads: 0 This Week
  • 9
    OCR For Visually Challenged Person

    Provides a GUI for Tesseract OCR

    It converts scanned images into text, Braille, and audio. For better accuracy, images should be scanned at 300 dpi or higher.
    Downloads: 5 This Week
  • 10
    DJVU++

    A complete DjVu solution with OCR technology (Arabic, English).

    DjVu++ is a user-friendly program for manipulating DjVu files, such as eBooks, with plenty of editing features. It offers a free replacement for the proprietary PDF format, with similar resolution and smaller file size. DjVu++ also supports OCR to handle text in scanned books and images. The program performs well for English, and also for Arabic, where it aims to lead both free and commercial software in this area. The main features of the DjVu++ program are: o...
    Downloads: 6 This Week
  • 11
    ePUBator

    Minimal offline PDF to ePUB converter for Android

    Minimal offline PDF to ePUB converter for Android - ©2011 Ezio Querini. ePUBator extracts text from a PDF file and puts it into a well-formed (epubcheck-compliant) ePUB file. PDF extraction is based on the iText library <http://itextpdf.com/>, released under the AGPL license. - ePUBator IS DESIGNED FOR BOOKS (NOT FOR EVERY TYPE OF PDF); IF YOU NEED A BETTER RESULT, TRY SOMETHING ELSE LIKE CALIBRE. - ePUBator doesn't need an internet connection (doesn't send your docs somewhere on the net,...
    Downloads: 23 This Week
  • 12
    OpenSearchServer Extractor

    A RESTful/JSON web service for text and metadata extraction

    An open source RESTful web service for text and metadata extraction and analysis. oss-text-extractor supports various binary formats: word processor (doc, docx, odt, rtf); spreadsheet (xls, xlsx, ods); presentation (ppt, pptx, odp); publishing (pdf, pub); web (rss, html/xhtml); media (audio, images); others (vsd, text).
    Downloads: 0 This Week
  • 13
    Vision2u

    Free image processing software

    Vision2u is free image processing software for personal use and research. Basic image processing tasks can be carried out with simple operation of the software. Any webcam owner can perform simple measuring, counting, or monitoring tasks without large capital outlays.
    Downloads: 0 This Week
  • 14
    Eye is an experimental OCR (image-to-text) application.
    Downloads: 1 This Week
  • 15
    FALCON - Text Search Java Project

    JSON-based text search Java project

    Falcon Search is a Java API and tool for searching inside documents. It was originally started to search the content of PDF files under the "HAWK Search" project. Searching with this tool is query-based rather than word-based, as in most document search tools or document readers. It also handles jumbled word order within a query and spelling mistakes. Commonly used techniques in this project are Natural Language...
    Downloads: 0 This Week
  • 16

    Detexter

    Detexter is an app designed to extract text from PDF files.

    Detexter lets you extract text from multiple PDF files. Detexter uses the PDFBox library for its text extraction.
    Downloads: 0 This Week
  • 17

    TML - Text Mining Library for LSA & CMM

    TML is a Java Library for LSA and extracting Concept Maps from text

    TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml
    Downloads: 0 This Week
  • 18

    TextProcessor

    A Java package to preprocess text datasets for subsequent text analysis

    The TextProcessor Java package is a text processing toolkit that provides frequently used text processing functions such as stemming, removing stop words, generating a term vocabulary, and calculating the term-document frequency matrix. Basic topic mining models such as LDA and sparse NMF are also supported. The package can also generate feature files from a given text dataset in LDA and LIBSVM formats for subsequent procedures such as classification or clustering. The toolkit is also...
    Downloads: 0 This Week
  • 19
    LynxSight Mobile

    An OCR assistant for visually impaired people

    LynxSight Mobile is an Android application that serves as an OCR assistant. The application scans pictures taken with the camera for text and reads the text aloud to the user. LynxSight Mobile is designed for visually impaired people; it includes a voice assistant, voice commands, and a simple UI for easier use.
    Downloads: 0 This Week
  • 20
    Anteater

    Annotation Tool to Extract Endangered Animals from Text Resources

    The goal of this project is the extraction of the information listed below from texts downloaded from the Federal Register (https://www.federalregister.gov). The texts are mainly permit applications, notices about granted permits, etc. This software tool is developed by the Max Planck Institute for the History of Science (http://www.mpiwg-berlin.mpg.de) in collaboration with Dirk Wintergrün and Etienne Benson.
    Downloads: 0 This Week
  • 21
    The Information Extraction Plugin allows the use of information extraction techniques within RapidMiner. It can be seen as an interface between natural language and IE or data-mining methods, extracting interesting information from documents.
    Downloads: 0 This Week
  • 22
    G-Asks is a question generation system developed by the LATTE (Learning and Affect Technologies Engineering) research group at the University of Sydney. It uses natural language processing techniques and machine learning algorithms to generate specific trigger questions. If you use this software in a publication, please cite paper 2. 1. Ming Liu and Rafael A. Calvo (2012), “Using Information Extraction to Generate Trigger Question for Academic Writing Support”, 11th International Conference...
    Downloads: 0 This Week
  • 23

    Large Text File converter

    Java-based heavy-duty utility to process large delimited text files

    TextZilla is a multithreaded Java utility that can process very large delimited text files: it extracts, converts, encodes, decodes, and encrypts/decrypts text data from the source and writes it to the desired output file or files. It provides a fully extensible framework for building new Java classes. For example, it currently supports MD5 conversion, and classes for 3DES, AES, or any other algorithm can be created on the same design.
    Downloads: 0 This Week
  • 24
    SeerSuite
    SeerSuite is an application toolkit for digital libraries and search engines; i.e., CiteSeerX. CiteSeerX has moved to GitHub, please get the latest code from: https://github.com/SeerLabs/CiteSeerX
    Downloads: 0 This Week
  • 25

    AADRTE

    Automatic Arabic Domain-Relevant Term Extraction

    In this research we propose a model for automatic domain-relevant term extraction from an Arabic text corpus. The proposed model uses a hybrid approach that combines linguistic and statistical methods to extract terms relevant to specific domains, ranking terms by prevalence and tendency. This increases precision and recall, used as measures of how relevant the extracted terms are to a specific domain.
    Downloads: 0 This Week
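The TextZilla entry mentions MD5 conversion of fields in delimited text files. Its actual classes are not shown in the listing, so what follows is a minimal single-threaded sketch, not TextZilla's API. It assumes a hypothetical `Md5FieldConverter` that replaces one column of a delimited line with the lowercase hex MD5 digest of that column's UTF-8 bytes, using the JDK's `java.security.MessageDigest`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

// Hypothetical converter for illustration; not TextZilla's actual class.
class Md5FieldConverter {
    private final String delimiter; // literal field delimiter, e.g. "|" or ","
    private final int column;       // zero-based index of the field to hash

    Md5FieldConverter(String delimiter, int column) {
        this.delimiter = delimiter;
        this.column = column;
    }

    // Returns the lowercase hex MD5 digest of the UTF-8 bytes of s.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // byte printed as unsigned hex
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 ships with standard JDK providers, so this should not happen there.
            throw new IllegalStateException(e);
        }
    }

    // Replaces the configured column of one delimited line with its MD5 hash.
    // Lines with fewer fields than the column index are returned unchanged.
    String convertLine(String line) {
        // Pattern.quote treats the delimiter literally (e.g. "|" is not a regex OR);
        // the -1 limit keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        if (column < fields.length) {
            fields[column] = md5Hex(fields[column]);
        }
        return String.join(delimiter, fields);
    }
}
```

For example, `new Md5FieldConverter("|", 1).convertLine("id1|abc|x")` hashes only the second field. A real heavy-duty tool would add streaming file I/O and worker threads around `convertLine`. Those parts are omitted here.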
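The Musaheb entry describes extracting the collocates of a node word within a window of zero to 15 words. The following is a minimal sketch of that window idea. It assumes a hypothetical `WindowCollocates.count` helper that counts raw co-occurrence frequencies in a symmetric window around each occurrence of the node. It does not reproduce Musaheb's own code or any of its eight association statistics.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not Musaheb's actual implementation.
class WindowCollocates {
    // Counts every token within `window` positions to the left or right of each
    // occurrence of `node`. The node occurrence itself is skipped. The window is
    // limited to 0..15 to mirror the range described for Musaheb.
    static Map<String, Integer> count(List<String> tokens, String node, int window) {
        if (window < 0 || window > 15) {
            throw new IllegalArgumentException("window must be between 0 and 15");
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(node)) {
                continue;
            }
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.size() - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) {
                    counts.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

The raw counts this returns are the usual input to association measures such as mutual information or t-score, which would then rank the collocates by strength.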