Showing 46 open source projects for "java ocr extraction text"

View related business solutions
  • Compliant and Reliable File Transfers Backed by Top Security Certifications Icon
    Compliant and Reliable File Transfers Backed by Top Security Certifications

    Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

    Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.
    Start Free Trial
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 1
    Eye is an experimental OCR (image-to-text) application.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    FALCON - Text Search Java Project

    FALCON - Text Search Java Project

    JSON based text search Java Project

    ----------------- - What is it? - ----------------- The "Falcon Search" is a JAVA API and tool to search inside the documents. It was originally started to search the content in pdf files under the project "HAWK Search". Searching with this tool is query-based not word-based as in most of the document search tools OR document readers. It also takes care of jumbling of words within query and spelling mistakes. Commonly used techniques in this project are Natural Language...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3

    TML - Text Mining Library for LSA & CMM

    TML is a Java Library for LSA and extracting Concept Maps from text

    TML has moved to http://www.villalon.cl/tml.html and the code to https://github.com/villalon/tml
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4

    TextProcessor

    A Java package to preprocess text datasets for posterior text analysis

    The TextProcessor Java package is a text processing toolkit, which provides some frequently used text processing functions such as stemming, removing stop-words, generating a term vocabulary, and calculating the term-doc frequency matrix. Basic topic mining models such as LDA and sparse NMF are also supported. The package can also generate feature files from a given text dataset with LDA and LIBSVM format for posterior procedures such as classification or clustering. The toolkit is also...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 5
    The Information Extraction Plugin allows the use of information extraction techniques within RapidMiner. It can be seen as an interface between natural language and IE- or datamining-methods, by extracting interesting information out of documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    G-Asks is a question generation system, developed by LATTE(Learning and Affect Technologies Engineering) research group at The University of Sydney. It uses Natural Language Processing techniques and Machine learning algorithms to generate specific trigger questions. If you use this software in a publication, please cite the paper 2. 1.Ming Liu and Rafael A. Calvo (2012) “Using Information Extraction to Generate Trigger Question for Academic Writing Support”, 11th International Conference...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    DBpedia Spotlight
    DBpedia Spotlight is a tool for annotating mentions of DBpedia resources in natural language text. The source code is now hosted on GitHub: https://github.com/dbpedia-spotlight
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    This is a Java-based project for complex event extraction from text and co-reference resolution. Currently the code can read BioNLP shared task format (http://2011.bionlp-st.org/) and i2b2 Natural Language Processing for Clinical Data shared task format (https://www.i2b2.org/NLP/DataSets/Main.php). Event extraction includes finding events and the parameters for an event in a text.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    This project aims to implement in java the following text mining techniques: Text Language Detection, Keywords and keyphrases extraction, Text Classification, Text Clustering, Single or multiple documents Summarization, Plagiarism Detection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 10
    TextMarker
    TextMarker is now developed and hosted at Apache UIMA (http://uima.apache.org/textmarker.html). TextMarker is a UIMA-based tool for information extraction and more. The full featured editor of the rule language and the build process of UIMA descriptors are complemented with components for visualization, explanation, testing and rule learning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    SEMANTIXS is a semantic information extraction system that can extract, represent and visualize domain-specific information from free-text in the form of complex (and simple) relationships. Refer - http://www.cs.iastate.edu/~semantix/ for more info.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    An information extraction library implementing modern algorithms for the extraction of named entities from text.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Moara is a biological text mining tool and consists of a Java library and some auxiliary MySQL databases for gene/protein training and extraction of mentions and its further normalization and disambiguation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    TCR Neuroph -Text Character Recognition
    TCR Neuroph - Text Character Recognition is java tool developed to recognize scanned text , using Java Neural Network Framework - Neuroph
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    OpenDMAP (Open Source Direct Memory Access Parser) is a natural language processing (text mining) application: a semantic parser for information extraction.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    JOcrad is a graphical frontend for GNU/Ocrad written in Java. GNU Ocrad is an OCR (Optical Character Recognition) program based on a feature extraction method.JOcrad supports italian and english languages, JPG,PNG and GIF images.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    T-Rex (Trainable Relation Extraction) is a highly configurable machine learning-based Information Extraction from Text framework, which includes tools for document classification, entity extraction and relation extraction.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    MutationFinder is a biomedical natural language processing (NLP) system for extracting mentions of point mutations from free text. MutationFinder achieves high performance (99% precision, 81% recall on blind test data) as an information extraction system
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    This project aims to distribute a facial animation system with speech, developed to brazilian portuguese case. This system is composed by many modules: movement extraction, facial animation and speech, through a text-to-speech system.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    translategemma-4b-it

    translategemma-4b-it

    Lightweight multimodal translation model for 55 languages

    translategemma-4b-it is a lightweight, state-of-the-art open translation model from Google, built on the Gemma 3 family and optimized for high-quality multilingual translation across 55 languages. It supports both text-to-text translation and image-to-text extraction with translation, enabling workflows such as OCR-style translation of signs, documents, and screenshots. With a compact ~5B parameter footprint and BF16 support, the model is designed to run efficiently on laptops, desktops, and private cloud infrastructure, making advanced translation accessible without heavy hardware requirements. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    GraphSpider is a pattern matcher which searches parsed text in phrase-structure tree or dependency graph format for syntactic structures matching a set of patterns in MPL, a regexp-like pattern language. Applications: information extraction, text mining.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB