Showing 31 open source projects for "text processing"

  • 1
    Bowtie, an ultrafast, memory-efficient short read aligner for short DNA sequences (reads) from next-gen sequencers. Please cite: Langmead B, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.
    Downloads: 382 This Week
  • 2
    GATE
    NOTE THAT THE SOURCE CODE AND ISSUE TRACKER HAVE NOW MOVED TO GITHUB. FIND US AT https://github.com/GateNLP/ GATE (General Architecture for Text Engineering) is an architecture, framework and development environment for developing, evaluating and embedding Human Language Technology. See http://gate.ac.uk for full details.
    Downloads: 1 This Week
  • 3
    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP...
    Downloads: 16 This Week
  • 4

    Ghawwas_V4

    An open source system for Arabic corpora processing

    Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions: a. Frequency list for single word and N-Grams b. Concordance c. Collocation (MI, CHI Squared, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient) d. Lexical patterns search e. Two corpora frequency profile comparison based on MI, CHI, LL, T-Score, Z Score, Dice, Log Dice, Weirdness Coefficient f. Accept Windows and UTF-8 character...
    Downloads: 4 This Week
  • 5

    BioC

    We describe a simple XML format to share text documents and annotation

    A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: simple code to hold/read/write data and perform sample processing; BioC-formatted corpora; and BioC tools that work with BioC corpora. BioC goals: simplicity, interoperability, broad use, and reuse. There should be little investment required to learn to use a format or a software module to process that format. ...
    Downloads: 16 This Week
  • 6
    Welsh Natural Language Toolkit

    WNLT is a suite of open source natural language modules for the Welsh

    The project supports the Welsh Language Technology domain with a set of NLP tools that drive innovation and advance the development of sophisticated textual analysis solutions. The WNLT project delivers four core NLP modules: a) Word Segmentation for separating text into words; b) Sentence Boundary Disambiguation for finding sentence boundaries; c) Part of Speech Tagger for determining the part of speech of each word; d) Morphological Analyser for identifying the root form (lemma) of words....
    Downloads: 0 This Week
  • 7
    The Java Data Mining Package (JDMP) is a library that provides methods for analyzing data with the help of machine learning algorithms (e.g. clustering, classification, graphical models, neural networks, Bayesian networks, text processing, optimization).
    Downloads: 0 This Week
  • 8

    Khawas

    An Arabic Corpora Processing Tool

    The new version is available at https://sourceforge.net/projects/ghawwasv4/
    Downloads: 1 This Week
  • 9
    The BioNLP UIMA Component Repository provides UIMA wrappers for novel and well-known 3rd-party NLP tools used in biomedical text processing, such as tokenizers, parsers, named entity taggers, and tools for evaluation.
    Downloads: 0 This Week
  • 10
    FALCON - Text Search Java Project

    JSON based text search Java Project

    "Falcon Search" is a Java API and tool for searching inside documents. It was originally started to search the content of PDF files under the project "HAWK Search". Searching with this tool is query-based, not word-based as in most document search tools or document readers. It also handles jumbled word order within a query and spelling mistakes. Commonly used techniques in this project are Natural Language...
    Downloads: 0 This Week
  • 11
    HAWK - PDF Text Search Java Project

    No more support for this project - TAKE A LOOK AT FALCONSEARCH "https://sourceforge.net/projects/falcontextsearch/"
    Downloads: 0 This Week
  • 12
    ASTL Automata Standard Template Library (Vincent Le Maout - Dominique Revuz) is a set of generic and efficient C++ components for automata manipulation.
    Downloads: 0 This Week
  • 13
    Apolda is a plugin for the GATE framework (see http://sourceforge.net/projects/gate/) that annotates texts with labels of concepts from an arbitrary OWL ontology.
    Downloads: 0 This Week
  • 14
    TextMarker
    TextMarker is now developed and hosted at Apache UIMA (http://uima.apache.org/textmarker.html). TextMarker is a UIMA-based tool for information extraction and more. The full featured editor of the rule language and the build process of UIMA descriptors are complemented with components for visualization, explanation, testing and rule learning.
    Downloads: 3 This Week
  • 15
    A lyrical analysis and classification tool focused specifically on rhyming style in rap lyrics. Functions include phonetic transcription, rhyme visualization, and rapper classification.
    Downloads: 1 This Week
  • 16
    This project is a compilation of tools/libraries to help with tasks related to Text Analytics mainly in Java. These tools range from simple wrappers to sophisticated mining tasks that can improve the productivity of researchers and engineers.
    Downloads: 0 This Week
  • 17
    OpenDMAP (Open Source Direct Memory Access Parser) is a natural language processing (text mining) application: a semantic parser for information extraction.
    Downloads: 0 This Week
  • 18
    OCR C++ library. Includes: contour recognition; vectorisation; matrix letter feature recognition; automatic page segmentation and rotation detection; SS3 ASM core; XML base; web-based GUI; 99.6% printed Unicode text recognition; letter base up to 1200 letters.
    Downloads: 0 This Week
  • 19
    iDocs is an intellectual document workflow project with text mining options.
    Downloads: 0 This Week
  • 20
    The Java Text Categorizing Library (JTCL) is a pure Java implementation of libTextCat, which in turn is "a library that was primarily developed for language guessing, a task on which it is known to perform with near-perfect accuracy."
    Downloads: 0 This Week
  • 21
    hypKNOWsys aims at developing a Java-based workbench for knowledge discovery and knowledge management. Currently, hypKNOWsys has released two intermediate tools: DIAsDEM Workbench (text mining for semantic tagging) and WUMprep (Web mining pre-processing).
    Downloads: 0 This Week
  • 22
    Java Expert Rule Based Inference Language. Jerbil is an open source rule processing engine written in Java. Currently Jerbil supports a full set of processing functions with text-based and XML interfaces; a Java interface is planned.
    Downloads: 0 This Week
  • 23
    The Word Vector Tool is a simple but flexible Java library to create word vector representations of text documents. Word vectors can be used for various text processing tasks, such as text classification, text clustering, or information retrieval.
    Downloads: 0 This Week
  • 24
    Flesh is a Java application designed to analyze a document (plain text, rich text, Word documents, and PDFs) and display how difficult it is to comprehend, using the Flesch-Kincaid Grade Level and the Flesch Reading Ease Score.
    Downloads: 2 This Week
  • 25
    JTextPro: A Java-based text processing tool that includes sentence boundary detection (using a maximum entropy classifier), word tokenization (following Penn conventions), part-of-speech tagging (using CRFTagger), and phrase chunking (using CRFChunker).
    Downloads: 0 This Week