Showing 20 open source projects for "batch text processing"

View related business solutions
  • Go From Idea to Deployed AI App Fast Icon
    Go From Idea to Deployed AI App Fast

    One platform to build, fine-tune, and deploy. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 1
    RefDB is a reference database and bibliography tool for SGML, XML, and LaTeX documents, sort of a Reference Manager or BibTeX for markup languages. It is portable and known to run on Linux, Free/NetBSD, OSX, Solaris, and Windows/Cygwin.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 2

    VecText

    Converting text to a structured representation

    ...This way of non-interactive communication enables incorporating the application into a more complicated data mining process integrating several software packages or performing multiple conversions in a batch.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3

    Safe Harbor Deidentification

    Safe Harbor Deidentification for medical documents

    Phalanx - Deidentify Safe Harbor Deidentification Mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators which write output of text offsets. It uses the Safe Harbor deidentification method.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4

    SmGen

    Verilog Finite State Machine (FSM) Code Generator

    SmGen is a finite state machine (FSM) generator for Verilog. On the other hand, it is not an FSM entry tool. The input is behavioral Verilog with clock boundaries specifically set by the designer. SmGen unrolls this behavioral code and generates an FSM from it in synthesizable Verilog. Clock boundaries are explicitly provided by the designer so there is good control on the expected timing
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    Capsim(r) C Text Mode Kernel(TMK),DSP and communication blocks, topologies, libraries and tools for the development of high performance block diagram digital signal processing and communications systems,built in interpreter for scripting.SystemC support.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6

    BioC

    We describe a simple XML format to share text documents and annotation

    A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. ...
    Leader badge
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7

    PLP

    Powerfull pre-processor

    Powerful Verilog Preprocessor. PLP stands for Perl Pre-processor. Perl is used as "control language" that is embedded in the Verilog code (or any other code) to generate code on the fly. It is used commonly as a Verilog pre-processor but can be used with any target/output language (C, C++, Java, VHDL, plain text etc)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition along with sample applications (identification, NLP, etc.) of its use, implemented in Java.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9

    iMir

    Integrated pipeline for HT miRNA-Seq data analysis

    Processing of smallRNA-Seq data to gather biologically relevant information requires application of multiple statistical and bioinformatics tools from different sources, each focusing on a specific step of the analysis pipeline. The analytical workflow can be challenging for the continuous interventions by the operator, a critical factor when large numbers of datasets need to be analyzed at once.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Push Code. Get a Production URL. Done. Icon
    Push Code. Get a Production URL. Done.

    Cloud Run deploys any language instantly. Scales to zero. Pay only when code runs.

    Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.
    Try Cloud Run Free
  • 10
    This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    latexdiff is a Perl script, which compares two latex files and marks up significant differences between them (i.e. a diff for latex files). Various options are available for visual markup using standard latex packages such as "color.sty".
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    A set of Unix command line tools for quick and convenient batch processing of tabular text files (a.k.a., tab-delimited, csv, or flat file format) with a header line. Provides delimiter and compression detection, column reference by name. * tblmap: per-line ("map") computation: derive columns through an expression, delete, reorder, filter rows. * tblred: compute ("reduce") aggregations (e.g., sum, average) over groups defined by key columns
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    This is a Java-based project for complex event extraction from text and co-reference resolution. Currently the code can read BioNLP shared task format (http://2011.bionlp-st.org/) and i2b2 Natural Language Processing for Clinical Data shared task format (https://www.i2b2.org/NLP/DataSets/Main.php). Event extraction includes finding events and the parameters for an event in a text. The method is based on SVM but other ML algorithms can be adopted.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Why is it quicker to express simple equations in plaintext that it is to write the equivalent LaTeX? Easylatex is a preprocessor to make writing LaTeX much quicker. Project activity mode (http://bayleshanks.com/pamv1 ): sporadic
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    MutationFinder is a biomedical natural language processing (NLP) system for extracting mentions of point mutations from free text. MutationFinder achieves high performance (99% precision, 81% recall on blind test data) as an information extraction system
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    hypKNOWsys aims at developing a Java-based workbench for knowledge discovery and knowledge management. Currently, hypKNOWsys has released two intermediate tools: DIAsDEM Workbench (text mining for semantic tagging) and WUMprep (Web mining pre-processing)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Biolet is project aimed to analyze genomic DNA sequences using wavelets. Of additional interest may be a batch processing tool for doing round-robin scheduling of jobs on a beowulf cluster. For that checkout the project robin from the CVS.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    The TeX Math Font Generator is a set of programs that can produce a set of TeX math fonts that resemble a given text font. If you want to typeset mathematics with TeX in a special font that you own, then MathGen is what you want.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    STARS is an alternative interface to staden for studying polymorphisms in short sequences with batch processing, manual editing, trace viewing and data management. STARS was initially designed for Multi Locus Sequence Typing of bacteria.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Line command cleaner/converter files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB