Showing 133 open source projects for "text processing"

  • 1
    Super-PDF-Editor

    World's most comprehensive, powerful, process-based PDF editor

    World's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. It offers 60+ feature-rich PDF editing tools and functions, such as OCR of PDFs and images with output as searchable PDF, Text, hOCR, Box or UNLV. It can also enhance images before OCR for better OCR results, perform PDF imposition, and more. Super PDF Editor is best suited to bulk PDF processing, especially for the printing industry: easy PDF imposition, booklets, n-up pages, and more. OCR works on PDF files, including scanned PDFs, and on image files in multiple formats. ...
    Downloads: 12 This Week
  • 2
    Super-PDF-Editor-Lite

    World's most comprehensive, powerful, process-based PDF editor

    World's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. Features include creating PDFs from images, HTML and text files; creating a processing log file; page extraction, splitting, rotation, merging, duplication, moving and compression; and printing. It can also enhance images before OCR for better OCR results, perform PDF imposition, and more. Super PDF Editor is best suited to bulk PDF processing, especially for the printing industry: easy PDF imposition, booklets, n-up pages, and more. ...
    Downloads: 1 This Week
  • 3
    Downloads: 3 This Week
  • 4
    BRIC

    BRIC is a powerful tool for batch image processing.

    BRIC is a cross-platform batch image processor. You can convert, resize, rotate and add watermarks to your images. Multiple file types are supported for input and output. The project started back in 2011 and was maintained for a couple of years. As of 2020, BRIC is again in active development, so some of the features described below might be outdated. Please be patient until everything is reviewed and rewritten.
    Downloads: 0 This Week
  • 5
    Data Science at the Command Line

    Data science at the command line

    ...To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools, useful whether you work with Windows, macOS, or Linux. You’ll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you’re comfortable processing data with Python or R, you’ll learn how to greatly improve your data science workflow by leveraging the command line’s power.
    Downloads: 1 This Week
  • 6
    LaTeX Reference Card Creator

    A Makefile-based build system for creating LaTeX reference cards

    LaTeX Reference Card Creator is a Makefile-based build system for creating reference cards. It compiles content into PDF, DjVu, TeX DVI, HTML and PostScript output formats and produces a three-column reference card. Features include batch image format conversion, spell checking, broken link checking, automatic backups, and building .zip and .tar.gz distributions. LaTeX Reference Card Creator provides many LaTeX examples that can be used to make a reference card.
    Downloads: 0 This Week
  • 7
    nonechucks

    Deal with bad samples in your dataset dynamically

    ...What if you have a dataset of thousands of images, out of which a few dozen are unreadable because the image files are corrupted? Or what if your dataset is a folder full of scanned PDFs that you have to OCRize, and then run a language detector on the resulting text, because you want only the ones that are in English? Or maybe you have an AlternateIndexSampler, and you want to be able to move to dataset[6] after dataset[4] fails while attempting to load! PyTorch's data processing module expects you to rid your dataset of any unwanted or invalid samples before you feed them into its pipeline, and provides no easy way to define a "fallback policy" in case such samples are encountered during dataset iteration.
    Downloads: 0 This Week
  • 8
    TEXT2DATA

    Text Analytics Platform

    Bring a text analytics platform that uses NLP (natural language processing) and machine learning to your work environment. Extract essential information from your text documents and let artificial intelligence save you time. Get detailed and agile reports on your unstructured data.
    Downloads: 0 This Week
  • 9
    mbFXWords

    Analyze text. Skim-read subject, predicate and object. Search other PDFs.

    Version 1.04. Applies and builds upon Apache OpenNLP. For English, French and German files. A JavaFX application that runs with Oracle Java Runtime Environment version 8, which includes JavaFX. NLP extensions: divide sentences into subclauses (segmentation); divide plain text into subject, predicate and object; count words (stemming); search for similar content in PDFs. Outputs the subject, predicate and object of sentences in PDF and plain text files. Provides a comfortable GUI. Automatic...
    Downloads: 0 This Week
  • 10
    Mavscript

    Calculations in a text document

    Mavscript allows the user to do calculations in a text document. Plain text, LaTeX and OpenOffice Writer files (.odt) are supported. Calculations are performed by the algebra system Yacas (default), by Jasymca, or by the Java interpreter BeanShell.
    Downloads: 0 This Week
  • 11
    PDF-Shuffler
    PDF-Shuffler is a small python-gtk application that helps the user merge or split PDF documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a frontend for python-pyPdf.
    Downloads: 37 This Week
  • 12
    pdfsandwich generates "sandwich" OCR PDF files: PDF files that contain only images (but no editable text) are processed by optical character recognition (OCR), and the recognized text is added to each page invisibly "behind" the images. pdfsandwich is a command-line tool intended for OCR of scanned books or journals. It can recognize the page layout even for multicolumn text. Essentially, pdfsandwich is a wrapper script that calls the following binaries: convert, unpaper, tesseract, gs, and hocr2pdf (if tesseract < 3.03). ...
    Downloads: 319 This Week
  • 13

    Indexmeister

    Automatic indexing for large LaTeX documents

    Indexmeister reads a variety of formats (.tex, .docx, .epub, and others) and suggests keywords for indexing. The included program Imbrowse provides a semi-automatic interface for rapidly adding index tags to multi-file LaTeX documents.
    Downloads: 0 This Week
  • 14
    CuteReport

    Qt-based report solution

    CuteReport is a report solution like JasperReports, Crystal Reports or FastReport, but based on the Qt framework. It can be easily used with any Qt application. In general, CuteReport consists of two parts: a core library and a template designer. Both are fully modular, and their functionality can be easily extended by writing additional modules. It is fully abstracted from the data it uses and can use a file system, database, version control system, etc. as storage. The project's goal is to provide...
    Downloads: 4 This Week
  • 15

    PanDocElectron

    Graphical User Interface for PanDoc for Linux, Mac & Windows

    PanDoc Graphical User Interface implemented with Electron for Linux, Mac and Windows. It supports users in converting source documents into various other formats like docx, odt, html and reveal documentation. The zip files contain the full source code because PanDocElectron is written in HTML/JavaScript. Electron is used more or less as a browser that runs the HTML/JavaScript application. [Download PanDocElectron](https://sourceforge.net/p/pandocelectron/wiki/Home/) Extract the zip-file...
    Downloads: 1 This Week
  • 16
    Texinfo Web Publisher

    Multi-format web publishing system based on Texinfo

    Texinfo Web Publisher is a Makefile-based publishing system featuring simultaneous content creation into HTML, non-split HTML, Framed HTML, HTML Zip, XML, DocBook, PDF, DjVu, PostScript, DVI, Plain text, Info and EPUB book formats. All Texinfo Web Publisher output formats are generated from a single source. Texinfo Web Publisher can be used for website creation, has FTP deployment capabilities and supports Cascading Style Sheets (CSS). Texinfo Web Publisher is a low-maintenance solution for...
    Downloads: 0 This Week
  • 17
    Visualization of Protein-Ligand Graphs

    Compute protein graphs. Moved to https://github.com/MolBIFFM/PTGLtools

    NOTE: Project moved to https://github.com/MolBIFFM/PTGLtools. The Visualization of Protein-Ligand Graphs (VPLG) software package computes and visualizes protein graphs. It works on the super-secondary structure level and uses the atom coordinates from PDB files and the SSE assignments of the DSSP algorithm. VPLG is command-line software. If you do not like typing commands, try our PTGL web server: http://ptgl.uni-frankfurt.de/
    Downloads: 0 This Week
  • 18
    LaTeX Web Publisher

    LaTeX Web Publisher is a Makefile-based Web publishing system

    LaTeX Web Publisher is a Makefile-based Web publishing system featuring content creation into HTML, non-split HTML, HTML Zip, PDF, DjVu, PostScript, DVI and Plain text formats. All LaTeX Web Publisher output formats are generated from a single LaTeX source and have indices. LaTeX Web Publisher can be used for website creation and has FTP deployment capabilities. A website created with LaTeX Web Publisher will have HTML, non-split HTML and PDF content formats. The website will have complete HTML...
    Downloads: 0 This Week
  • 19
    LaTeX Track Changes

    Collaborators on a version-controlled .tex file can track changes.

    LaTeX Track Changes shows changes over time for a .tex file that has its history stored in a git or svn repository. The user can customize how to view the changes, for example limited to certain authors or filtered by revision or date, among other filters. An Emacs mode provides the user interface. Plug-ins for other editors (such as TeXShop) are planned.
    Downloads: 0 This Week
  • 20

    MindRaider

    MindRaider is a personal notebook and outliner.

    MindRaider is a personal notebook and outliner. Where do you keep private remarks like ideas, plans, gift tips and howtos? Loads of documents and remarks spread around the file system? Can you find a remark when you need it? No? Try MindRaider!
    Downloads: 1 This Week
  • 21

    iMir

    Integrated pipeline for HT miRNA-Seq data analysis

    Processing smallRNA-Seq data to gather biologically relevant information requires applying multiple statistical and bioinformatics tools from different sources, each focusing on a specific step of the analysis pipeline. The analytical workflow can be challenging because of the continuous interventions required from the operator, a critical factor when large numbers of datasets need to be analyzed at once.
    Downloads: 0 This Week
  • 22
    TeleScope

    XML Data Stream Broker/Replicator

    TeleScope is an efficient, intensive-load XML data stream broker, replicator and simple event processing (SEP) platform written in C for the Fedora 17-18, Slackware 13-14 and Red Hat Enterprise Linux 6 (RHEL-6) Linux distributions. The platform is intended to operate on single number/word values and is not meant to be deployed for full-text XML stream analysis. TeleScope has an internal query language with a set of standard logical operators that allows relatively complex query expressions to be constructed. ...
    Downloads: 1 This Week
  • 23
    seppdflatex

    Build a large LaTeX book with separate linked chapters

    seppdflatex is a Perl script that automates many of the tasks needed to compile PDF documents from LaTeX source for a multi-volume book, or for a book with many huge chapters that you may not want as a single document. The separate documents are unified by cross-references and external hyperlinks, so a PDF reader will open a link to an external chapter PDF file. The table of contents (TOC), list of figures (LOF) and list of tables (LOT) are made for all chapters, and the TOC, LOF and LOT are all hyperlinked to the correct external PDF...
    Downloads: 0 This Week
  • 24
    Inhaler

    Speed reading tool

    Inhaler is a speed reading tool programmed in Scala using Swing. It features variable reading speed and font size. It is licensed under the GPL.
    Downloads: 0 This Week
  • 25

    vCard Manager

    Manage your vCards on your computer

    With this handy tool you can manage your vCards, join or split them into files, store contacts on your phone through QR codes, and edit almost all aspects of a vCard. Supports loading and saving vCard versions 2.1, 3.0 and 4.0.
    Downloads: 0 This Week