Showing 98 open source projects for "text processing"

View related business solutions
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 1
    TeXworks

    TeXworks

    A simple interface for working with TeX documents

    TeXworks is a free and simple working environment for authoring TeX (LaTeX, ConTeXt and XeTeX) documents. Inspired by Dick Koch's award-winning TeXShop program for Mac OS X, it makes entry into the TeX world easier for those using desktop operating systems other than OS X. It provides an integrated, easy-to-use environment for users on other platforms particularly GNU/Linux and Windows and features a clean, simple interface accessible to casual and non-technical users.
    Downloads: 59 This Week
    Last Update:
    See Project
  • 2
    OpenMed

    OpenMed

    Open source healthcare AI

    OpenMed is an open-source healthcare AI and medical NLP toolkit designed to turn clinical text into structured insights using transformer-based models and production-oriented interfaces. Its core purpose is to provide specialized medical entity extraction, PII detection and de-identification, assertion-aware analysis, and related healthcare text processing capabilities without locking users into a proprietary platform. The project includes a curated registry of more than a dozen medical NER models focused on areas such as diseases, drugs, anatomy, genes, and protected health information, and it is built to support both research and deployment scenarios. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 3
    tika-python

    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip and easy to install. To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background. To get this working in a disconnected environment, download a tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Scanopy

    Scanopy

    Clean network diagrams, One-time setup, zero upkeep

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...
    Downloads: 10 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    AutoGluon

    AutoGluon

    AutoGluon: AutoML for Image, Text, and Tabular Data

    AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code. Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge. Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Data Formulator

    Data Formulator

    Create rich visualizations with AI

    ...However, these systems do not work well for iterative visualization authoring, because they often require analysts to provide, in a single turn, a text-only prompt that fully describes the complex visualization task to be performed, which is unrealistic to both users and models in many cases. In this paper, we present Data Formulator 2, an LLM-powered visualization system to address these challenges.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    Scribus

    Scribus

    Powerful desktop publishing software

    Scribus is an Open Source program that brings professional page layout to Linux, BSD UNIX, Solaris, OpenIndiana, GNU/Hurd, Mac OS X, OS/2 Warp 4, eComStation, and Windows desktops with a combination of press-ready output and new approaches to page design. Underneath a modern and user-friendly interface, Scribus supports professional publishing features, such as color separations, CMYK and spot colors, ICC color management, and versatile PDF creation.
    Leader badge
    Downloads: 20,527 This Week
    Last Update:
    See Project
  • 8
    Conversational Health Agents (CHA)

    Conversational Health Agents (CHA)

    A Personalized LLM-powered Agent Frameworks

    CHA, or Conversational Health Agents, is an open-source framework designed to build intelligent healthcare assistants powered by large language models and external data sources. The system enables developers to create personalized AI agents that can interact with users through natural language while performing multi-step reasoning and task execution. It integrates orchestration capabilities that allow the agent to gather information from APIs, knowledge bases, and external services in order...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9

    Bulgarian language support

    Spell check, grammar check and hyphenation for Bulgarian language

    The goal of this project is to provide spell check, grammar check and hyphenation for Bulgarian language for Open Source products such as OpenOffice.org, LibreOffice, TeX, aspell, ispell, hunspell etc.
    Leader badge
    Downloads: 23 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 10
    Articlefox is a workflow system that can be used to prepare the articles of a small journal.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20AI driven data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    DOCX Document Converter

    DOCX Document Converter

    Convert .docx to .md/.txt and .html. Free, unlimited, fast.

    A simple, free, unlimited, secure web-based tool that converts Microsoft Word documents (.docx) into Markdown (.md/.txt) and HTML files. Perfect for developers, writers, and anyone who needs to transform .docx MS Office Word documents into web-friendly or AI context friendly formats. Unlike those other jerks on the web that charge many dollars per month for this, I made it free, unlimited and open source. This is a better version of 'convert docx to txt' since .md files can be opened...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 13
    AI File Sorter

    AI File Sorter

    Local AI file organization with categorization and rename suggestions

    AI File Sorter is a cross-platform desktop application that uses AI (local LLMs run on your computer) to organize files and suggest meaningful file names based on real content, not just filenames or extensions. The app can analyze images locally and propose descriptive rename suggestions (for example, IMG_2048.jpg → clouds_over_lake.jpg). It can also analyze document text to improve categorization and renaming. Supported formats include PDF, DOCX, XLSX, PPTX, ODT, ODS, ODP, and common...
    Leader badge
    Downloads: 516 This Week
    Last Update:
    See Project
  • 14
    Mindwtr

    Mindwtr

    A complete Getting Things Done (GTD) productivity system for desktop a

    Mindwtr: The Privacy-First GTD System Mindwtr is a Getting Things Done (GTD) productivity tool designed for "Mind Like Water." It runs completely offline—no accounts, no tracking, and no subscriptions. The Core GTD Workflow Capture: Instantly offload thoughts to your Inbox. Clarify: Process tasks rapidly with the built-in "2-Minute Rule" timer. Organize: Sort tasks by Contexts (@work, @home), Areas, and Projects. Reflect: Keep your system trustworthy with a guided Weekly...
    Leader badge
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    pdf combiner merger converter splitter

    pdf combiner merger converter splitter

    PDF Combiner is a user-friendly, GUI-based tool built in

    PDF Combiner is a user-friendly open source free to use, GUI-based tool for combining, pdf to excel, pdf to word, image to pdf, zip, unzip annotate and splitting PDF files. It is easy to use, supports multiple file insert and delete and process, and allows you to adjust the order of files before combining.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    text-dedup

    text-dedup

    All-in-one text de-duplication

    text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    oStorybook

    oStorybook

    oStorybook original, the writer's tool

    oStorybook un outil pour les écrivains. ATTENTION : voir sur http://ostorybook.eu --en_EN oStorybook a tool for writers. WARNING : see on http://ostorybook.eu.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    SentimentAnalysis-Rick&Morty

    SentimentAnalysis-Rick&Morty

    Rick & Morty Sentiment Analysis - End-of-Degree Project - UNIR

    The remarkable progress in the field of Big Data has driven the development of new technologies in natural language processing and data analysis. Text mining is a fascinating application of data analysis that extracts relevant information from related writings in different linguistic contexts. And therefore, in natural language processing, sentiment analysis and classification stands out as a key application supported by text mining. Through the extraction of information from textual data, it becomes possible to identify and comprehend the sentiments and emotions conveyed. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19

    modnlp-plugins

    External plugins for modnlp/teccli

    This is a general project for modnlp/teccli plugins, with focus on text visualizaton.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    SimpleXlsxWriter

    C++ library for creating XLSX files for MS Excel 2007 and above.

    This library represents XLSX files writer for Microsoft Excel 2007 and above. The main feature of this library is that it uses C++ standard file streams. On the one hand it results in almost unnoticeable memory and CPU resources consumption while processing (that may be very useful at saving a large data arrays), but on the other hand it makes unfeasible to edit data that were written. Hence, if using this library the structure of the future report should be known enough. The library...
    Leader badge
    Downloads: 8 This Week
    Last Update:
    See Project
  • 21
    DataMelt

    DataMelt

    Computation and Visualization environment

    DataMelt (or "DMelt") is an environment for numeric computation, data analysis, computational statistics, and data visualization. This Java multiplatform program is integrated with several scripting languages such as Jython (Python), Groovy, JRuby, BeanShell. DMelt can be used to plot functions and data in 2D and 3D, perform statistical tests, data mining, numeric computations, function minimization, linear algebra, solving systems of linear and differential equations. Linear, non-linear...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 22
    Super-PDF-Editor-Lite

    Super-PDF-Editor-Lite

    World's most comprehensive, powerful, process-based PDF editor

    World's most comprehensive, powerful, process-based and lighting fast PDF reader, editor and batch processor. Includes features like Create PDF from Images, HTML, Text files. Create a processing log file. Extract Page, Split Page, Rotate Page, Merge Page, Duplicate page, Move Page, Printing, and Compress Page. Improve image enhancement before OCR operation for better OCR performance. pdf Imposition, etc. Super PDF Editor is best for bulk pdf processing, especially for the printing industry. Easy pdf imposition, booklet, n ups pages, and more. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Super-PDF-Editor

    Super-PDF-Editor

    World's most comprehensive, powerful, process-based PDF editor

    World's most comprehensive, powerful, process-based and lighting fast PDF reader, editor and batch processor. PDF editing with 60+ features rich tools and function like OCR pdf and images and produce output like searchable PDF, Text, Hocr, Box, Unlv. Also, improve image enhancement before OCR operation for better OCR performance. pdf Imposition, etc. Super PDF Editor is best for bulk pdf processing, especially for the printing industry. Easy pdf imposition, booklet, n ups pages, and more. OCR performs in pdf files, scanned pdf files and any pdf files. OCR performs in image files, and supports multiple image formats. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 24
    Downloads: 3 This Week
    Last Update:
    See Project
  • 25
    BRIC

    BRIC

    BRIC is a powerful tool for batch image processing.

    Bric is a cross-platform batch image processor. You can convert, resize, rotate and add watermark to your images. Multiple file types are supported for input and output. The project started back in 2011 and was maintained for a couple of years. In 2020 BRIC is again in active development, so some of the features written below might be outdated. Please be patient, until everything is reviewed and rewritten.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
MongoDB Logo MongoDB