Search Results for "open document"

Showing 17 open source projects for "open document"

1

deepdoctection

A Repo For Document AI

deepdoctection is a document AI framework that applies deep learning to analyze and extract structured data from scanned documents, PDFs, and images. It is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models itself, but lets you build pipelines using well-established libraries for object detection, OCR, and selected NLP tasks, and provides an integrated framework for...

Downloads: 5 This Week

Last Update: 2026-05-02
See Project
2

Docspell

Assist in organizing your piles of documents

Docspell is a personal document organizer, sometimes called a "Document Management System" (DMS). You'll need a scanner to convert your papers into files. Docspell can then assist in organizing the resulting mess. It can unify your files from scanners, emails, and other sources. It is aimed at home use, i.e. families and households, and also at smaller groups and companies. You can associate tags, set correspondents, and add lots of other predefined and custom metadata. If your documents are...

Downloads: 2 This Week

Last Update: 2025-03-15
See Project
3

ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs

ExtractThinker is a tool designed to facilitate the extraction and analysis of information from various data sources, aiding in data processing and knowledge discovery.

Downloads: 4 This Week

Last Update: 2025-06-09
See Project
4

Search-Index

A persistent, network-resilient, full-text search library

Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.

Downloads: 4 This Week

Last Update: 2025-03-12
See Project
5

PaperAI

Semantic search and workflows for medical/scientific papers

PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.

Downloads: 4 This Week

Last Update: 2025-07-01
See Project
6

BEIR

A Heterogeneous Benchmark for Information Retrieval

BEIR is a benchmark framework for evaluating information retrieval models across various datasets and tasks, including document ranking and question answering.

Downloads: 3 This Week

Last Update: 2025-06-04
See Project
7

Apache OpenNLP

Apache OpenNLP

Apache OpenNLP is a machine learning-based NLP library that provides tools for text-processing tasks such as tokenization, sentence segmentation, and named entity recognition.

Downloads: 0 This Week

Last Update: 2026-04-28
See Project
8

Haystack

Haystack is an open source NLP framework to interact with your data

Apply the latest NLP technology to your own data with the use of Haystack's pipeline architecture. Implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications. Evaluate components and fine-tune models. Ask questions in natural language and find granular answers in your documents using the latest QA models with the help of Haystack pipelines. Perform semantic search and retrieve ranked documents according to meaning,...

Downloads: 6 This Week

Last Update: 2026-04-20
See Project
9

Prime QA

State-of-the-art Multilingual Question Answering research

PrimeQA is a public open source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). Using PrimeQA, a researcher can replicate the experiments outlined in a paper published at the latest NLP conference, and can also download pre-trained models (from an online repository) and run them on their own custom data. PrimeQA is built on top of the Transformers toolkit and uses datasets and models that are directly...

Downloads: 0 This Week

Last Update: 2023-08-21
See Project
10

Common Resource Grep - crgrep

Common Resource Grep

CRGREP searches for matching text in databases, various document formats, archives, and other difficult-to-access resources. It is a command-line tool for name and content text matching in database tables, plain files, MS Office documents, PDFs, archives, MP3 audio, image metadata, scanned documents, Maven dependencies, and web resources. CRGREP searches resources nested within other resources, in any combination and to any depth, for example text within a document within a zip archive, and so on. Here you...

3 Reviews

Downloads: 0 This Week

Last Update: 2023-04-23
See Project
11

Synonyms

Chinese synonyms, chat robot, intelligent question and answer toolkit

Chinese synonyms for natural language processing and understanding. Better Chinese synonyms, chatbot, intelligent question and answer toolkit. Synonyms can be used for many tasks in natural language understanding: text alignment, recommendation algorithms, similarity calculation, semantic shifting, keyword extraction, concept extraction, automatic summarization, search engines, etc. It prints synonyms in a friendly way for easy debugging. "Synonyms Cilin" was compiled by Mei Jiaju and others in...

Downloads: 0 This Week

Last Update: 2022-01-14
See Project
12

Parsr

Transforms PDF, Documents and Images into Enriched Structured Data

Parsr is an open-source document parsing tool that converts PDFs, scanned images, and other documents into structured, machine-readable data formats.

Downloads: 4 This Week

Last Update: 2025-01-21
See Project
13

TextRank

TextRank implementation for Python 3

TextRank is an implementation of the TextRank algorithm for extractive text summarization and keyword extraction, inspired by Google’s PageRank.

Downloads: 0 This Week

Last Update: 2025-01-24
See Project
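
For readers unfamiliar with the idea behind TextRank, the following is a minimal sketch of PageRank-style keyword ranking over a word co-occurrence graph. It is not the listed project's code or API: the function name `textrank_keywords`, its parameters, and the simple regex tokenizer are assumptions made for illustration, and stop-word filtering is omitted.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by a PageRank-style score on a co-occurrence graph.

    Hypothetical helper for illustration only; not the listed project's API.
    """
    words = re.findall(r"[a-z]+", text.lower())

    # Build an undirected graph: words are linked if they occur
    # within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)

    # Iterate the PageRank update: each word receives an equal share
    # of the score of every word it is linked to.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            rank = sum(scores[v] / len(neighbors[v]) for v in neighbors[word])
            updated[word] = (1 - damping) + damping * rank
        scores = updated

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    print(textrank_keywords("hub a hub b hub c hub d", window=1, top_k=1))
```

Words that co-occur with many different words accumulate a higher score, which is the sense in which the approach borrows from PageRank.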
14

AI learning

AiLearning, data analysis plus machine learning practice

We actively respond to the Research Open Source Initiative (DOCX). Open source today is not just source code, but also datasets, models, tutorials, and experimental records. We are also exploring other categories of open source solutions and protocols. We hope you will understand this initiative, combine it with your own interests, and do what you can. Everyone's tiny contributions, together, make up the entire open source ecosystem.

Downloads: 0 This Week

Last Update: 2022-02-18
See Project
15

Corpus redundancy manager

Redundancy due to cut-paste operations in text creates bias in machine learning for NLP. This module takes a directory and produces a subset of the files in that directory (in a list) with an upper bound on similarity between two files.

Downloads: 0 This Week

Last Update: 2014-06-30
See Project
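
The selection described above (a subset of files with an upper bound on pairwise similarity) could be sketched as follows. This is a minimal illustration, not the module's actual code: the similarity measure (Jaccard overlap of word trigrams), the greedy selection order, and all function names and parameters are assumptions.

```python
import re
from pathlib import Path


def shingles(text, n=3):
    """Return the set of lowercased word n-grams in `text`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_low_redundancy(texts, max_similarity=0.5, n=3):
    """Greedily keep names whose similarity to every kept text is at most
    `max_similarity`. `texts` maps a name to its text (assumed layout)."""
    kept = []
    for name in sorted(texts):
        candidate = shingles(texts[name], n)
        if all(jaccard(candidate, other) <= max_similarity for _, other in kept):
            kept.append((name, candidate))
    return [name for name, _ in kept]


def select_from_directory(directory, max_similarity=0.5):
    """Apply the same selection to every regular file in `directory`."""
    texts = {
        path.name: path.read_text(encoding="utf-8", errors="ignore")
        for path in Path(directory).iterdir()
        if path.is_file()
    }
    return select_low_redundancy(texts, max_similarity)
```

The greedy pass guarantees the bound between any two kept files, but the subset it returns depends on the order in which files are visited, so it is not necessarily the largest possible one.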
16

suffix arrays for phrase extraction

Java suffix array library for phrase discovery. Inspired initially by the classic paper of Yamamoto & Church, with newer ideas from Abouelhoda et al. and Kim et al. Adapted for large alphabets so that words can be treated as alphabet characters.

Downloads: 0 This Week

Last Update: 2013-05-02
See Project
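
To illustrate treating words as the alphabet of a suffix array, here is a minimal Python sketch that finds repeated phrases by comparing adjacent sorted suffixes. It is not the Java library's code: the naive sort-based construction, the whitespace tokenizer, and the function names are assumptions, and a production implementation would use a faster construction algorithm.

```python
def build_suffix_array(tokens):
    """Return suffix start positions sorted by the token sequence they begin.

    Naive construction for illustration; each comparison may scan whole suffixes.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def repeated_phrases(text, min_len=2):
    """Return phrases of at least `min_len` words that occur more than once."""
    tokens = text.lower().split()
    suffix_array = build_suffix_array(tokens)
    found = set()
    for a, b in zip(suffix_array, suffix_array[1:]):
        # Longest common prefix (in words) of two adjacent suffixes.
        k = 0
        while (a + k < len(tokens) and b + k < len(tokens)
               and tokens[a + k] == tokens[b + k]):
            k += 1
        if k >= min_len:
            found.add(" ".join(tokens[a:a + k]))
    return sorted(found)


if __name__ == "__main__":
    print(repeated_phrases("new york is big and new york is old"))
```

Because sorting places suffixes that share a prefix next to each other, every repeated phrase shows up as a common prefix of some adjacent pair.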
17

PHP classes for NLP

A set of classes for Natural Language Processing in PHP for:
1. Part-of-speech tagging - Brill, n-gram, HMM
2. Princeton WordNet querying and access
3. Document summarization
4. Document classification - EM, Bayes
5. Stemming - Porter, Lancaster

Downloads: 0 This Week

Last Update: 2012-10-15
See Project

SourceForge

© 2026 Slashdot Media. All Rights Reserved.
