Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence
Natural Language Processing (NLP) Tools
Search Results

Search Results for "document"

x

Sort By:

Relevance

Clear All Filters

OS

Linux 19
Windows 19
Mac 18
More...
BSD 2
ChromeOS 1
Desktop Operating Systems 1

Category

Artificial Intelligence 19
Scientific/Engineering 3
Business 1
Database 1
Education 1
Software Development 1
System 1

License

OSI-Approved Open Source 13
Creative Commons Attribution License 2
Other License 1

Translations

English 1

Programming Language

Python 9
Java 4
JavaScript 2
R 1
More...
Scala 1

Status

Beta 2
Alpha 1
Production/Stable 1

Showing 19 open source projects for "document"

View related business solutions

Natural Language Processing (NLP) Linux Clear Filters & Widen Search

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
1

deepdoctection

A Repo For Document AI

DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated frameworks for fine-tuning, evaluating and running models. ...

Downloads: 3 This Week

Last Update: 2026-04-09
See Project
2

Docspell

Assist in organizing your piles of documents

Docspell is a personal document organizer. Or sometimes called a "Document Management System" (DMS). You'll need a scanner to convert your papers into files. Docspell can then assist in organizing the resulting mess. It can unify your files from scanners, emails, and other sources. It is targeted for home use, i.e. families, households, and also for smaller groups/companies.

Downloads: 5 This Week

Last Update: 2025-03-15
See Project
3

tidytext

Text mining using tidy tools

tidytext brings tidy data principles to text mining by converting text into a tidy data frame format. It provides tools for tokenization, sentiment analysis, n‑gram creation, and term‑document matrices, enabling interoperability with dplyr, ggplot2, and other tidyverse workflows.

Downloads: 0 This Week

Last Update: 2025-07-30
See Project
4

Search-Index

A persistent, network resilient, full text search library

Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.

Downloads: 6 This Week

Last Update: 2025-03-12
See Project
$300 in Free Credit Towards Top Cloud Services
Build VMs, containers, AI, databases, storage—all in one place.

Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.

Get Started
5

PaperAI

Semantic search and workflows for medical/scientific papers

PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.

Downloads: 7 This Week

Last Update: 2025-07-01
See Project
6

ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs

ExtractThinker is a tool designed to facilitate the extraction and analysis of information from various data sources, aiding in data processing and knowledge discovery.

Downloads: 5 This Week

Last Update: 2025-06-09
See Project
7

BEIR

A Heterogeneous Benchmark for Information Retrieval

BEIR is a benchmark framework for evaluating information retrieval models across various datasets and tasks, including document ranking and question answering.

Downloads: 2 This Week

Last Update: 2025-06-04
See Project
8

Apache OpenNLP

Apache OpenNLP

Apache OpenNLP is a machine learning-based NLP library that provides tools for text-processing tasks such as tokenization, sentence segmentation, and named entity recognition.

Downloads: 0 This Week

Last Update: 2026-03-26
See Project
9

Prime QA

State-of-the-art Multilingual Question Answering research

PrimeQA is a public open source repository that enables researchers and developers to train state-of-the-art models for question answering (QA). By using PrimeQA, a researcher can replicate the experiments outlined in a paper published in the latest NLP conference while also enjoying the capability to download pre-trained models (from an online repository) and run them on their own custom data. PrimeQA is built on top of the Transformers toolkit and uses datasets and models that are directly...

Downloads: 0 This Week

Last Update: 2023-08-21
See Project
Custom VMs From 1 to 96 vCPUs With 99.95% Uptime
General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.

Try Free
10

Common Resource Grep - crgrep

Common Resource Grep

CRGREP searches for matching text in databases, various document formats, archives and other difficult to access resources. A command line tool for name and content text matching in database tables, plain files, MS Office documents, PDF, archives, MP3 audio, image meta-data, scanned documents, maven dependencies and web resources. CRGREP will search resources within resources of any arbitrary combination or depth, so text within a document within a zip archive, and so on. ...

3 Reviews

Downloads: 6 This Week

Last Update: 2023-04-23
See Project
11

Synonyms

Chinese synonyms, chat robot, intelligent question and answer toolkit

Chinese Synonyms for natural language processing and understanding. Better Chinese synonyms, chatbot, intelligent question and answer toolkit. synonymsCan be used for many tasks in natural language understanding, text alignment, recommendation algorithms, similarity calculation, semantic shifting, keyword extraction, concept extraction, automatic summarization, search engines, etc. Print synonyms in a friendly way for easy debugging. "Synonyms Cilin" was compiled by Mei Jiaju and others in...

Downloads: 0 This Week

Last Update: 2022-01-14
See Project
12

Parsr

Transforms PDF, Documents and Images into Enriched Structured Data

Parsr is an open-source document parsing tool that converts PDFs, scanned images, and other structured documents into structured, machine-readable data formats.

Downloads: 3 This Week

Last Update: 2025-01-21
See Project
13

TextRank

TextRank implementation for Python 3

TextRank is an implementation of the TextRank algorithm for extractive text summarization and keyword extraction, inspired by Google’s PageRank.

Downloads: 0 This Week

Last Update: 2025-01-24
See Project
14

AI learning

AiLearning, data analysis plus machine learning practice

...The daily up of all its websites exceeds 4k, and the peak of Alexa ranking is 20k. Our core members are certified as CSDN blog experts and short-book programmers as excellent authors. We have established ApacheCN, a non-profit document, and tutorial translation project.

Downloads: 0 This Week

Last Update: 2022-02-18
See Project
15

Natural Language Analysis with Ngrams

NLP tool for statistical analysis of words, sentences, documents

...EOWL list of English words was used to filter-out the words from Ngrams data. For each year, per word, the data was added and calculated to describe the average appearance of a word per document for a given year. Before using this program, you MUST download the corpus.

Downloads: 0 This Week

Last Update: 2015-02-01
See Project
16

FALCON - Text Search Java Project

JSON based text search Java Project

...It was originally started to search the content in pdf files under the project "HAWK Search". Searching with this tool is query-based not word-based as in most of the document search tools OR document readers. It also takes care of jumbling of words within query and spelling mistakes. Commonly used techniques in this project are Natural Language Processing, Information Extraction and Question-Answering Architecture. ---------------------- - Latest Version - ---------------------- Details of latest version can be found on project website - http://geekdadaji.com --------------------------- - CONTACT DETAILS - --------------------------- CREATOR : SWAPNIL A JADHAV (saj1919) EMAIL ID : dadajibudhau@gmail.com WEBSITE : http://geekdadaji.com LICENSE : CC BY-NC 4.0

Downloads: 0 This Week

Last Update: 2014-04-18
See Project
17

KALIMAT Multipurpose Arabic Corpus

A corpus that could be of help for researchers working on Arabic NLP

KALIMAT a Multipurpose Arabic Corpus We are pleased to announce the immediate availability of KALIMAT 1.0, KALIMAT is an Arabic natural language resource that consists of: 1) 20,291 Arabic articles collected from the Omani newspaper Alwatan by (Abbas et al. 2011). 2) 20,291 Extractive Single-document system summaries. 3) 2,057 Extractive Multi-document system summaries. 4) 20,291 Named Entity Recognised articles. 5) 20,291 Part of Speech Tagged articles. 6) 20,291 Morphologically Analyse articles. The data collection articles fall into six categories: culture, economy, local-news, international-news, religion, and sports. ...

Downloads: 0 This Week

Last Update: 2015-04-09
See Project
18

Corpus redundancy manager

Redundancy due to cut-paste operations in text creates bias in machine learning for NLP. This module takes a directory and produces a subset of the files in that directory (in a list) with an upper bound on similarity between two files.

Downloads: 0 This Week

Last Update: 2014-06-30
See Project
19

NLP4J Natural Language Processing 4 Java

NLP4J library is a toolset written in Java for Natural Language Processing. This version is oriented to Document Classification and uses Naive Bayes, TF-IDF, etc. There are also pre-processing tools.

Downloads: 0 This Week

Last Update: 2014-07-01
See Project

Previous
You're on page 1
Next

Related Searches

pdf

dms

search engine offline html

medical

data cleaning

opennlp

grep

binary robot trading

ai

ngram

Related Categories

Artificial Intelligence

Scientific/Engineering

Business

Database

Education

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise