Search Results for "documents"

Showing 22 open source projects for "documents"

Filters: Information Analysis, Java

1

TXM

Unicode XML TEI text analysis platform

TXM is a free and open-source, cross-platform, Unicode- and XML-based text analysis environment and graphical client supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard-compliant web portal (GWT-based) with built-in access control. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP...

Downloads: 6 This Week

Last Update: 2024-12-09
See Project
2

BioC

We describe a simple XML format to share text documents and annotations

A minimalist approach to sharing text documents and data annotations. Allows a large number of different annotations to be represented.

Project files contain:
- simple code to hold/read/write data and perform sample processing
- BioC-formatted corpora
- BioC tools that work with BioC corpora

BioC goals:
- simplicity
- interoperability
- broad use
- reuse

There should be little investment required to learn to use a format or a software module to process that format. ...

Downloads: 4 This Week

Last Update: 2016-08-08
See Project
3

OpenIMAJ

OpenIMAJ: The Open toolkit for Intelligent Multimedia Analysis in Java. OpenIMAJ contains a large collection of pure-Java classes for analysing multimedia documents, from tools for extracting image features, to tools for analysing web pages.

Downloads: 0 This Week

Last Update: 2015-03-18
See Project
4

FALCON - Text Search Java Project

JSON based text search Java Project

What is it?
"Falcon Search" is a Java API and tool for searching inside documents. It was originally started to search the content of PDF files under the project "HAWK Search". Searching with this tool is query-based, not word-based as in most document search tools or document readers. It also handles jumbled word order within a query and spelling mistakes. Techniques commonly used in this project are natural language processing, information extraction and question-answering architecture.

Latest version
Details of the latest version can be found on the project website: http://geekdadaji.com

Contact details
Creator: SWAPNIL A JADHAV (saj1919)
Email: dadajibudhau@gmail.com
Website: http://geekdadaji.com
License: CC BY-NC 4.0

Downloads: 0 This Week

Last Update: 2014-04-18
See Project
5

Unsupervised TXT classifier

Classify any two TXT documents, no training required - JAVA

...The summarizer from Classifier4J has been adjusted to accept two inputs (let's call them A and B). The summarizer is trained with A to summarize document B, and vice versa. This extracts a relevant structure for both documents (thus avoiding over-training), and the two structures are then compared using vector-space analysis to give a degree of belonging of one document to the other (thus avoiding the shortage of information). The method can also be used to create user-defined classes by merging texts of certain categories and then calculating the relevant distances between documents, but this is not necessary.

Downloads: 0 This Week

Last Update: 2013-12-19
See Project
6

DocCO

Non-disjoint grouping of documents based on a word sequence approach

This is a GUI for learning non-disjoint groups of documents, based on the Weka machine learning framework. It supports non-disjoint clustering of documents using both vectorial and sequential representations (a word sequence approach based on the WSK kernel). All data formats supported by Weka can be used in DocCO. Data can be loaded from files, from databases or from a specified URL.

Downloads: 0 This Week

Last Update: 2013-08-17
See Project
7

OpenSHORE

OpenSHORE is an XML-based Semantic Document Repository (SDR) with a freely definable metamodel that builds up a semantic network from sections and relations in documents. The acronym SHORE stands for Semantic Hypertext Object Repository.

Downloads: 0 This Week

Last Update: 2013-04-15
See Project
8

ontea

Ontea - Pattern-based Semantic Annotation Platform. Ontea searches for or creates semantic metadata from text or documents using pattern-based approaches. The implementation currently includes regular expression (regex) patterns.

Downloads: 0 This Week

Last Update: 2012-11-27
See Project
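
As a rough illustration of pattern-based annotation with regular expressions, the Python sketch below maps annotation types to regex patterns and reports every match. It is not Ontea's code or API: the `PATTERNS` table, the `annotate` function, and the two example patterns are all hypothetical.

```python
import re

# Hypothetical pattern table: annotation type -> regular expression.
# These two patterns are illustrative only and deliberately simple.
PATTERNS = {
    "Email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def annotate(text):
    """Return (type, matched text, start offset) for each hit, ordered by offset."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    for hit in annotate("Contact jane@example.org about the 2012 release."):
        print(hit)
```

Each tuple pairs a semantic type with the text span that triggered it, which is the kind of metadata a pattern-based annotator would attach to a document.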
9

LookIng4LO

This project presents a system that, from a corpus of documents, extracts information about a subject area and a collection of pedagogical components. This information is packaged into fine-grained learning objects (metadata included).

Downloads: 0 This Week

Last Update: 2013-04-08
See Project
10

iScore

iScore measures the interestingness of news articles in a limited user environment. It is an online learning algorithm that combines a large set of disparate features to classify documents. To download the source code, please use subversion.

Downloads: 0 This Week

Last Update: 2015-05-25
See Project
11

RDF Document Manager

RDF-DocMan is a document manager based on a Sesame (RDF repository) backend. Documents are stored in the filesystem and their metadata in a Sesame repository. It was developed for porQual web content generator (also in sf.net).

Downloads: 0 This Week

Last Update: 2013-04-23
See Project
12

Balder

Baldr, aka Balder, is plagiarism prevention software. Inspired by recent research in computer science, it compares documents using only properties linked to data compression.

Downloads: 0 This Week

Last Update: 2013-04-12
See Project
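
The listing does not say which compression-based measure Balder uses. One well-known measure of this kind is the normalized compression distance (NCD), sketched below in Python with `zlib` as the compressor. The function names are hypothetical, and this should be read as an example of the general idea rather than as Balder's algorithm.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: lower values suggest more shared content."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    original = b"the quick brown fox jumps over the lazy dog " * 20
    unrelated = bytes(range(256)) * 4
    print("copy vs. original:     ", ncd(original, original))
    print("unrelated vs. original:", ncd(original, unrelated))
```

A document concatenated with a near-copy of itself compresses almost as well as the document alone, so the distance stays low; unrelated content adds nearly its full compressed size and pushes the distance up.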
13

OntoExtractor

OntoExtractor is a way of building ontologies that proceeds in a bottom-up fashion, defining concepts as clusters of concrete XML objects. From a set of XML documents the application generates a taxonomy. OntoExtractor has been developed so far by the Kn

Downloads: 0 This Week

Last Update: 2016-07-30
See Project
14

MIEX

The aim of MIEX (Metadata and Information Extractor from small XML documents) is to create a wrapper for the Stanford Parser, to extract and store metadata (syntactic structures, relationships among words...) from simple XML documents.

Downloads: 0 This Week

Last Update: 2013-04-08
See Project
15

JWebPro: A Java Web Processing Toolkit

JWebPro: A Java tool that can interact with Google search and then process the returned Web documents in a couple of ways. The outputs can serve as inputs for NLP, IR, information extraction, Web mining, and online social network extraction/analysis applications.

Downloads: 0 This Week

Last Update: 2013-03-13
See Project
16

MultiJADS: Multiagent ADD Shell

MultiJADS is a domain independent multiagent active design documents shell. It uses multiagent technology to support activities in concurrent and distributed design systems and is based on the Active Design Documents (ADD) approach.

Downloads: 0 This Week

Last Update: 2015-08-14
See Project
17

Word Vector Tool

The Word Vector Tool is a simple but flexible Java library to create word vector representations of text documents. Word vectors can be used for various text processing tasks, such as text classification, text clustering or information retrieval.

Downloads: 0 This Week

Last Update: 2013-04-08
See Project
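
To make the idea of a word vector concrete, here is a minimal Python sketch that builds term-frequency vectors and compares them with cosine similarity. It does not use the Word Vector Tool or reproduce its API; `word_vector` and `cosine_similarity` are hypothetical names, and the lowercase alphanumeric tokenizer is an assumption chosen for brevity.

```python
import math
import re
from collections import Counter


def word_vector(text):
    """Term-frequency vector: lowercase alphanumeric tokens mapped to counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc1 = word_vector("text clustering groups similar text documents")
    doc2 = word_vector("clustering of text documents")
    doc3 = word_vector("network monitoring dashboard")
    print(cosine_similarity(doc1, doc2))
    print(cosine_similarity(doc1, doc3))
```

Classification, clustering and retrieval can all be built on such pairwise similarities; real toolkits typically add weighting schemes such as TF-IDF, stemming and stop-word removal.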
18

Flesh

Flesh is a Java application designed to analyze a document (plain text, rich text, Word documents, and PDFs) and display how difficult it is to comprehend, using the Flesch-Kincaid Grade Level and the Flesch Reading Ease Score.

2 Reviews

Downloads: 6 This Week

Last Update: 2013-04-03
See Project
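
Both scores have standard published formulas based on average sentence length and average syllables per word. The Python sketch below applies those formulas to word, sentence and syllable counts. It is not Flesh's code: the function names are hypothetical, and `count_syllables` is a crude vowel-group heuristic included only so the example is self-contained.

```python
import re


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per run of vowels, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


if __name__ == "__main__":
    # Example counts: 100 words, 5 sentences, 150 syllables.
    print(flesch_reading_ease(100, 5, 150))
    print(flesch_kincaid_grade(100, 5, 150))
```

Accurate syllable counting usually needs a pronunciation dictionary or language-specific rules, so scores computed with the heuristic above will differ from those of a dedicated tool.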
19

EGADSS Decision Support System

Evidence-based Guideline and Decision Support System. Provides patient-specific point-of-care reminders to help physicians provide high-quality care. Input/output in the form of HL7 CDA Level 2 documents. Knowledge is encoded using Arden Syntax.

Downloads: 0 This Week

Last Update: 2013-04-17
See Project
20

Phoenix Information Extraction

Phoenix is an information extraction engine written in Java. Controlled by rules (declared in XML), it extracts information from any XML document (unstructured XHTML/OpenOffice documents). Supports XPath, additional conditions and top-down decomposit

Downloads: 0 This Week

Last Update: 2013-03-14
See Project
21

Judge

JUDGE (Java Utility for Document Genre Eduction) features automatic classification and clustering of documents, optionally as a webservice. The program is written entirely in Java and makes use of the Weka machine learning toolkit.

Downloads: 0 This Week

Last Update: 2015-12-01
See Project
22

Expert Space

An Expert Search System for Enterprise Search based on the Information Retrieval Vector Space Model. Our model builds a weighted profile for each candidate and keeps all the documents, which allows people and documents to be retrieved together.

Downloads: 0 This Week

Last Update: 2013-03-21
See Project

© 2026 Slashdot Media. All Rights Reserved.
