document analysis free download

Showing 20 open source projects for "document analysis"

Filters: Scientific/Engineering, Java

1

DynaQ

Innovative text document search. See http://dynaq.opendfki.de for details.

The goal of DynaQ is to develop an inquiry system for exploring the personal information space, supporting you with the 'orienteering' search paradigm. DynaQ is a (desktop) search engine with enhanced functionality for file, email, and blog search. See our GitLab homepage for source code and documentation: http://dynaq.opendfki.de

Downloads: 0 This Week

Last Update: 2021-08-05
See Project
2

jLDADMM

A Java package for the LDA and DMM topic models

The Java package jLDADMM provides alternative choices for topic modeling on normal or short texts. It implements the Latent Dirichlet Allocation topic model and the one-topic-per-document Dirichlet Multinomial Mixture model (i.e. mixture of unigrams), both using collapsed Gibbs sampling. In addition, jLDADMM supplies a document clustering evaluation to compare topic models. For usage, see the jLDADMM website at http://jldadmm.sourceforge.net/

1 Review

Downloads: 0 This Week

Last Update: 2016-03-13
See Project
3

DCTFinder

Extract title and creation time from web page.

Web pages do not offer reliable metadata about their creation date and time. However, obtaining the document creation time is a necessary step before temporal normalization systems can be applied to web pages. DCTFinder is a system that parses a web page and extracts the title and the creation date from its content. DCTFinder combines heuristic title detection, supervised learning with Conditional Random Fields (CRFs) for document date extraction, and rule-based creation...

Downloads: 0 This Week

Last Update: 2016-10-21
See Project
4

SCAN

SCAN (Smart Content Aggregation and Navigation) is a universal semantic content aggregator. It combines search, text analysis, tagging, and metadata functions to provide a new user experience for desktop navigation and document management.

3 Reviews

Downloads: 0 This Week

Last Update: 2014-06-19
See Project
5

FALCON - Text Search Java Project

JSON based text search Java Project

What is it? "Falcon Search" is a Java API and tool for searching inside documents. It was originally started to search the content of PDF files under the project "HAWK Search". Searching with this tool is query-based, not word-based as in most document search tools or document readers. It also handles jumbled word order within a query and spelling mistakes. Commonly used techniques in this project are Natural Language...

Downloads: 3 This Week

Last Update: 2014-04-18
See Project
6

Texalyzer

Text analyzer

Analyzes a text document using TF-IDF and, optionally, a stopword list, and extracts important keywords.

Downloads: 0 This Week

Last Update: 2017-04-04
See Project
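The TF-IDF keyword extraction named in the Texalyzer description can be illustrated with a minimal sketch. This is not Texalyzer's code: the class `TfIdfSketch`, its method names, the tokenization rule, and the plain `tf * log(N / df)` weighting are all assumptions chosen for illustration.

```java
import java.util.*;

// Hypothetical illustration of TF-IDF keyword extraction; not taken from Texalyzer.
final class TfIdfSketch {

    /** Lowercases the text, splits on non-letters, and drops stopwords. */
    static List<String> tokenize(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty() && !stopwords.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Scores each term of doc as tf * ln(N / df) over the given corpus. */
    static Map<String, Double> score(List<String> doc, List<List<String>> corpus) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : doc) {
            counts.merge(t, 1, Integer::sum);
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Document frequency: number of corpus documents containing the term.
            long df = corpus.stream().filter(d -> d.contains(e.getKey())).count();
            double tf = e.getValue() / (double) doc.size();
            // Guard against df == 0 when doc itself is not part of the corpus.
            double idf = Math.log((double) corpus.size() / Math.max(1L, df));
            scores.put(e.getKey(), tf * idf);
        }
        return scores;
    }

    /** Returns the k highest-scoring terms; ties are broken alphabetically. */
    static List<String> topKeywords(Map<String, Double> scores, int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

In this sketch, a term that occurs in every document of the corpus gets a score of zero, so only terms that distinguish a document surface as keywords. Real tools often use smoothed or normalized variants of the weighting.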
7

Unsupervised TXT classifier

Classify any two TXT documents, no training required - JAVA

... trained with A to summarize a document B, and vice versa. This extracts a relevant structure for both documents (and thus avoids over-training). The two structures are then compared using vector-space analysis to give a degree of belonging of one document to the other (and thus avoids a shortage of information). The method can also be used to create user-defined classes by merging texts of certain categories and then calculating the relevant distances between the documents, but this is not necessary.

Downloads: 0 This Week

Last Update: 2013-12-19
See Project
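The vector-space comparison mentioned in the description above can be sketched with cosine similarity over term-count vectors. The description does not say which similarity measure the project uses, so cosine similarity is an assumption here, as are the class `CosineSketch` and its method names.

```java
import java.util.*;

// Hypothetical illustration of a vector-space document comparison.
// Cosine similarity is an assumed measure; the project may use a different one.
final class CosineSketch {

    /** Builds a sparse term-count vector from whitespace/punctuation-separated text. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<String, Integer> v) {
        double sumOfSquares = 0.0;
        for (int c : v.values()) {
            sumOfSquares += (double) c * c;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Cosine of the angle between two sparse vectors; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }
}
```

For non-negative count vectors the result lies between 0 (no shared terms) and 1 (identical term distributions), which makes it usable as a graded measure of how close one document is to another.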
8

Large Document Search Engine

A system to perform analysis of large documents for the purpose of cataloging similar documents. Similarity is based upon contextual analysis of these documents done by identifying common words and proper nouns.

Downloads: 0 This Week

Last Update: 2016-11-02
See Project
9

XmlView

GUI utility in pure Java for viewing and editing XML content; an example of an application built with Superficial (http://superficial.sourceforge.net)

Downloads: 0 This Week

Last Update: 2012-05-22
See Project
10

ILEDocs

ILEDocs is a documentation tool that helps software developers document their programs in a convenient way, similar to javadoc.

Downloads: 0 This Week

Last Update: 2012-09-14
See Project
11

OpenSHORE

OpenSHORE is an XML-based Semantic Document Repository (SDR) with a freely definable metamodel that builds up a semantic network from sections and relations in documents. The acronym SHORE stands for Semantic Hypertext Object Repository.

Downloads: 0 This Week

Last Update: 2013-04-15
See Project
12

Maui Topic Indexer

Maui is a multi-purpose automatic topic indexing algorithm. Given a document, Maui automatically identifies its topics. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, or Wikipedia titles.

Downloads: 0 This Week

Last Update: 2014-04-25
See Project
13

RDF Document Manager

RDF-DocMan is a document manager based on a Sesame (RDF repository) backend. Documents are stored in the filesystem and their metadata in a Sesame repository. It was developed for the porQual web content generator (also on sf.net).

Downloads: 0 This Week

Last Update: 2013-04-23
See Project
14

Trainable Relation Extraction framework

T-Rex (Trainable Relation Extraction) is a highly configurable machine learning-based Information Extraction from Text framework, which includes tools for document classification, entity extraction and relation extraction.

Downloads: 0 This Week

Last Update: 2013-05-02
See Project
15

iDocs

iDocs is an intellectual document workflow project with text mining options.

Downloads: 0 This Week

Last Update: 2014-04-08
See Project
16

Flesh

Flesh is a Java application designed to analyze a document (plain text, rich text, Word documents, and PDFs) and display how difficult it is to comprehend, using the Flesch-Kincaid Grade Level and the Flesch Reading Ease Score.

2 Reviews

Downloads: 14 This Week

Last Update: 2013-04-03
See Project
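The two readability scores that Flesh reports follow standard published formulas based on word, sentence, and syllable counts. The sketch below is not Flesh's code: the class `ReadabilitySketch` and its method names are hypothetical, and the vowel-group syllable counter is a rough heuristic assumed for illustration only.

```java
import java.util.Locale;

// Hypothetical illustration of the Flesch formulas; not taken from Flesh.
final class ReadabilitySketch {

    /** Flesch Reading Ease: higher scores mean easier text. */
    static double readingEase(int words, int sentences, int syllables) {
        return 206.835
                - 1.015 * ((double) words / sentences)
                - 84.6 * ((double) syllables / words);
    }

    /** Flesch-Kincaid Grade Level: approximate US school grade. */
    static double gradeLevel(int words, int sentences, int syllables) {
        return 0.39 * ((double) words / sentences)
                + 11.8 * ((double) syllables / words)
                - 15.59;
    }

    /**
     * Rough syllable estimate (an assumption, not an exact method):
     * counts groups of consecutive vowels and discounts a trailing "e".
     */
    static int countSyllables(String word) {
        String w = word.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        if (w.isEmpty()) {
            return 0;
        }
        int count = 0;
        boolean previousWasVowel = false;
        for (char c : w.toCharArray()) {
            boolean vowel = "aeiouy".indexOf(c) >= 0;
            if (vowel && !previousWasVowel) {
                count++;
            }
            previousWasVowel = vowel;
        }
        if (w.endsWith("e") && count > 1) {
            count--;
        }
        return Math.max(count, 1);
    }
}
```

Both formulas depend only on the average sentence length and the average number of syllables per word, so the accuracy of the scores rests mostly on how sentences and syllables are counted.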
17

Qualiweb

Qualiweb aims to provide semantic web metrics for modeling website visitors' needs according to a given taxonomy or document classification. The web metrics provided by Qualiweb indicate how successful each of the website's topics has been.

Downloads: 3 This Week

Last Update: 2013-03-19
See Project
18

vyasa

vyasa is a digital library application that incorporates the functions of digital asset and document management systems. It facilitates information retrieval and knowledge discovery by providing comprehensive metadata generation and semantic analysis.

Downloads: 0 This Week

Last Update: 2013-04-24
See Project
19

Phoenix Information Extraction

Phoenix is an information extraction engine written in Java. Controlled by rules (declared in XML), it extracts information from any XML document (unstructured XHTML/OpenOffice documents). Supports XPath, additional conditions and top-down decomposit

Downloads: 1 This Week

Last Update: 2013-03-14
See Project
20

Judge

JUDGE (Java Utility for Document Genre Eduction) features automatic classification and clustering of documents, optionally as a webservice. The program is written entirely in Java and makes use of the Weka machine learning toolkit.

Downloads: 0 This Week

Last Update: 2015-12-01
See Project