
Paperless-ngx

A community-supported supercharged version of paperless

Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.

Downloads: 22 This Week

Last Update: 5 days ago


DocTR

Library for OCR-related tasks powered by Deep Learning

DocTR provides an easy and powerful way to extract valuable information from your documents. Seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly: 3 lines of code to load a document and extract text with a predictor.

Downloads: 3 This Week

Last Update: 2025-07-09


ktrain

ktrain is a Python library that makes deep learning and AI more accessible

ktrain is a Python library that makes deep learning and AI more accessible and easier to apply. ktrain is a lightweight wrapper for the deep learning library TensorFlow Keras (and other libraries) to help build, train, and deploy neural networks and other machine learning models. Inspired by ML framework extensions like fastai and ludwig, ktrain is designed to make deep learning and AI more accessible and easier to apply for both newcomers and experienced practitioners. With only a few lines...

Downloads: 3 This Week

Last Update: 2024-06-19


GROBID

A machine learning software for extracting information

GROBID is a machine learning library for extracting, parsing, and restructuring raw documents such as PDFs into structured XML/TEI-encoded documents, with a particular focus on technical and scientific publications. Development started in 2008 as a hobby. In 2011 the tool was made available as open source. Work on GROBID has been steady as a side project since the beginning and is expected to continue as such.

Downloads: 4 This Week

Last Update: 2025-05-11


DeepKE

An Open Toolkit for Knowledge Graph Extraction and Construction

Supporting cnSchema, standard supervised setting, low-resource setting, document-level setting and multi-modal setting for knowledge base population. DeepKE is a knowledge extraction toolkit supporting cnSchema, standard supervised, low-resource, and document-level scenarios for entity, relation, and attribution extraction. It allows developers and researchers to customize datasets and models to extract information from unstructured texts. DeepKE supports low-resource settings with only a...

Downloads: 0 This Week

Last Update: 2023-09-21


Universal Data Tool

Collaborate & label any type of data, images, text, or documents etc.

An open-source tool and library for creating and labeling datasets of images, audio, text, documents and video in an open data format. The Universal Data Tool can be used by anyone on your team, no data or programming skills needed. Simplicity without sacrificing any powerful developer features and integrations. Use the Universal Data Tool directly from a web browser or with a Windows, Mac or Linux desktop application. Join a collaborative session through a link and see team members complete dataset samples in real time. ...

Downloads: 1 This Week

Last Update: 2022-08-11


GluonNLP

NLP made easy

GluonNLP is a toolkit that helps you solve NLP problems. It provides easy-to-use tools that help you load text data, process it, and train models. To support both engineers and researchers, we provide command-line toolkits for downloading and processing NLP datasets. GluonNLP makes it easy to evaluate and train word embeddings. Here are examples to evaluate the pre-trained embeddings included in the GluonNLP toolkit as well as example scripts for training...

Downloads: 0 This Week

Last Update: 2022-08-08


TEXT2DATA

Text Analytics Platform

Bring a text analytics platform that uses NLP (Natural Language Processing) and Machine Learning to your work environment. Extract essential information from your text documents and let Artificial Intelligence save you time. Get detailed and agile reports on your unstructured data.

Downloads: 1 This Week

Last Update: 2019-07-17


Arabic Corpus

Text categorization, Arabic language processing, language modeling

The Arabic Corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani (2011), Evaluation of Topic Identification Methods on Arabic Corpora, Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192. (2) For the Khaleej-2004 corpus: M. ...

Downloads: 4 This Week

Last Update: 2019-03-05


AerinSistemas-Noname

Elasticsearch to Pandas dataframe or CSV

API and command-line utility, written in Python, for querying Elasticsearch and exporting the results as documents to a CSV file. Searches can use logical operators or ranges, alone or in combination. The output can be limited to the desired attributes. ToT can also load the query results into a Pandas DataFrame and/or save them in an HDF5 container (under development).

Downloads: 0 This Week

Last Update: 2018-08-07


OpenIMAJ

OpenIMAJ: The Open toolkit for Intelligent Multimedia Analysis in Java. OpenIMAJ contains a large collection of pure-Java classes for analysing multimedia documents, from tools for extracting image features, to tools for analysing web pages.

Downloads: 0 This Week

Last Update: 2015-03-18


Consilium Sentence Suggestions Tools

Consilium – User Defined sentence Suggestion Tool.

Many tools on the market offer spelling or grammar correction while you write documents, but very few offer sentence completion based on previously entered text. Those that do suggest completions only for sentences that most people commonly use in the same way. In reality, writing style varies from person to person. Our aim is to provide a sentence suggestion tool that suggests how to complete a sentence according to data previously entered by the user. ...

Downloads: 0 This Week

Last Update: 2014-02-24


Text Analyzer Classifier Summarizer

TexLexAn is an open source text analyser for Linux, able to estimate the readability and reading time, to classify and summarize texts. It has some learning abilities and accepts html, doc, pdf, ppt, odt and txt documents. Written in C and Python.

Downloads: 0 This Week

Last Update: 2013-10-25


DocCO

Non-disjoint grouping of documents based on a word sequence approach

This is a GUI for learning non-disjoint groups of documents, based on the Weka machine learning framework. It supports non-disjoint clustering of documents using both vectorial and sequential representations (a word sequence approach based on the WSK kernel). All data formats supported by WEKA can be used in DocCO. Data can be loaded from files, from databases, or from a specified URL.

Downloads: 0 This Week

Last Update: 2013-08-17


RapidMiner Information Extraction Plugin

The Information Extraction Plugin allows the use of information extraction techniques within RapidMiner. It can be seen as an interface between natural language and IE or data mining methods, extracting interesting information from documents.

Downloads: 0 This Week

Last Update: 2015-08-07


Leark

Leark is a Data Mining library developed in C#.NET. It contains several methods for ranking web documents described with a set of normalized features, and a feature selection algorithm. The methods are based on perceptron and clustering.

Downloads: 0 This Week

Last Update: 2013-04-19


Search Results for "documents"

Showing 16 open source projects for "documents"

