Showing 169 open source projects for "documents"

View related business solutions
  • Outgrown Windows Task Scheduler? Icon
    Outgrown Windows Task Scheduler?

    Free diagnostic identifies where your workflow is breaking down—with instant analysis of your scheduling environment.

    Windows Task Scheduler wasn't built for complex, cross-platform automation. Get a free diagnostic that shows exactly where things are failing and provides remediation recommendations. Interactive HTML report delivered in minutes.
    Download Free Tool
  • Retool your internal operations Icon
    Retool your internal operations

    Generate secure, production-grade apps that connect to your business data. Not just prototypes, but tools your team can actually deploy.

    Build internal software that meets enterprise security standards without waiting on engineering resources. Retool connects to your databases, APIs, and data sources while maintaining the permissions and controls you need. Create custom dashboards, admin tools, and workflows from natural language prompts—all deployed in your cloud with security baked in. Stop duct-taping operations together, start building in Retool.
    Build an app in Retool
  • 1
    Finetune Transformer LM

    Finetune Transformer LM

    Code for "Improving Language Understanding by Generative Pre-Training"

    finetune-transformer-lm is a research codebase that accompanies the paper “Improving Language Understanding by Generative Pre-Training,” providing a minimal implementation focused on fine-tuning a transformer language model for evaluation tasks. The repository centers on reproducing the ROCStories Cloze Test result and includes a single-command training workflow to run the experiment end to end. It documents that runs are non-deterministic due to certain GPU operations and reports a median accuracy over multiple trials that is slightly below the single-run result in the paper, reflecting expected variance in practice. The project ships lightweight training, data, and analysis scripts, keeping the footprint small while making the experimental pipeline transparent. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2

    AerinSistemas-Noname

    Elasticsearch to Pandas dataframe or CSV

    API and command line utility, written in Python, for querying Elasticsearch exporting result as documents into a CSV file. The search can be done using logical operators or ranges, in combination or alone. The output can be limited to the desired attributes. Also ToT can insert the querying to a Pandas Dataframe or/and save its in a HDF5 container (under development).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3

    Service Grid - Language Grid Base System

    SOA infrastracture initially developed by NICT Language Grid Project

    ...Resources with complicated intellectual property issues are wrapped as Web services and shared on the Service Grid. If you release your software by using the software of this project, please include the following description in the documents or on the website. * This software uses the [SOFTWARE] by the Language Grid project (http://langrid.org/). [SOFTWARE] is one of: * Service Grid Server Software (http://langrid.org/oss-project/en/service_grid.html) * Language Service Development Libraries (http://langrid.org/oss-project/en/language_service.html) * Language Grid Toolbox (http://langrid.org/oss-project/en/toolbox.html) If you publish a paper by using the software of this project, please cite the following book...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    OCR Web based

    OCR Web based

    OCR web based for Browser Firefox & PC

    Optical Character Recognition in JS for Browser is based on ocrad.js. OCR for Browser is a free extension and You can use this application to extract text from any image you supply. Just upload your image files. OCR for Browser takes either a JPG, GIF, TIFF, BMP, PNG. ========= Get OCR for Android (Beta release) - https://play.google.com/store/apps/details?id=com.ulm.ocr ========= Add-on for Opera: http://bit.ly/1F0E0wP ========= Release 1.0.1 For safety reasons, I disabled...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Atera all-in-one platform IT management software with AI agents Icon
    Atera all-in-one platform IT management software with AI agents

    Ideal for internal IT departments or managed service providers (MSPs)

    Atera’s AI agents don’t just assist, they act. From detection to resolution, they handle incidents and requests instantly, taking your IT management from automated to autonomous.
    Learn More
  • 5

    BioC

    We describe a simple XML format to share text documents and annotation

    A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. ...
    Leader badge
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    This paper represent a development and deployment and/or Implementation of Optical Character Recognition (OCR) to translate images of typewritten or handwritten characters into electronically editable format by preserving font properties. OCR can do this by applying pattern matching algorithm. The Recognized characters are stored in editable format. Thus OCR make the computer read the printed documents discarding noise. Keywords- Optical Character Recognition, Image convert to character, Image cropping.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    DoAllWithPDF_servicemenu

    DoAllWithPDF_servicemenu

    KDE servicemenu for pdf

    allows kde user to make a lot of things whit right click on a pdf file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8

    WebDjVuTextEd

    Edit the OCR text layer of DjVu documents in a web browser

    WebDjVuTextEd allows to edit the text layer of OCR'ed DjVu documents in a web browser. You can modify the structure (paragraphs, lines, words...) create, delete, edit text nodes, modify their container box by mouse, and run a spellchecker. The program does not directly read the DjVu files, it requires exported XML text data and images. When using without a webserver, you can open and save local files, but cannot take advantages of auto-save and spell checking.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9

    "MedicalRecords"

    MedicalRecords is an integrated medical information system.

    ...Data that are downloadable in machine readable format can be transferred electronically to the database. Alternately, the data can be transferred from USB flash drives, CD ROMs or other removable storage media. Documents can be entered by scanning to PDF files or other formats. Finally, information may be entered through use of speech recognition or typing. “MedicalRecords” gives one or more patients access to an integrated medical record the data in which may come from a variety of sources. It also provides an easy means for presenting the integrated data to specialist or other new care provider, emergency room staff or admitting physicians.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Free and Open Source HR Software Icon
    Free and Open Source HR Software

    OrangeHRM provides a world-class HRIS experience and offers everything you and your team need to be that HR hero you know that you are.

    Give your HR team the tools they need to streamline administrative tasks, support employees, and make informed decisions with the OrangeHRM free and open source HR software.
    Learn More
  • 10
    This project will produce a set of machine measures of text document similarity. A measure of document similarity quantifies the degree to which two text documents are related.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Carrot2
    Project moved to GitHub! https://github.com/carrot2/carrot2 Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize small collections of documents, e.g. search results, into thematic categories. Carrot2 integrates very well with both Open Source and proprietary search engines.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    DJVU++

    DJVU++

    The DjVu complete solution,with OCR Technology(Arabic ,English).

    ...The main features of DjVu++ program are: o Manipulate DjVu files. o Support smaller size than PDF with the same performance. o DjVu++ supports two languages in the OCR technique (Arabic and English). o Read multiple documents at the same time with the new tabs feature. o DjVu++ supports multiple formats:  Convert PDF document into DjVu format with smaller file size and the same performance.  Convert DjVu into PDF format.  Combine images to a single DjVu document. Perform OCR operations on multiple image formats.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    DjVuPlus

    DjVuPlus

    DjVu Read Documents,With OCR Technology(Arabic ,English ),Small Size

    The DjVu Reference Library 3.5 was released by Lizardtech under the GNU General Public License version 2. DjVuLibre-3.5 was developed by Leon Bottou and others as a "Derived Work" of the DjVu Reference Library 3.5. As such, it is also subject to the GNU General Public License version 2. Several patents apply to two very specific aspects of DjVu and DjVuLibre. The patents cover a particular aspect of the ZP-coder (the arithmetic coder used in DjVu and implemented in libdjvu/ZPCodec.cpp)...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    Natural Language Analysis with Ngrams

    NLP tool for statistical analysis of words, sentences, documents

    ...In the future versions, user will be able to convert a single word to numerical data, to be able to compare two words and get the comparison data, and to be able to do the same for the sentences, paragraphs and documents. I will JAR-it once I decide that it can be called a final release. This project was made by creating a corpus from the Google Ngrams data for English Language, version 20120701. EOWL list of English words was used to filter-out the words from Ngrams data. For each year, per word, the data was added and calculated to describe the average appearance of a word per document for a given year. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Scorpius Robocup soccer simulation team. we will release our binaries, base code, trainers and useful documents here.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    FALCON - Text Search Java Project

    FALCON - Text Search Java Project

    JSON based text search Java Project

    ----------------- - What is it? - ----------------- The "Falcon Search" is a JAVA API and tool to search inside the documents. It was originally started to search the content in pdf files under the project "HAWK Search". Searching with this tool is query-based not word-based as in most of the document search tools OR document readers. It also takes care of jumbling of words within query and spelling mistakes. Commonly used techniques in this project are Natural Language Processing, Information Extraction and Question-Answering Architecture. ---------------------- - Latest Version - ---------------------- Details of latest version can be found on project website - http://geekdadaji.com --------------------------- - CONTACT DETAILS - --------------------------- CREATOR : SWAPNIL A JADHAV (saj1919) EMAIL ID : dadajibudhau@gmail.com WEBSITE : http://geekdadaji.com LICENSE : CC BY-NC 4.0
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Consilium Sentence Suggestions Tools

    Consilium Sentence Suggestions Tools

    Consilium – User Defined sentence Suggestion Tool.

    There are many tools available in market which will provide spell correction or grammer correction while making documents, but very few tools are available which are providing sentence completion according to previously entered text. But this all are providing sentence complition suggestion for sentences which are oftenly or regularly used by all people in same manner. But in reality style of writing changes person to person. While our aim is to provide a sentence suggestion tool which will give suggestion to complete the sentence according previously enterd data by the user. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Unsupervised TXT classifier

    Unsupervised TXT classifier

    Classify any two TXT documents, no training required - JAVA

    ...The summarizer from Classifier4J has been adjusted to accept two inputs (lets call them A and B). Then, the summarizer gets trained with A to summarize a document B, and vice versa. This extracts a relevant structure for both documents (and thus avoids the over-training) which are then compared using the Vector-Space analysis to give a range of belonging of one document to another (and thus avoids the shortage of information). This method can be used to create the user-defined classes by merging texts of certain categories and then to calculate the relevant distances between the documents, but this is not necessary.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19

    Medical Treebank

    Community-based linguistic annotation work on clinical documents.

    This project hosts linguistic annotations and guidelines for clinical text. We plan to include several types of annotation (Token, POS and Parse) in WordFreak format on clinical notes originally from the i2b2/VA NLP challenges. The guidelines are copyrighted, but free for the community to use. Annotation in WordFreak format contains only linguistic labels and character offsets, and can be distributed independently from the note text. Instruction is provided on setting up WordFreak for...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    TexLexAn is an open source text analyser for Linux, able to estimate the readability and reading time, to classify and summarize texts. It has some learning abilities and accepts html, doc, pdf, ppt, odt and txt documents. Written in C and Python.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    DocCO

    DocCO

    Non-disjoint groupping of Documents based on word sequence approach

    This is a GUI for learning non disjoint groups of documents based on Weka machine learning framework. It offers the possibility to make non disjoint clustering of documents using both vectorial and sequential representation (word sequence approach based on WSK kernel). All data format supported by WEKA could be used in DocCO. Data could be loaded from files, from databases or from specified URL.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22

    dbacl - digramic Bayesian classifier

    commandline multiclass email and text filter

    dbacl is a general purpose digramic Bayesian text classifier. It can learn text documents you provide, and then compare new input with the learned categories. It can be used for spam filtering, or within your own shell scripts. Sometimes it plays che
    Leader badge
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    The Information Extraction Plugin allows the use of information extraction techniques within RapidMiner. It can be seen as an interface between natural language and IE- or datamining-methods, by extracting interesting information out of documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    opendias
    ...Please visit the homepage link for the most update to date information, support and files. Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, assign tags. Use openDIAS to store all our letters, bills, statements, etc in a convenient, safe and easily retrievable way.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Redundancy due to cut-paste operations in text creates bias in machine learning for NLP. This module takes a directory and produces a subset of the files in that directory (in a list) with an upper bound on similarity between two files.
    Downloads: 0 This Week
    Last Update:
    See Project