Showing 77 open source projects for "corpora"

  • 1
    gensim

    Topic Modelling for Humans

    Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
  • 2
    IMS Open Corpus Workbench

    Indexing and query tools for very large text corpora

    The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 million words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP, which can be used interactively in a terminal session, as a back end (e.g., from a Perl script), or through the web-based GUI CQPweb.
  • 3
    TXM

    Unicode-XML-TEI text/corpus analysis platform

    TXM is a free and open-source, cross-platform, Unicode- and XML-based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X (latest version: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en). It can also be used online as a J2EE-compliant web portal (GWT-based) with built-in access control. TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful...
  • 4

    Tokenized Text Aligner

    Aligns tokens in two versions of a text with differing tokenization.

    This tool performs token-by-token alignment of two versions of a text with differing tokenization by interpreting the results of a file diff (https://docs.python.org/3/library/difflib.html). It is intended for use in the preparation of annotated linguistic corpora, where differences in tokenization may arise (i) following corrections or modifications to the source text or (ii) through the creation of different layers of annotation (part-of-speech, treebank) requiring different tokenization...
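    The diff-based alignment described above can be sketched in Python with the standard library's `difflib.SequenceMatcher`. This is a minimal illustration and not the tool's own code. The helper name `align_tokens` and its output format (a list of index groups, one per aligned span) are assumptions made for this example. Identical tokens are aligned one-to-one. Each replaced, inserted, or deleted span is aligned as a single group, which is where differing tokenizations such as `can't` versus `ca` + `n't` end up.

```python
from difflib import SequenceMatcher


def align_tokens(a, b):
    """Align two tokenizations of the same text (hypothetical helper).

    Returns a list of (a_indices, b_indices) groups: tokens the diff reports
    as equal are paired one-to-one; every replaced, deleted, or inserted
    span becomes a single group (either side of which may be empty).
    """
    alignment = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for i, j in zip(range(i1, i2), range(j1, j2)):
                alignment.append(([i], [j]))
        else:
            # 'replace', 'delete' or 'insert': tokenization differs here.
            alignment.append((list(range(i1, i2)), list(range(j1, j2))))
    return alignment


old = ["I", "can't", "go", "."]
new = ["I", "ca", "n't", "go", "."]
for src, dst in align_tokens(old, new):
    print([old[i] for i in src], "->", [new[j] for j in dst])
```

    For this input, the only non-trivial group pairs `can't` with `ca` and `n't`. Grouping whole spans keeps the sketch simple. A real aligner would likely need extra heuristics, such as character-level matching inside a group, to map individual tokens within long differing spans.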
  • 5
    JoBimText

    Linking Language to Knowledge with Distributional Semantics

    JoBimText is a software solution for automatic text expansion using contextualized distributional similarity. It provides text analysis tools for large corpora and can create distributional semantic models (JoBimText models) as well as multi-word expressions.
  • 6

    Linguistic Analyzer

    The Linguistic Analyzer is a tool for corpus analysis and comparison

    The Linguistic Analyzer (Almuhalil Alloghawy) is a free tool designed by a team from Al-Imam Muhammad bin Saud Islamic University for analyzing and comparing corpora in terms of several linguistic characteristics. Its features include frequency list generation, concordances, collocation extraction, the difference between two words, and keyword identification.
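    Two of the features listed above, frequency lists and concordances (keyword-in-context, or KWIC, lines), can be illustrated with a short Python sketch. It assumes whitespace tokenization and exact-match keyword lookup. The function names `frequency_list` and `concordance` are hypothetical. This is not the Linguistic Analyzer's implementation, and Arabic text would normally need proper tokenization and normalization first.

```python
from collections import Counter


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first (ties keep first-seen order)."""
    return Counter(tokens).most_common()


def concordance(tokens, keyword, width=3):
    """Return KWIC lines with up to `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


text = "the cat sat on the mat and the dog sat on the cat"
tokens = text.split()  # naive whitespace tokenization (assumption)
print(frequency_list(tokens)[:3])
for line in concordance(tokens, "sat", width=2):
    print(line)
```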
  • 8
    PyTorch SimCLR

    PyTorch implementation of SimCLR: A Simple Framework

    For quite some time now, we have known about the benefits of transfer learning in Computer Vision (CV) applications. Nowadays, pre-trained Deep Convolutional Neural Networks (DCNNs) are the go-to starting point for learning a new task. These large models are trained on huge supervised corpora, such as ImageNet. Most importantly, their features are known to adapt well to new problems. This is particularly useful when annotated training data is scarce. In situations like this, we take the models...
  • 9
    NLP Best Practices

    Natural Language Processing Best Practices & Examples

    In recent years, natural language processing (NLP) has grown rapidly in quality and usability, which has helped drive business adoption of artificial intelligence (AI) solutions. In the last few years, researchers have been applying newer deep learning methods to NLP. Data scientists have started moving from traditional methods to state-of-the-art (SOTA) deep neural network (DNN) algorithms that use language models pretrained on large text corpora. This repository contains examples...
  • 10

    POWLA

    OWL/RDF representation for linguistic corpora

    POWLA is a formalism for representing linguistic corpora in RDF. POWLA is an OWL/DL formalization of an abstract data model, PAULA (http://www.sfb632.uni-potsdam.de/d1/paula/doc), which was developed to represent (a) any type of linguistic annotation applicable to textual data, and (b) any combination of annotation layers. For a detailed motivation of POWLA and its application to facilitate interoperability of annotated corpora, see Christian Chiarcos (to appear 2012...
  • 11
    Word frequency and diversity (distribution) across hundreds of corpora. You'll see both the lemma and the various forms.
  • 12

    Arabic Rare Words Project

    Text Analysis of Egyptian Schoolbooks

    The purpose is to compare the most common words in the language with the words used in textbooks for students in Egyptian schools. The frequency data can help scholars and teachers teach reading more effectively.
  • 13

    OLiA

    OWL/DL ontologies for linguistic annotations

    .../) and ISOcat (http://www.isocat.org). The OLiA ontologies were originally developed as part of an infrastructure for the sustainable maintenance of linguistic resources (http://www.sfb441.uni-tuebingen.de/c2/index-engl.html); their fields of application include the formalization of annotation schemes, concept-based querying over heterogeneously annotated corpora, and the development of interoperable NLP pipelines.
  • 14
    @Note2

    @Note2 - A workbench for Biomedical Text Mining

    Biomedical Text Mining (BioTM) provides valuable approaches to the automated curation of scientific literature.
  • 15

    Arabic Corpus

    Text categorization, Arabic language processing, language modeling

    The Arabic Corpus was compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The corpus Khaleej-2004 contains 5690 documents divided into 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods...
  • 16
    concordia

    Powerful search library, best suited for computer-aided translation

    Concordia (named after the Roman goddess of agreement) is a concordance searcher: a tool for translators who need their translations to "agree" with one standard. Concordia is a C++ library for fast text lookup in large corpora. It uses a RAM-stored index, which takes up approximately 600 MB of memory for a corpus of 2 million sentences. It is based on the idea of a suffix array, enhanced by other auxiliary data structures. The effects are stunning: Concordia is able to do simple substring...
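    The suffix-array idea mentioned above can be illustrated with a toy Python sketch over a token sequence. It builds the array naively by sorting suffixes directly, which is far slower than the construction algorithms used in practice. It then searches with plain binary search and omits Concordia's auxiliary data structures. The names `build_suffix_array` and `find` are hypothetical and are not part of Concordia's C++ API. Because all suffixes that start with a given pattern are contiguous in sorted order, two binary searches bound the matching range.

```python
def build_suffix_array(tokens):
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find(tokens, sa, pattern):
    """Return the sorted start positions of `pattern` (a token list) in `tokens`."""
    m = len(pattern)
    # Lower bound: first suffix whose m-token prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-token prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


tokens = "to be or not to be".split()
sa = build_suffix_array(tokens)
print(find(tokens, sa, ["to", "be"]))
```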
  • 17

    Queries for OSAC (Arabic) Corpus

    43 Queries for Arabic Information Retrieval Collection

    43 queries on various topics for the Information Retrieval Collection. The corpus is created from the OSAC corpus of journalistic texts, consisting of 4763 articles collected from Arabic BBC News. https://sourceforge.net/projects/ar-text-mining/files/Arabic-Corpora/
  • 18

    Ghawwas_V4

    An open source system for Arabic corpora processing

    Ghawwas (previously known as Khawas) is an open source system for Arabic corpora processing. Ghawwas V4.0 provides the following main functions:
    a. Frequency lists for single words and n-grams
    b. Concordances
    c. Collocations (MI, chi-squared, LL, T-score, Z-score, Dice, Log Dice, Weirdness Coefficient)
    d. Lexical pattern search
    e. Frequency profile comparison of two corpora based on MI, chi-squared, LL, T-score, Z-score, Dice, Log Dice, Weirdness Coefficient
    f. Accepts Windows and UTF-8 character...
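    Several of the collocation measures listed above have standard textbook definitions. For a word pair (x, y) with co-occurrence count f(xy), word counts f(x) and f(y), and corpus size N:
    - MI = log2(f(xy)·N / (f(x)·f(y)))
    - T-score = (f(xy) − f(x)·f(y)/N) / √f(xy)
    - Dice = 2·f(xy) / (f(x) + f(y))
    - Log Dice = 14 + log2(Dice)

    The Python sketch that follows computes these for adjacent word pairs. It is an illustration under stated assumptions (adjacent bigrams only, N taken as the token count) and not Ghawwas's implementation, whose window sizes and normalization may differ. `bigram_scores` is a hypothetical name.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score each adjacent word pair (x, y) with several association measures."""
    n = len(tokens)  # corpus size N (assumption: token count)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), f_xy in bigrams.items():
        f_x, f_y = unigrams[x], unigrams[y]
        expected = f_x * f_y / n  # expected co-occurrence under independence
        dice = 2 * f_xy / (f_x + f_y)
        scores[(x, y)] = {
            "mi": math.log2(f_xy / expected),          # (pointwise) mutual information
            "t": (f_xy - expected) / math.sqrt(f_xy),  # t-score
            "dice": dice,
            "log_dice": 14 + math.log2(dice),
        }
    return scores


tokens = "new york is big and new york is busy".split()
for pair, m in sorted(bigram_scores(tokens).items()):
    print(pair, {k: round(v, 3) for k, v in m.items()})
```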
  • 19

    HipparchiaServer

    Front end to Hipparchia corpora: searching, browsing, concordances, texts, dictionaries, parsing

  • 20

    rcqp

    R interface to the Corpus Query Protocol

    Implements the Corpus Query Protocol as a package for the R statistical environment. It allows users to query linguistic corpora and manipulate the data as native R objects. It is based on the CWB software.
  • 21
    BioNLP-Corpora is a repository of biomedically and linguistically annotated corpora and biomedical data sources. Its many resources are distributed as separate packages within this project.
  • 22
    The corpus contains 81,000 words of Arabic text from Contemporary Arabic (CCA) [1] and Arabic Wikipedia [2], tagged with basic part-of-speech tags (verb, noun, adjective). [1] http://www.comp.leeds.ac.uk/eric/latifa/research.htm [2] http://ar.wikipedia.org
  • 23
    Scattertext 0.2.1

    Beautiful visualizations of how language differs among document types

    A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding to terms are selectively labeled so that they don't overlap with other labels or points.
  • 24
    Yet another corpus manager. It provides HTTP access to annotated text corpora; clients do not need to install any special software to access the server (any browser with JavaScript support will do).
  • 25

    Arabic business corpora

    Arabic business and management corpus

    This corpus is made up of three sub-corpora: 1) Management Corpus: 400 articles by chairmen and CEOs of Arabic companies in the Middle East. 2) Economics News: 400 news articles from different Arabic online newspapers. 3) Stock Market News: 400 articles collected from investing.com. The full corpus contains 1200 articles, tagged using the Stanford Arabic Part-of-Speech Tagger. Both plain-text and tagged versions are available to download; check the Files...