Showing 19 open source projects for "corpus"

View related business solutions
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • Atera - an All-in-one platform for IT management Icon
    Atera - an All-in-one platform for IT management

    Ideal for IT departments and MSPs (managed service providers)

    Your IT essentials, integrated & elevated. Take your IT management from automated to autonomous, download Atera's agent to start your free trial!
    Try Atera now
  • 1
    iramuteq
    IRAMUTEQ : Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires. Logiciel de traitement de données pour des corpus texte ou de type individus/caractères. Permet notamment de réaliser des analyses de type "ALCESTE"
    Leader badge
    Downloads: 637 This Week
    Last Update:
    See Project
  • 2
    This is an application generator for conflation algorithms in perl language. This system supports generation perl source code for a stemmer from a rule file, running a stemmer which is supported by the system, parsing a corpus file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Korean Analyzer Rhino

    Korean Analyzer Rhino

    Parsing Korean words by morpheme and part-of-speech

    RHINO parses Korean words by morpheme and part-of-speech. Its dictionaries are based on Korean Modern Tagged Corpus(12 million phrases scale) which was made by Korean government. So it analyses many cases of stems and endings. And the newly developed Dynamic Dictionary Technology can make words to react with their context. That is, a programmed database. For more information see the files in the help folder.
    Leader badge
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    concordia

    concordia

    Powerful search library, best suited for computer-aided translation

    ...Concordance searcher - tool for translators who need their translations to "agree" with one standard. Concordia is a C++ library for fast text lookup in large corpora. It uses a RAM stored index, which takes up approximately 600MB of memory for a corpus of 2 million sentences. It is based on the idea of a suffix array, enhanced by the presence of other auxiliary data structures. The effects are stunning - Concordia is able to do simple substring lookup at the pace of 5000 queries per second (on personal PC) - a speed which can not be achieved by any other search library. Moreover, Concordia can perform its own "concordia search". ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    **CODE MOVED TO GITHUB: https://github.com/bitextor ** Bitextor is an application created to generate translation memories using multilingual websites as a corpus source. It downloads an entire website and applies a set of heuristics (based mainly on HTML tag structure and text block length) to find bitexts.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Yet another corpus manager. Allows for HTTP access to annotated text corpora, client does not need to install any special software to access the server (any browser with JavaScript support will do).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    CRFSharp

    CRFSharp

    CRFSharp is a .NET(C#) implementation of Conditional Random Field

    ...It encodes model parameters by L-BFGS. Moreover, it has many significant improvement than CRF++, such as totally parallel encoding, optimizing memory usage and so on. Currently, when training corpus, compared with CRF++, CRF# can make full use of multi-core CPUs and only uses very low memory, and memory grow is very smoothly and slowly while amount of training corpus, tags increase. with multi-threads process, CRF# is more suitable for large data and tags training than CRF++ now. For example, in machine with 64GB, CRF# encodes model with more than 4.5 hundred million features quickly.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    An open-source corpus analysis class library written in C#. GUI of Tenka Text 0.1.3 comes with Wordlister - an advanced, extremely fast graphical wordlist tool and a simple regex concordance tool. Tenka Text - the open-source answer to WordSmith Tool
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    A toolkit with using Suffix Array indexing for empirical natural language processing. Providing functions such as searching the occurrences of n-grams in the corpus and suffix array language model which can use arbitrarily long history.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 10
    This proyect presents a system, which, from a corpus of documents, extracts information about a theme area, and a pedagogical components collection. This information is packed into fine granularity learning objects (metadata included).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Get1T is a tool for filtering through the massive quantity of data available in the Web 1T corpus and extracting only the counts you need - including for simple wildcard patterns.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    This project is supposed to list the Top R ranked terms that are of between M and N length. It is designed to extract these phrases from a given corpus in a input folder.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    cl-cc-bnc provides a frontend to learners of English language. You can enter an URI, which will be analyzed word-frequency-wise and compared to word frequencies in the British National Corpus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    BabyTALK is to add another brick in the wall of natural languages learning. The baby needs to structure a corpus of texts when his tutor points and talks about a particular part of the corpus. The baby is also to describe any selected part of the corpus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    CRFChunker: Conditional Random Fields Phrase Chunker (Phrase Chunking Tool) for English. The model was trained on sections 01..24 of WSJ corpus and using section 00 as the development test set (F1-score of 95.77). Chunking speed: 700 sentences/s
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    CRFTagger: Conditional Random Fields Part-of-Speech (POS) Tagger for English. The model was trained on sections 01..24 of WSJ corpus and using section 00 as the development test set (accuracy of 97.00%). Tagging speed: 500 sentences/s.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    AmiGram is the AMI Graphical Representation and Annotation Module. It is a general-purpose tool for multimodal corpus annotation and allows the time line based annoation of NXT corpora in a layer based environment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Italian labeled digits corpus, good for speech recognition. Corpus di cifre italiane segmentato, adatto a esperimenti di riconoscimento vocale e riconoscimento fonetico.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    TagHybrida is a French hybrid syntactic parser. TagHybrida is a four stage parser combining hand-writen and corpus based information.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo