Showing 25 open source projects for "corpus"

View related business solutions
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 1

    modnlp-plugins

    External plugins for modnlp/teccli

    This is a general project for modnlp/teccli plugins, with focus on text visualizaton.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Software, information, data sets and documentation for the Web as Corpus community.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3

    korpus

    Corpus Linguistics Software

    Some software for Corpus Linguistics, which includes Corpus Text Editor, Web-based search, etc. This project created for Belarusian Corpus, but can be used for other languages with some adaption.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Korean Analyzer Rhino

    Korean Analyzer Rhino

    Parsing Korean words by morpheme and part-of-speech

    RHINO parses Korean words by morpheme and part-of-speech. Its dictionaries are based on Korean Modern Tagged Corpus(12 million phrases scale) which was made by Korean government. So it analyses many cases of stems and endings. And the newly developed Dynamic Dictionary Technology can make words to react with their context. That is, a programmed database. For more information see the files in the help folder.
    Leader badge
    Downloads: 2 This Week
    Last Update:
    See Project
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 5

    SimpleLemmatizer

    This program is for text lemmatization

    It lemmatizes texts based on supplied model. The base model is for slovak texts and is created from Slovak National Corpus, copyright by Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Corpus Toolkit

    Corpus Toolkit

    A text management tool for linguistic purposes...

    Downloads: 0 This Week
    Last Update:
    See Project
  • 7

    Drug Extraction

    Drug name extraction

    ...Using CONLL-Evaluation: processed 32065 tokens with 3656 phrases; found: 3251 phrases; correct: 2786. accuracy: 95.25%; precision: 85.70%; recall: 76.20%; FB1: 80.67 Using GATE Corpus Benchmark: Strict: P: 0.65 R: 0.73 F1: 0.69 Lenient: P: 0.74 R: 0.84 F1: 0.78 The details of how to reproduce evaluation, see README. To use standalone version for tagging download DrugExtractionStandalone.tar.gz from Files.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Platform for Annotated Corpora in XML Integrated tool for corpus linguists built on Eclipse, Vex, Subversive, etc. for creating and editing transcriptions and annotations, querying, managing version controlled data, and building a shippable corpus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    TF-IDF.jar is a Java Archive file to measure TF-IDF of each document in a document collection (corpus). The jar can be used to (a) get all the terms in the corpus (b) get the document frequency (DF) and inverse document frequency (IDF) of all the terms in the corpus (c) get the TF-IDF of each document in the corpus (d) get each term with their frequency (no. of presence), term frequency (TF) and TF-IDF in every document
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 10
    CorpusSearch finds syntactic structures in a corpus of annotated sentence trees. It can be used as a research tool on a corpus, or as a development tool for building the corpus.
    Leader badge
    Downloads: 38 This Week
    Last Update:
    See Project
  • 11

    knowceans

    Utility classes from maps to search engine to random samplers

    .... --- Highlights: --- org.knowceans.util: IndexQuickSort, TableList: apply order of one array/list to others +++ Vectors, ArrayUtils: array convenience +++ RandomSamplers, CokusRandom, ArmSampler, Densities: random sampling and distributions +++ Arguments: command line parser +++ StopWatch, Which, ExternalProcess: runtime stuff +++ ParallelFor: OpenMP workalike +++ PatternString, NamedGroupRegex: regex convenience --- org.knowceans.corpus: CorpusSearcher: full-text search engine +++ LabelNumCorpus: svmlight corpus storage and filtering +++ NIPS corpus with text, authors, labels and citations --- org.knowceans.map: InvertibleHashMultiMap, BijectiveHashMap: implement n:m and 1:1 relations. --- Other libs: knowceans-arms = port of the Adaptive Rejection Metropolis Sampler (ARMS) for arbitrary distributions +++ lda-j = port of lda-c, implementing Latent Dirichlet Allocation (LDA)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    algevox
    Sistema de reconocimiento de voz usando CMU Sphinx-4 y un modelo acústico basado en el corpus de VoxForge en español y gramáticas en JFlex y BYACC/J para el dictado en habla casi natural para la escritura de expresiones matemáticas.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    CALBC
    The code here is a parser for CALBC corpus into Java object.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    CORPSE (CORPus SEarch) is a powerful search engine written in Java. The aim is to provide an efficient implementation of a word level inverted index search with various cool functions that can be used on very large corpora.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Cunei is a data-driven machine translation system that builds dynamic, statistical models based on instances of known translations found in a corpus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Web-as-corpus tools in Java. * Simple Crawler (and also integration with Nutch and Heritrix) * HTML cleaner to remove boiler plate code * Language recognition * Corpus builder
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Sanchay
    Sanchay is a collection of tools and APIs for language researchers. It has some implementations of NLP algorithms, some flexible APIs, several user friendly annotation interfaces and Sanchay Query Language for language resources.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    This proyect presents a system, which, from a corpus of documents, extracts information about a theme area, and a pedagogical components collection. This information is packed into fine granularity learning objects (metadata included).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    BabyTALK is to add another brick in the wall of natural languages learning. The baby needs to structure a corpus of texts when his tutor points and talks about a particular part of the corpus. The baby is also to describe any selected part of the corpus.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    CRFChunker: Conditional Random Fields Phrase Chunker (Phrase Chunking Tool) for English. The model was trained on sections 01..24 of WSJ corpus and using section 00 as the development test set (F1-score of 95.77). Chunking speed: 700 sentences/s
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    CRFTagger: Conditional Random Fields Part-of-Speech (POS) Tagger for English. The model was trained on sections 01..24 of WSJ corpus and using section 00 as the development test set (accuracy of 97.00%). Tagging speed: 500 sentences/s.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    AmiGram is the AMI Graphical Representation and Annotation Module. It is a general-purpose tool for multimodal corpus annotation and allows the time line based annoation of NXT corpora in a layer based environment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    TagHybrida is a French hybrid syntactic parser. TagHybrida is a four stage parser combining hand-writen and corpus based information.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    TIGER API is a library which allows Java programmers to easily access the structure of any corpus given as a TIGER-XML file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    reputron is a knowledge extraction engine platform that covers all aspect of text mining, relevance, indexing and querying on a corpus of text documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo