  • 1

    modnlp

    Modular Suite of NLP Tools

    ...It provides an API and tools for (inverted) indexing, storage and retrieval of large amounts of text, with (XML-based) handling of metadata; tools for text categorisation, including functionality for XML parsing, term set reduction (and basic keyword extraction), probabilistic classifier induction, sample classification tools, and evaluation modules; and a suite of corpus management, curation and distributed access tools. If you use the tool, please consider citing the following article: Luz, S., & Sheehan, S. (2020). Methods and visualization tools for the analysis of medical, political and scientific concepts in Genealogies of Knowledge. Palgrave Communications, 6(1), 1-20. ...
    Downloads: 0 This Week
  • 2
    Wikipedia2Vec

    A tool for learning vector representations of words and entities

    Wikipedia2Vec is an embedding learning tool that creates word and entity vector representations from Wikipedia, enabling NLP models to leverage structured and contextual knowledge.
    Downloads: 0 This Week
  • 3
    NLG-Eval

    Evaluation code for various unsupervised automated metrics

    NLG-Eval is a toolkit for evaluating the quality of natural language generation (NLG) outputs using multiple automated metrics such as BLEU, METEOR, and ROUGE.
    Downloads: 0 This Week
  • 4
    CC-Net

    Tools to download and cleanup Common Crawl data

    cc_net provides tools to download, segment, clean, and filter Common Crawl to build large-scale text corpora, including monolingual datasets and the multilingual CC-100 collection introduced in the associated paper. It includes pipelines to fetch snapshots, extract text, de-duplicate, identify language, and apply quality filtering based on heuristics and language models. The outputs are intended for pretraining language models and for creating standardized corpora that can be reproduced or...
    Downloads: 0 This Week
  • 5

    KSUCCA Corpus

    A 50-million-token corpus of Classical Arabic.

    King Saud University Corpus of Classical Arabic (KSUCCA) is a pioneering 50-million-token annotated corpus of Classical Arabic texts from the pre-Islamic era until the fourth Hijri century (equivalent to the period from the seventh until the early eleventh century CE), which is the period of pure Classical Arabic. The main aim of this corpus is to support the study of the distributional lexical semantics of Quranic words.
    Downloads: 6 This Week
  • 6
    Dragonfire

    The open-source virtual assistant for Ubuntu based Linux distributions

    Dragonfire is the open-source virtual assistant project for Ubuntu-based Linux distributions. Her main objective is to serve as a command-and-control interface for the helmet's user, so that you can give orders using just your voice commands and eye movements, which makes the helmet hands-free. We are planning to ship Dragonfire as a preinstalled software package on the DragonOS Linux distribution. DragonOS will be a Linux distribution specially designed for the helmet. It will...
    Downloads: 0 This Week
  • 7
    PyTorch Natural Language Processing

    Basic Utilities for PyTorch Natural Language Processing (NLP)

    ...With your batch in hand, you can use PyTorch to develop and train your model using gradient descent. For example, check out the project's example code for training on the Stanford Natural Language Inference (SNLI) corpus. Now that you've set up your pipeline, you may want to ensure that some functions run deterministically. Wrap any code that's random with fork_rng and you'll be good to go. Now that you've computed your vocabulary, you may want to make use of pre-trained word vectors to set your embeddings.
    Downloads: 0 This Week
  • 8

    Arabic Corpus

    Text categorization, arabic language processing, language modeling

    The Arabic Corpus was compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The Khaleej-2004 corpus contains 5,690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20,291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) For the Watan-2004 corpus: M.
    Downloads: 3 This Week
  • 9
    NeuroNER

    Named-entity recognition using neural networks

    ..."deep learning"). It is cross-platform, open source, freely available, and straightforward to use. It enables users to create or modify annotations for a new or existing corpus and to train the neural network that performs the NER. During training, NeuroNER allows monitoring of the network. Users can then evaluate the quality of the predictions made by NeuroNER: performance metrics are calculated and plotted by comparing the predicted labels with the gold labels.
    Downloads: 0 This Week
  • 10
    lazynlp

    Library to scrape and clean web pages to create massive datasets

    LazyNLP is a lightweight tool for collecting and curating large-scale text datasets for machine learning and NLP applications with minimal manual effort.
    Downloads: 0 This Week
  • 11
    Osman Arabic Text Readability

    Open Source tool for Arabic text readability

    We present OSMAN (Open Source Metric for Measuring Arabic Narratives), a novel open source Arabic readability metric and tool. The open source Java tool allows users to calculate readability for Arabic text (with and without diacritics). The tool provides methods to split text into words and sentences and to count syllables, Faseeh letters, and hard and complex words, in addition to adding diacritics (vocalising text). This makes the tool useful for researchers and educators working with Arabic text....
    Downloads: 0 This Week
  • 12

    Natural Language Analysis with Ngrams

    NLP tool for statistical analysis of words, sentences, documents

    ...In future versions, users will be able to convert a single word to numerical data, to compare two words and get the comparison data, and to do the same for sentences, paragraphs and documents. I will package it as a JAR once I decide that it can be called a final release. This project was made by creating a corpus from the Google Ngrams data for the English language, version 20120701. The EOWL list of English words was used to filter out words from the Ngrams data. For each year and each word, the data was aggregated to compute the average number of appearances of the word per document in that year. Before using this program, you MUST download the corpus.
    Downloads: 0 This Week
  • 13

    Persica-A new Persian corpus for NLP

    This project presents a new corpus for news text analysis in Persian

    The lack of a multi-application text corpus, despite the surge in text data, is a serious bottleneck for text mining and natural language processing, especially in the Persian language. This project presents a new corpus for news article analysis in Persian called Persica. News analysis includes news classification, topic discovery and classification, category classification and many other procedures.
    Downloads: 2 This Week
  • 14

    KALIMAT Multipurpose Arabic Corpus

    A corpus that could be of help for researchers working on Arabic NLP

    KALIMAT, a Multipurpose Arabic Corpus. We are pleased to announce the immediate availability of KALIMAT 1.0. KALIMAT is an Arabic natural language resource that consists of: 1) 20,291 Arabic articles collected from the Omani newspaper Alwatan by Abbas et al. (2011); 2) 20,291 extractive single-document system summaries; 3) 2,057 extractive multi-document system summaries; 4) 20,291 named-entity-recognised articles; 5) 20,291 part-of-speech-tagged articles; 6) 20,291 morphologically analysed articles. ...
    Downloads: 6 This Week
  • 15
    Redundancy due to cut-and-paste operations in text creates bias in machine learning for NLP. This module takes a directory and returns, as a list, a subset of the files in that directory in which the similarity between any two files does not exceed an upper bound.
    Downloads: 0 This Week
  • 16

    English-Khmer S. Machine Translation

    English-Khmer Automatic Statistical Machine Translation (SMT)

    The Automatic Machine Translation from English to Khmer project is the first effort in the Natural Language Processing field to translate English into the Khmer (Cambodian) language. This project uses Domy CE, an open source SMT toolkit, to train on a parallel corpus, and web technologies such as Python, Apache2, HTML, XML, and XSLT to develop a web-based application. This project was developed by Ms. Kim Sokphyrum (DU) and Ms. Suos Samak (Jamia) under the supervision of Mr. Javier Sola, a Program Manager at Open Institute (OI), Cambodia; Dr. Vasudha Bhatnagar, an Assistant Professor and Head of Computer Science at the University of Delhi (DU), New Delhi, India; and Dr. ...
    Downloads: 0 This Week
  • 17
    CRFSharp

    CRFSharp is a .NET(C#) implementation of Conditional Random Field

    ...It encodes model parameters using L-BFGS. Moreover, it offers several significant improvements over CRF++, such as fully parallel encoding and optimized memory usage. When training on a corpus, CRF# can make full use of multi-core CPUs and uses very little memory compared with CRF++, and its memory usage grows smoothly and slowly as the training corpus and the tag set grow. With multi-threaded processing, CRF# is now better suited than CRF++ to training on large data with many tags. For example, on a machine with 64 GB of memory, CRF# quickly encodes a model with more than 450 million features.
    Downloads: 0 This Week
  • 18
    This project aims to build a suite of Natural Language Processing tools. Modules will include corpus indexing and access tools, a part-of-speech tagger, tokenisers, text classification software, etc.
    Downloads: 0 This Week
  • 19
    A toolkit that uses suffix-array indexing for empirical natural language processing. It provides functions such as searching for occurrences of n-grams in the corpus and a suffix-array language model that can use an arbitrarily long history.
    Downloads: 0 This Week
  • 20
    Clipsyll is a collection of scripts and programs for downloading, codifying and analysing (using NLTK) CLIPS, the largest Italian corpus of spoken language. It includes a syllabification module based on the Sonority Sequencing Principle (SSP): http://sourceforge.net/projects/silly
    Downloads: 0 This Week
  • 21
    Sanchay
    Sanchay is a collection of tools and APIs for language researchers. It has some implementations of NLP algorithms, some flexible APIs, several user-friendly annotation interfaces and Sanchay Query Language for language resources.
    Downloads: 0 This Week
  • 22
    CoPT, Corpus Processing Tools, is a set of java classes intended to assist field linguists, NLP researchers and developers, students and software developers in all corpus-related processing.
    Downloads: 0 This Week
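modnlp lists "(inverted) indexing, storage and retrieval" among its features. As a rough illustration of what an inverted index does, the sketch below maps each token to the set of documents containing it and answers a Boolean AND query by intersecting postings. This is not modnlp's API; the function names, whitespace tokenisation and sample documents are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased whitespace token to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return dict(index)


def and_query(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Return the IDs of documents that contain every query term (Boolean AND)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical sample documents, for illustration only.
docs = {
    "d1": "the genealogy of political concepts",
    "d2": "medical concepts in translation",
    "d3": "political translation and power",
}
index = build_inverted_index(docs)
print(sorted(and_query(index, ["political", "concepts"])))
```

A real corpus index would also store token positions and metadata so it can support concordance-style retrieval; the set-of-IDs postings here keep only what a Boolean query needs.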
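Several entries above mention text categorisation: modnlp lists "probabilistic classifier induction", and the Arabic Corpus (Khaleej-2004, Watan-2004) is organised into topic categories. One standard probabilistic classifier for this task is multinomial Naive Bayes with add-one smoothing. The sketch below is a generic illustration chosen for this listing; it is not the classifier shipped by either project, and the toy training data and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Fit multinomial Naive Bayes from (tokens, label) pairs.

    Returns ({label: (log_prior, word_counts, total_words)}, vocabulary).
    """
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    for tokens, label in labeled_docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}
    n_docs = sum(class_docs.values())
    model = {}
    for label, count in class_docs.items():
        total = sum(word_counts[label].values())
        model[label] = (math.log(count / n_docs), word_counts[label], total)
    return model, vocabulary


def classify(model, vocabulary, tokens):
    """Return the label with the highest log posterior, using add-one (Laplace) smoothing."""
    v = len(vocabulary)
    best_label, best_score = None, -math.inf
    for label, (log_prior, counts, total) in model.items():
        score = log_prior + sum(
            math.log((counts[w] + 1) / (total + v)) for w in tokens if w in vocabulary
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
train = [
    ("goal match team".split(), "sport"),
    ("team wins match".split(), "sport"),
    ("market shares bank".split(), "economy"),
    ("bank interest rates".split(), "economy"),
]
model, vocab = train_naive_bayes(train)
print(classify(model, vocab, ["match", "team"]))
```

Words never seen in training are ignored at classification time, which is one common convention; working in log space avoids underflow on long documents.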
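NLG-Eval reports automated metrics such as BLEU. To show what BLEU measures, here is a minimal, unsmoothed sentence-level BLEU with uniform weights: clipped n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. This is an independent sketch, not NLG-Eval's code, which computes corpus-level scores through its own metric implementations; expect small numerical differences from it.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0  # candidate too short to contain n-grams of this order
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            max_ref |= ngram_counts(ref, n)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0  # log(0) is undefined, so unsmoothed BLEU is zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "the cat sat on the mat".split()
print(sentence_bleu("the cat sat on the mat today".split(), [reference]))
```

Without smoothing, any candidate that shares no 4-gram with the references scores exactly zero, which is why corpus-level BLEU or smoothed variants are usually preferred for short sentences.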
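cc_net's pipeline includes a de-duplication step. One common way to de-duplicate web text is to hash a normalised form of each paragraph and drop any paragraph whose hash has been seen before. The sketch below assumes that approach; the normalisation rules (lowercasing, accent stripping, mapping digits to 0, dropping punctuation) and the function names are assumptions for illustration, not cc_net's actual code.

```python
import hashlib
import re
import unicodedata


def normalize(paragraph: str) -> str:
    """Lowercase, strip accents, map digits to 0, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFD", paragraph.lower())
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    text = re.sub(r"\d", "0", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def dedup_paragraphs(documents):
    """Remove each paragraph (line) whose normalized form already appeared earlier."""
    seen = set()
    cleaned = []
    for doc in documents:
        kept = []
        for paragraph in doc.split("\n"):
            norm = normalize(paragraph)
            if not norm:
                continue  # skip empty or punctuation-only lines
            digest = hashlib.sha1(norm.encode("utf-8")).digest()
            if digest not in seen:
                seen.add(digest)
                kept.append(paragraph)
        cleaned.append("\n".join(kept))
    return cleaned


docs = ["Cookie policy\nReal content one", "COOKIE POLICY!\nReal content two"]
print(dedup_paragraphs(docs))
```

Boilerplate such as cookie banners and navigation menus repeats across many pages of a crawl, so paragraph-level hashing removes much of it while leaving the unique content of each page intact.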
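The PyTorch-NLP entry suggests wrapping random code with fork_rng so that it runs deterministically without disturbing the rest of the program. The idea, saving the generator state, optionally reseeding, and restoring the state afterwards, can be sketched with Python's built-in `random` module alone. This is not the library's fork_rng: `fork_random_state` is a hypothetical name, and this sketch does not touch NumPy or PyTorch generators.

```python
import random
from contextlib import contextmanager


@contextmanager
def fork_random_state(seed=None):
    """Run a block with its own `random` state, then restore the caller's state."""
    saved = random.getstate()
    try:
        if seed is not None:
            random.seed(seed)
        yield
    finally:
        random.setstate(saved)  # restored even if the block raises


with fork_random_state(seed=123):
    first = random.random()
with fork_random_state(seed=123):
    second = random.random()
print(first == second)
```

Because the caller's state is restored in `finally`, code after the block draws the same random numbers it would have drawn had the block never run.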
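NeuroNER evaluates predictions by comparing predicted labels with gold labels. A widely used way to do this for named entities is exact-match, entity-level precision, recall and F1 over spans decoded from BIO tags. The sketch below shows that scheme in general; it is not NeuroNER's evaluation code, and the tag names and function names are assumptions.

```python
def bio_spans(tags):
    """Extract (entity_type, start, end) spans from BIO tags; `end` is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                spans.append((etype, start, i))
                start, etype = None, None
            if tag != "O":
                # "B-X", or an "I-X" that does not continue the open entity, starts a span.
                start, etype = i, tag[2:]
    return set(spans)


def entity_prf(gold_tags, pred_tags):
    """Exact-match entity-level precision, recall and F1."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]
print(entity_prf(gold, pred))
```

Under exact matching, a span with the right boundaries but the wrong type (LOC vs ORG above) counts as both a false positive and a false negative.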
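lazynlp is described as a library to scrape and clean web pages. A basic cleaning step is extracting visible text from HTML while skipping script and style contents. The sketch below does that with the standard library's `html.parser`; it is not lazynlp's cleaning code, and `TextExtractor` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style> elements."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text chunks of `html`, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = (
    "<html><head><style>p{}</style></head><body>"
    "<p>Hello &amp; welcome</p><script>var x=1;</script><p>Bye</p></body></html>"
)
print(html_to_text(page))
```

Large-scale dataset builders typically add further steps after this one, such as language filtering and de-duplication, before the text is used for training.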
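The Natural Language Analysis with Ngrams project computes, for each year and word, the average appearance of the word per document from the Google Ngrams data, after filtering with the EOWL word list. The sketch below assumes tab-separated 1-gram rows of the form `ngram, year, match_count, volume_count` and divides total matches by total volumes. The column layout, the case folding and the function name are assumptions for illustration, not taken from the project's code.

```python
from collections import defaultdict


def average_per_volume(lines, vocabulary):
    """Return {(word, year): total match_count / total volume_count} for words in `vocabulary`.

    Assumes each line is tab-separated: ngram, year, match_count, volume_count.
    """
    matches = defaultdict(int)
    volumes = defaultdict(int)
    for line in lines:
        ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
        word = ngram.lower()
        if word not in vocabulary:
            continue  # keep only words in the filter list (EOWL in the original project)
        key = (word, int(year))
        matches[key] += int(match_count)
        # Summing volumes across case variants can double-count books: an approximation.
        volumes[key] += int(volume_count)
    return {key: matches[key] / volumes[key] for key in matches if volumes[key] > 0}


# Hypothetical rows, for illustration only.
rows = ["apple\t2000\t10\t5\n", "Apple\t2000\t2\t1\n", "zzz\t2000\t7\t7\n"]
print(average_per_volume(rows, {"apple"}))
```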
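Entry 15 describes a module that returns a subset of a directory's files with an upper bound on the similarity between any two of them. Its similarity measure is not stated, so the sketch below assumes Jaccard similarity over word trigrams and a greedy pass that keeps a file only if it is within the bound for every file already kept. All function names and the default threshold are hypothetical.

```python
from pathlib import Path


def shingles(text, k=3):
    """Return the set of word k-grams in `text` (the whole token list if shorter than k)."""
    tokens = text.split()
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_dissimilar(texts, max_similarity=0.5, k=3):
    """Greedily keep texts whose similarity to every already-kept text is <= max_similarity."""
    kept = []  # (name, shingle set) pairs
    for name, text in texts.items():
        s = shingles(text, k)
        if all(jaccard(s, other) <= max_similarity for _, other in kept):
            kept.append((name, s))
    return [name for name, _ in kept]


def select_from_directory(directory, max_similarity=0.5, k=3):
    """Apply select_dissimilar to every regular file in `directory`, in sorted order."""
    texts = {
        p.name: p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }
    return select_dissimilar(texts, max_similarity, k)
```

The greedy result depends on file order and is not guaranteed to be the largest possible subset; it does, however, guarantee the stated bound, since every pair of kept files was checked.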
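Entry 19 describes a toolkit that uses suffix-array indexing to search for n-gram occurrences in a corpus. The core idea is that all suffixes starting with a given n-gram form one contiguous block in the sorted suffix array, so two binary searches give the occurrence count. The sketch below is an illustration of that idea over a token list, not the toolkit's code; its naive `sorted()` construction is chosen for clarity and would be too slow for large corpora.

```python
def build_suffix_array(tokens):
    """Sort suffix start positions by the token sequence that follows (naive build)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def count_ngram(tokens, sa, ngram):
    """Count occurrences of `ngram` with two binary searches over the suffix array."""
    ngram = list(ngram)
    n = len(ngram)

    def prefix(rank):
        return tokens[sa[rank]:sa[rank] + n]

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose n-token prefix is >= ngram
        mid = (lo + hi) // 2
        if prefix(mid) < ngram:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose n-token prefix is > ngram
        mid = (lo + hi) // 2
        if prefix(mid) <= ngram:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


corpus = "the cat sat on the mat the cat ran".split()
sa = build_suffix_array(corpus)
print(count_ngram(corpus, sa, ["the", "cat"]))
```

Because the same array answers queries of any length, a suffix-array language model can look up counts for arbitrarily long histories without precomputing an n-gram table of fixed order.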