Showing 44 open source projects for "sentence"

  • 1
    Neuro-comma

    Production-ready punctuation restoration model for the Russian language

    This library was developed to help us build punctuation restoration models and keep track of trained parameters, data, training visualizations, etc. To lower the entry threshold, the library doesn't use any high-level frameworks such as PyTorch Lightning or Keras. Feel free to fork this repo and edit the model or dataset classes for your purposes. Our team always uses the latest version and features of Python. We started with Python 3.9, but realized that there is no FastAPI...
    Downloads: 0 This Week
  • 2
    AliceMind

    ALIbaba's Collection of Encoder-decoders from MinD

    ...Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively. Pre-trained models for natural language generation (NLG). We propose a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus, specifically designed for generating new text conditioned on context. It achieves new SOTA results in several downstream tasks.
    Downloads: 0 This Week
  • 3
    XLM (Cross-lingual Language Model)

    PyTorch original implementation of Cross-lingual Language Model

    XLM (Cross-lingual Language Model) is a family of multilingual pretraining methods that align representations across languages to enable strong zero-shot transfer. It popularized objectives like Masked Language Modeling (MLM) across many languages and Translation Language Modeling (TLM) that jointly trains on parallel sentence pairs to tighten cross-lingual alignment. Using a shared subword vocabulary, XLM learns language-agnostic features that work well for classification and sequence labeling tasks such as XNLI, NER, and POS without target-language supervision. The repository provides preprocessing pipelines, training code, and fine-tuning scripts so you can reproduce benchmark results or adapt models to your own multilingual corpora. ...
    Downloads: 0 This Week
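The two objectives named in the XLM entry can be sketched roughly as follows. This is a simplified illustration, not the repository's code: the `[MASK]` and `</s>` symbols, the 15% default rate, and the function names are assumptions, and every selected token is simply replaced by the mask symbol.

```python
import random

MASK = "[MASK]"      # assumed mask symbol
SEPARATOR = "</s>"   # assumed separator between the two sentences of a pair


def mask_tokens(tokens, rate=0.15, rng=None):
    """MLM-style masking: hide some tokens and remember what they were."""
    rng = rng or random.Random(0)
    masked, targets = [], {}
    for i, token in enumerate(tokens):
        if token != SEPARATOR and rng.random() < rate:
            masked.append(MASK)
            targets[i] = token  # the model would be trained to predict this
        else:
            masked.append(token)
    return masked, targets


def tlm_example(source_tokens, target_tokens, rate=0.15, rng=None):
    """TLM-style example: concatenate a parallel pair, then mask across both."""
    stream = source_tokens + [SEPARATOR] + target_tokens
    return mask_tokens(stream, rate, rng)
```

Because masking runs over the concatenated pair, a hidden word in one language can be recovered from context in the other, which is the intuition behind the tighter cross-lingual alignment described above.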
  • 4
    SentEval

    A Python tool for evaluating the quality of sentence embeddings

    ...Because the interface is minimal, researchers can plug in encoders from any framework or language model and obtain a broad evaluation with little glue code. SentEval helped establish common baselines and reporting conventions in the sentence-representation community, reducing friction when comparing new methods.
    Downloads: 0 This Week
  • 5
    Tashkeela processed

    Tashkeela dataset cleaned and normalized.

    ...The result is a space-separated tokens file, where the words and the numbers are separated, but not the sequences of punctuation (i.e., an ending parenthesis followed by a dot). Sentences are segmented at the usual punctuation marks, such as dots, commas, and question/exclamation marks, as well as at line ends. The partition process shuffles groups of sentences, then divides each group into three parts (Train/Val/Test) and stores them in individual files. The original Tashkeela dataset is available at https://sourceforge.net/projects/tashkeela/
    Downloads: 1 This Week
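The segmentation and partition steps described in the Tashkeela entry might look roughly like the sketch below. The group size, the 80/10/10 ratios, the fixed seed, and the choice to drop the delimiters are all assumptions made for illustration; the entry does not state them.

```python
import random
import re

# Split at dots, commas, question/exclamation marks (Latin and Arabic forms)
# and at line ends. Delimiters are dropped here, which is an assumption.
SENTENCE_END = re.compile(r"[.,!?؟،\n]+")


def split_sentences(text):
    """Return the non-empty pieces between sentence delimiters."""
    return [piece.strip() for piece in SENTENCE_END.split(text) if piece.strip()]


def partition(sentences, group_size=10, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle groups of sentences, then cut each group into train/val/test."""
    groups = [sentences[i:i + group_size]
              for i in range(0, len(sentences), group_size)]
    random.Random(seed).shuffle(groups)
    train, val, test = [], [], []
    for group in groups:
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # whatever remains goes to test
    return train, val, test
```

Shuffling whole groups rather than single sentences keeps neighbouring sentences together while still spreading every part of the corpus across the three files.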
  • 6
    jieba

    Stuttering Chinese word segmentation

    "Jieba" Chinese word segmentation aims to be the best Python Chinese word segmentation component. Four word segmentation modes are supported. Precise mode tries to cut the sentence most precisely and is suitable for text analysis. Full mode scans for all the words that can be formed in the sentence; it is very fast, but cannot resolve ambiguity. Search engine mode builds on precise mode and splits long words again to improve recall, which makes it suitable for word segmentation in search engines. ...
    Downloads: 0 This Week
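To make the difference between the three modes described above concrete, here is a toy sketch in plain Python. It does not call jieba and is not its implementation; the tiny dictionary and its frequencies are made up, and precise mode is approximated by picking the split with the highest total log-probability.

```python
import math

# Hypothetical toy dictionary: word -> frequency (made-up numbers).
FREQ = {"清华": 30, "华大": 5, "大学": 40, "清华大学": 20}
TOTAL = sum(FREQ.values())


def full_mode(text):
    """Every dictionary word found anywhere in the text (ambiguity is kept)."""
    return [text[i:j] for i in range(len(text))
            for j in range(i + 1, len(text) + 1) if text[i:j] in FREQ]


def precise_mode(text):
    """The single segmentation with the highest total log-probability."""
    n = len(text)
    best = [(0.0, n)] * (n + 1)  # best[i] = (score of text[i:], end of first word)
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, n + 1):
            word = text[i:j]
            # Unknown single characters count as frequency 1 so a path always exists.
            freq = FREQ.get(word, 1 if j == i + 1 else 0)
            if freq:
                candidates.append((math.log(freq / TOTAL) + best[j][0], j))
        best[i] = max(candidates)
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words


def search_mode(text):
    """Precise mode, plus the shorter dictionary words inside each long word."""
    result = []
    for word in precise_mode(text):
        for size in range(2, len(word)):
            for i in range(len(word) - size + 1):
                if word[i:i + size] in FREQ:
                    result.append(word[i:i + size])
        result.append(word)
    return result
```

With this toy dictionary, precise mode keeps 清华大学 as one word, full mode also reports the overlapping words inside it, and search mode emits the shorter words followed by the long one so that a query for either form can match.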
  • 7
    InferSent

    InferSent sentence embeddings

    InferSent is a supervised sentence embedding method that learns universal representations from Natural Language Inference data and transfers well to many downstream tasks. It uses a BiLSTM encoder with max-pooling to produce fixed-length sentence vectors that capture semantics beyond bag-of-words statistics. Trained on large NLI datasets, the embeddings generalize across tasks like sentiment analysis, entailment, paraphrase detection, and semantic similarity with simple linear classifiers. ...
    Downloads: 0 This Week
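The max-pooling step mentioned in the InferSent entry is what turns a variable number of per-token vectors into one fixed-length sentence vector. A minimal sketch, assuming the per-token hidden states have already been computed (the numbers in the usage note are made up and stand in for BiLSTM outputs):

```python
def max_pool(hidden_states):
    """Element-wise maximum over the token axis.

    hidden_states: one vector per token, all of the same dimension.
    Returns a single vector of that dimension, whatever the sentence length.
    """
    return [max(values) for values in zip(*hidden_states)]
```

For example, `max_pool([[0.1, -0.5, 0.3], [0.4, 0.2, -0.1]])` gives `[0.4, 0.2, 0.3]`; a five-token sentence with three-dimensional states would also yield a three-dimensional vector.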
  • 8

    FOSS license

    FOSS license and sentence token

    We propose a method to mark license comments as sentence-tokens. We use the term sentence-token to refer to a sentence of a known license. A license (either by inclusion or by reference) is a sequence of sentence-tokens. Sentence-tokens are generalized using one or more regular expressions. We propose an approach to license identification based on the analysis of each sentence in the license statement of a source code file.
    Downloads: 0 This Week
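The sentence-token idea can be illustrated with a small sketch. The token names, regular expressions, and the `toy-license` entry below are hypothetical and chosen only to show the mechanism; they are not taken from the project.

```python
import re

# Hypothetical sentence-tokens, each generalized by one regular expression.
SENTENCE_TOKENS = {
    "FREE_SOFTWARE": re.compile(r"this (program|library|file) is free software", re.I),
    "NO_WARRANTY": re.compile(r"without any warranty", re.I),
}

# A license is modelled as a sequence of sentence-tokens (made-up entry).
KNOWN_LICENSES = {"toy-license": ["FREE_SOFTWARE", "NO_WARRANTY"]}


def to_sentence_tokens(comment):
    """Map each recognised sentence of a license comment to its token name."""
    text = " ".join(comment.split())  # normalise whitespace and line breaks
    tokens = []
    for sentence in re.split(r"(?<=[.;])\s+", text):
        name = next((n for n, pattern in SENTENCE_TOKENS.items()
                     if pattern.search(sentence)), None)
        if name:
            tokens.append(name)
    return tokens


def identify(comment):
    """Return the known licenses whose token sequence matches the comment."""
    tokens = to_sentence_tokens(comment)
    return [name for name, sequence in KNOWN_LICENSES.items() if sequence == tokens]
```

Unrecognised sentences are simply skipped here; a fuller version would probably need to decide whether unknown sentences should block a match.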
  • 9
    TextRank

    TextRank implementation for Python 3

    TextRank is an implementation of the TextRank algorithm for extractive text summarization and keyword extraction, inspired by Google’s PageRank.
    Downloads: 0 This Week
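For readers unfamiliar with the algorithm, extractive summarization in the TextRank style can be sketched as a PageRank-like iteration over a sentence-similarity graph. This is an independent toy version, not the listed project's code; the word-overlap similarity, damping factor of 0.85, and fixed iteration count are assumptions.

```python
import math
import re


def similarity(words_a, words_b):
    """Shared words, normalised by the log sizes of both word sets."""
    set_a, set_b = set(words_a), set(words_b)
    if len(set_a) < 2 or len(set_b) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    return len(set_a & set_b) / (math.log(len(set_a)) + math.log(len(set_b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by iterating a weighted PageRank update."""
    words = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    weights = [[similarity(words[i], words[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [(1 - damping) + damping * sum(
            weights[j][i] / out_sum[j] * scores[j]
            for j in range(n) if out_sum[j] > 0) for i in range(n)]
    return scores


def summarize(text, k=1):
    """Return the k highest-scoring sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sentences that share vocabulary with many others accumulate score, while an isolated sentence stays near the baseline of `1 - damping`.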
  • 10

    dadosSemiotica

    Collector and manager of semiotic analysis data

    This program is a web application for collecting and organizing text analysis data. It works with sets of texts, and the analyses are done on sentence-length portions. One of the preprocessing modules is based on CoGroo (A LibreOffice & OpenOffice.org Portuguese Grammar Checker).
    Downloads: 0 This Week
  • 11
    cnn-text-classification-tf

    Convolutional Neural Network for Text Classification in Tensorflow

    The cnn-text-classification-tf repository by Denny Britz is a well-known educational implementation of convolutional neural networks for text classification using TensorFlow, aimed at helping developers and researchers understand how CNNs can be applied to natural language processing tasks. Based loosely on Kim’s influential paper on CNNs for sentence classification, this codebase demonstrates how to preprocess text data, convert words into learned embeddings, and apply multiple convolution filters to extract n-gram features that are then pooled and fed into a classifier. The project includes scripts for training, evaluation, and data handling, making it easy to run experiments on datasets such as movie reviews or other labeled text collections. ...
    Downloads: 0 This Week
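The core operation described above, sliding a filter over word embeddings and pooling the result, can be shown without TensorFlow. The sketch below is a plain-Python illustration and not the repository's code; the ReLU activation and the function name are assumptions.

```python
def conv_max_pool(embeddings, filt):
    """Apply one convolution filter to a sentence and max-pool over positions.

    embeddings: one vector per word (the sentence matrix).
    filt: h rows of weights, so the filter spans h consecutive words (an n-gram).
    """
    height = len(filt)
    features = []
    for start in range(len(embeddings) - height + 1):
        window = embeddings[start:start + height]
        activation = sum(w * x
                         for filt_row, word_vec in zip(filt, window)
                         for w, x in zip(filt_row, word_vec))
        features.append(max(0.0, activation))  # ReLU, assumed here
    # Sentences shorter than the filter yield no windows; fall back to 0.0.
    return max(features, default=0.0)
```

Running several filters of different heights and concatenating their pooled outputs, for example `[conv_max_pool(sentence, f) for f in filters]`, gives the fixed-length feature vector that is fed into the classifier.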
  • 12
    DeepLearn

    Implementation of research papers on Deep Learning + NLP + CV in Python

    Welcome to DeepLearn. This repository contains implementations of the following research papers on NLP, CV, ML, and deep learning. The required dependencies are listed in requirement.txt. I will also use dl-text modules for preparing the datasets. If you haven't used it, please do have a quick look at it. CV, transfer learning, representation learning.
    Downloads: 0 This Week
  • 13

    RaspberryPi_LedSign

    Python code to drive the PT6961 LED sign AH41-01158A

    ...The LED sign board (AH41-01158A) was taken from an old home theater. A Raspberry Pi Model B is used to communicate with it. This code asks the user to enter a sentence and displays that sentence 4 times.
    Downloads: 0 This Week
  • 14
    Passencryption 2016 beta

    PassEncryption is designed to encrypt files and generate passwords.

    PassEncryption is software written in Python, designed to encrypt and decrypt files as well as generate passwords. PassEncryption uses the RSA encryption method to generate passwords for each account with a personal encryption key. Windows 7 and later only. IMPORTANT: The current release doesn't work on 32-bit architectures; this will be fixed in the next update.
    Downloads: 0 This Week
  • 15

    pyrept

    An MP3 language repeater

    An MP3 player written in Python that integrates the mp3play module.
    Downloads: 0 This Week
  • 16
    Sentence Parser for Python
    This is code for sentence parsing that does its job properly and FAST. The main problem is that you really need a database of abbreviations so that phrases such as "Dr. Smith" are not counted as 2 sentences, which means that a good parser must be language dependent. I am also providing a list of all English abbreviations with the code. You can always tweak the code to get nicer output, but the main idea is there, implemented in this little program.
    Downloads: 0 This Week
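The abbreviation problem described in this entry can be sketched as follows. This is an illustrative splitter, not the project's code; the short abbreviation set is a stand-in for the full list the project says it ships.

```python
import re

# Stand-in for a real abbreviation database (assumed, English only).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "etc", "e.g", "i.e"}


def split_sentences(text):
    """Split at ., ! or ? unless the period ends a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        preceding = text[start:match.start()].split()
        last_word = preceding[-1].lower() if preceding else ""
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue  # e.g. "Dr." does not end a sentence
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences
```

Because the abbreviation set is language specific, swapping it out is the main change needed to adapt the sketch to another language, which matches the point made above about language dependence.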
  • 17
    Data Ninja

    A document clustering system with search & report generation features

    ...The search feature takes search terms entered by the user and a directory of documents as input, and outputs an Excel spreadsheet listing all documents that contain the search term, along with documents similar to them. The second feature returns each sentence containing the search term from the documents found. The report generation feature, intended for audit companies, takes an audit report as input and outputs an insight log and a draft management letter with insights pulled from the report. This feature can be customised to suit a company's requirements. This software works with pdf, docx, txt and csv files, and the zip file must be saved in "My Documents".
    Downloads: 0 This Week
  • 18
    Software to fit whole-sentence language models using the principle of maximum entropy. For developers of speech recognizers, text prediction interfaces, OCR, machine translation software.
    Downloads: 0 This Week
  • 19
    Sentence Translator for Windows and PHP, and eventually C. Specialises in constructed (artificial) languages, but also has support for natural languages.
    Downloads: 0 This Week