Showing 317 open source projects for "learning language"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Secure File Transfer for Windows with Cerberus by Redwood Icon
    Secure File Transfer for Windows with Cerberus by Redwood

    Protect and share files over FTP/S, SFTP, HTTPS and SCP with the #1 rated Windows file transfer server.

    Cerberus supports unlimited users and connections on a single IP, with built-in encryption, 2FA, and a browser-based web client — all deployable in under 15 minutes with a 25-day free trial.
    Try for Free
  • 1
    d2l-zh

    d2l-zh

    Chinese-language edition of Dive into Deep Learning

    d2l‑zh is the Chinese-language edition of Dive into Deep Learning, an interactive, open‑source deep learning textbook that combines code, math, and explanatory text. It features runnable Jupyter notebooks compatible with multiple frameworks (e.g., PyTorch, MXNet, TensorFlow), comprehensive theoretical analysis, and exercises. Widely adopted in over 70 countries and used by more than 500 universities for teaching deep learning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    CPT

    CPT

    CPT: A Pre-Trained Unbalanced Transformer

    A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation. We replace the old BERT vocabulary with a larger one of size 51271 built from the training data, in which we 1) add missing 6800+ Chinese characters (most of them are traditional Chinese characters); 2) remove redundant tokens (e.g. Chinese character tokens with ## prefix); 3) add some English tokens to reduce OOV.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    AllenNLP

    AllenNLP

    An open-source NLP research library, built on PyTorch

    AllenNLP makes it easy to design and evaluate new deep learning models for nearly any NLP problem, along with the infrastructure to easily run them in the cloud or on your laptop. AllenNLP includes reference implementations of high quality models for both core NLP problems (e.g. semantic role labeling) and NLP applications (e.g. textual entailment). AllenNLP supports loading "plugins" dynamically. A plugin is just a Python package that provides custom registered classes or additional...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Emb-GAM

    Emb-GAM

    An interpretable and efficient predictor using pre-trained models

    ...In this work, we aim to bridge this gap by using pre-trained neural language models to extract embeddings for each input before learning a linear model in the embedding space. The final model (which we call Emb-GAM) is a transparent, linear function of its input features and feature interactions. Leveraging the language model allows Emb-GAM to learn far fewer linear coefficients, model larger interactions, and generalize well to novel inputs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 5
    DialoGPT

    DialoGPT

    Large-scale pretraining for dialogue

    DialoGPT is an open-source conversational language model developed by Microsoft Research for generating natural dialogue responses using large-scale transformer architectures. The system is built on the GPT-2 architecture and is designed specifically for multi-turn conversation tasks, enabling machines to produce coherent responses during interactive dialogue. The model was trained on a massive dataset of approximately 147 million conversational exchanges extracted from Reddit discussion...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Pattern

    Pattern

    Web mining module for Python, with tools for scraping

    Pattern is an open-source Python library that provides tools for web mining, natural language processing, machine learning, and network analysis. The project integrates multiple capabilities into a single framework that allows developers to collect, process, and analyze textual data from the web. It includes modules for web scraping and crawling that can retrieve information from sources such as social media platforms, search engines, and online knowledge bases.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    Fairseq

    Fairseq

    Facebook AI Research Sequence-to-Sequence Toolkit written in Python

    Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks. We provide reference implementations of various sequence modeling papers. Recent work by Microsoft and Google has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    ASRT Speech Recognition

    ASRT Speech Recognition

    A Deep-Learning-Based Chinese Speech Recognition System

    ASRT is an end-to-end deep-learning Chinese ASR system built with TensorFlow/Keras, using convolution + CTC and a Max-Entropy HMM language model. It provides a REST/gRPC server backend and client SDKs in multiple languages (Python, Java, Go, Windows). Notably lightweight, it performs well without needing GPU acceleration and runs across platforms, targeting developers and researchers building Chinese voice interfaces.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    OpenPrompt

    OpenPrompt

    An Open-Source Framework for Prompt-Learning

    Prompt-learning is the latest paradigm to adapt pre-trained language models (PLMs) to downstream NLP tasks, which modifies the input text with a textual template and directly uses PLMs to conduct pre-trained tasks. OpenPrompt is a library built upon PyTorch and provides a standard, flexible and extensible framework to deploy the prompt-learning pipeline.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 10
    PromptSource

    PromptSource

    Toolkit for creating, sharing and using natural language prompts

    PromptSource is a toolkit for creating, sharing and using natural language prompts. Recent work has shown that large language models exhibit the ability to perform reasonable zero-shot generalization to new tasks. For instance, GPT-3 demonstrated that large language models have strong zero- and few-shot abilities. FLAN and T0 then demonstrated that pre-trained language models fine-tuned in a massively multitask fashion yield even stronger zero-shot performance. A common denominator in these...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Machine Learning PyTorch Scikit-Learn

    Machine Learning PyTorch Scikit-Learn

    Code Repository for Machine Learning with PyTorch and Scikit-Learn

    ...For those who are interested in knowing what this book covers in general, I’d describe it as a comprehensive resource on the fundamental concepts of machine learning and deep learning. The first half of the book introduces readers to machine learning using scikit-learn, the defacto approach for working with tabular datasets. Then, the second half of this book focuses on deep learning, including applications to natural language processing and computer vision.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 12
    Texar-PyTorch

    Texar-PyTorch

    Integrating the Best of TF into PyTorch, for Machine Learning

    Texar-PyTorch is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides a library of easy-to-use ML modules and functionalities for composing whatever models and algorithms. The tool is designed for both researchers and practitioners for fast prototyping and experimentation. Texar-PyTorch was originally developed and is actively contributed by Petuum and CMU in collaboration with other institutes. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    CodeSearchNet

    CodeSearchNet

    Datasets, tools, and benchmarks for representation learning of code

    CodeSearchNet is a large-scale dataset and research benchmark designed to advance the development of systems that retrieve source code using natural language queries. The project was created through collaboration between GitHub and Microsoft Research and aims to support research on semantic code search and program understanding. The dataset contains millions of pairs of source code functions and corresponding documentation comments extracted from open-source repositories. These pairs allow machine learning models to learn relationships between natural language descriptions and programming code. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Hugging Face Transformer

    Hugging Face Transformer

    CPU/GPU inference server for Hugging Face transformer models

    Optimize and deploy in production Hugging Face Transformer models in a single command line. At Lefebvre Dalloz we run in-production semantic search engines in the legal domain, in the non-marketing language it's a re-ranker, and we based ours on Transformer. In that setup, latency is key to providing a good user experience, and relevancy inference is done online for hundreds of snippets per user query. Most tutorials on Transformer deployment in production are built over Pytorch and FastAPI....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Awesome Decision Tree Papers

    Awesome Decision Tree Papers

    A collection of research papers on decision, classification, etc.

    A collection of research papers on decision, classification and regression trees with implementations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Mocking Bird

    Mocking Bird

    Clone a voice in 5 seconds to generate arbitrary speech in real-time

    MockingBird is an open-source voice cloning and real-time speech generation toolkit that lets you clone a speaker’s voice from a short audio sample (reportedly as little as 5 seconds) and then synthesize arbitrary speech in that voice. It builds on deep-learning based TTS / voice-cloning technology (in the lineage of projects such as Real-Time-Voice-Cloning), but extends it with support for Mandarin Chinese and multiple Chinese speech datasets — broadening its applicability beyond English....
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Aquila DB

    Aquila DB

    An easy to use Neural Search Engine

    Aquila DB is a Neural search engine. In other words, it is a database to index Latent Vectors generated by ML models along with JSON Metadata to perform k-NN retrieval. It is dead simple to set up, language-agnostic, and drop in addition to your Machine Learning Applications. Aquila DB, as of current features is a ready solution for Machine Learning engineers and Data scientists to build Neural Information Retrieval applications out of the box with minimal dependencies.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    FARM

    FARM

    Fast & easy transfer learning for NLP

    ...Easy fine-tuning of language models to your task and domain language. AMP optimizers (~35% faster) and parallel preprocessing (16 CPU cores => ~16x faster). Modular design of language models and prediction heads. Switch between heads or combine them for multitask learning. Full Compatibility with HuggingFace Transformers' models and model hub. Smooth upgrading to newer language models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Kashgari

    Kashgari

    Kashgari is a production-level NLP Transfer learning framework

    Kashgari is a simple and powerful NLP Transfer learning framework, build a state-of-art model in 5 minutes for named entity recognition (NER), part-of-speech tagging (PoS), and text classification tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    AliceMind

    AliceMind

    ALIbaba's Collection of Encoder-decoders from MinD

    This repository provides pre-trained encoder-decoder models and its related optimization techniques developed by Alibaba's MinD (Machine IntelligeNce of Damo) Lab. Pre-trained models for natural language understanding (NLU). We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    SimCSE

    SimCSE

    SimCSE: Simple Contrastive Learning of Sentence Embeddings

    SimCSE (Simple Contrastive Learning of Sentence Embeddings) is a machine learning framework for training sentence embeddings using contrastive learning. It improves representation learning for NLP tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    jiant

    jiant

    jiant is an nlp toolkit

    Jiant is a multitask NLP framework for fine-tuning transformer-based models on multiple natural language understanding (NLU) tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    SRU

    SRU

    Training RNNs as Fast as CNNs

    Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence, enable highly parallelized implementation, and comes with careful initialization to facilitate the training of deep models. We demonstrate the effectiveness of SRU on multiple NLP tasks. SRU...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    PORORO

    PORORO

    Platform of neural models for natural language processing

    pororo performs Natural Language Processing and Speech-related tasks. It is easy to solve various subtasks in the natural language and speech processing field by simply passing the task name. Recognized speech sentences using the trained model. Currently English, Korean and Chinese support. Get vector or find similar words and entities from pretrained model using Wikipedia.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Self-Attentive Parser

    Self-Attentive Parser

    High-accuracy NLP parser with models for 11 languages

    LightAutoML is an automated machine learning (AutoML) framework developed by Sberbank AI Lab, designed to facilitate the development of machine learning models with minimal human intervention.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB