Showing 521 open source projects for "learning language"

  • 1
    A Survey of Surveys

    A collection of 1000+ survey papers on Natural Language Processing

    ...These topics include areas such as neural machine translation, language models, computer vision, and deep learning architectures. The repository organizes hundreds of papers into thematic categories and includes references, links, and bibliographic information to facilitate research and literature exploration.
    Downloads: 0 This Week
  • 2
    Open Deep Research

    An AI-powered research assistant that performs iterative research

    ...It is intentionally kept compact, with a codebase under roughly 500 lines, making it highly approachable for experimentation and learning. The architecture demonstrates how modern agent pipelines can continuously gather evidence, extract learnings, and adjust research direction over time.
    Downloads: 0 This Week
  • 3
    Google Research

    This repository contains code released by Google Research

    Google Research is a massive monorepo that hosts a wide range of research code released by Google Research teams across machine learning, artificial intelligence, robotics, natural language processing, and other advanced domains. Rather than being a single framework, the repository serves as a centralized collection of experimental projects, reference implementations, and reproducible research artifacts. It is intended primarily for researchers and advanced practitioners who want to explore cutting-edge techniques directly from the teams that developed them. ...
    Downloads: 0 This Week
  • 4
    ChatCraft.org

    Developer-oriented ChatGPT clone

    Welcome to ChatCraft.org, your open-source web companion for coding with Large Language Models (LLMs). Designed with developers in mind, ChatCraft transforms the way you interact with GPT models, making it effortless to read, write, debug, and enhance your code. Whether you're exploring new designs or learning about the latest technologies, ChatCraft is your go-to platform. With a user interface inspired by GitHub, and editable Markdown everywhere, you'll feel right at home from the get-go.
    Downloads: 0 This Week
  • 5
    Ling-V2

    Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI

    ...Trained on more than 20 trillion tokens of high-quality data and enhanced through multi-stage supervised fine-tuning and reinforcement learning, Ling-V2’s models demonstrate strong general reasoning, mathematical problem-solving, coding understanding, and knowledge-intensive task performance.
    Downloads: 0 This Week
  • 6
    fairseq2

    FAIR Sequence Modeling Toolkit 2

    fairseq2 is a modern, modular sequence modeling framework developed by Meta AI Research as a complete redesign of the original fairseq library. Built from the ground up for scalability, composability, and research flexibility, fairseq2 supports a broad range of language, speech, and multimodal content generation tasks, including instruction fine-tuning, reinforcement learning from human feedback (RLHF), and large-scale multilingual modeling. Unlike the original fairseq—which evolved into a large, monolithic codebase—fairseq2 introduces a clean, plugin-oriented architecture designed for long-term maintainability and rapid experimentation. ...
    Downloads: 0 This Week
  • 7
    Poetiq

    Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1

    poetiq-arc-agi-solver is the open-source codebase from Poetiq that replicates their record-breaking submission to the challenging benchmark suite ARC-AGI (both ARC-AGI-1 and ARC-AGI-2). The project demonstrates a system that orchestrates large language models (LLMs) — like those from major providers — with carefully engineered prompting, reasoning workflows, and dynamic strategies, to tackle the abstract, logic-heavy problems in ARC-AGI. Instead of relying on a single prompt or fixed...
    Downloads: 0 This Week
  • 8
    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while still benefiting from the robust data preparation practices developed in the speech community. ...
    Downloads: 0 This Week
  • 9
    Large Concept Model

    Language modeling in a sentence representation space

    Large Concept Model is a research codebase centered on concept-centric representation learning at scale, aiming to capture shared structure across many categories and modalities. It organizes training around concepts (rather than just raw labels), encouraging models to understand attributes, relations, and compositional structure that transfer across tasks. The repository provides training loops, data tooling, and evaluation routines to learn and probe these concept embeddings, typically...
    Downloads: 0 This Week
  • 10
    torchtext

    Data loaders and abstractions for text and NLP

    We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. LTS versions are distributed through a different channel than the other versioned releases. Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK), which you have to install. To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++. When building from source, make sure that you have the same C++...
    Downloads: 1 This Week
  • 11
    minbpe

    Minimal, clean code for the Byte Pair Encoding (BPE) algorithm

    minbpe is a minimal, clean implementation of byte-level Byte Pair Encoding (BPE), the tokenization approach widely used in modern language models. It operates on UTF-8 encoded bytes rather than Unicode characters, which makes it robust to arbitrary text inputs and avoids needing a language-specific character vocabulary. The repository is structured as a teaching-oriented implementation that shows how to train a tokenizer by learning merge rules, then apply those merges to encode text into token IDs and decode tokens back into text. ...
    Downloads: 0 This Week
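The train, encode, and decode steps described for minbpe can be illustrated with a minimal sketch. This is not minbpe's actual code or API: the function names (`count_pairs`, `merge_pair`, `train`, `encode`, `decode`) are hypothetical, and the tie-breaking rule and the choice to number new tokens from 256 upward are assumptions made for the example.

```python
from collections import Counter


def count_pairs(ids):
    """Count how often each adjacent pair of token IDs occurs."""
    return Counter(zip(ids, ids[1:]))


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, num_merges):
    """Learn up to `num_merges` merge rules from the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # start from raw bytes, IDs 0..255
    merges = {}  # (left_id, right_id) -> new_id, kept in learned order
    for step in range(num_merges):
        pairs = count_pairs(ids)
        if not pairs:
            break  # nothing left to merge
        pair = max(pairs, key=pairs.get)  # most frequent pair (assumed tie-break: first seen)
        new_id = 256 + step  # assumption: new tokens are numbered after the 256 byte values
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Apply the learned merges, in the order they were learned, to new text."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand token IDs back into bytes, then into text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


merges = train("aaabdaaabac", num_merges=3)
tokens = encode("aaabdaaabac", merges)
print(tokens, decode(tokens, merges))
```

Because the sketch works on bytes, text outside the training data (including non-ASCII characters) still encodes and decodes; it simply falls back to more, smaller tokens.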
  • 12
    Sagify

    LLMs and Machine Learning done easily

    Sagify is a tool designed to simplify the process of deploying and managing machine learning models, including Large Language Models (LLMs), on AWS SageMaker. It abstracts the complexities involved in setting up and managing SageMaker resources, allowing developers to focus on building and fine-tuning models. Sagify provides a command-line interface (CLI) and supports various machine-learning frameworks, making it accessible for a wide range of users.
    Downloads: 0 This Week
  • 13
    higgsfield

    Fault-tolerant, highly scalable GPU orchestration

    Higgsfield is an open-source, fault-tolerant, highly scalable GPU orchestration and machine learning framework designed for training models with billions to trillions of parameters, such as Large Language Models (LLMs).
    Downloads: 6 This Week
  • 14
    OpenNN - Open Neural Networks Library

    Machine learning algorithms for advanced analytics

    OpenNN is a software library written in C++ for advanced analytics. It implements neural networks, the most successful machine learning method. Some typical applications of OpenNN are business intelligence (customer segmentation, churn prevention…), health care (early diagnosis, microarray analysis…) and engineering (performance optimization, predictive maintenance…). OpenNN does not deal with computer vision or natural language processing. The main advantage of OpenNN is its high performance. ...
    Downloads: 5 This Week
  • 15
    LangChain Extract

    Did you say you like data?

    ...Developers can create reusable “extractors” that define what type of information should be pulled from a document, along with example prompts that improve extraction quality through in-context learning.
    Downloads: 0 This Week
  • 16
    Adaptive Intelligence

    Adaptive Intelligence, also known as "Artificial General Intelligence"

    Adaptive Intelligence is the implementation of neural science, forensic psychology, and behavioral science with machine learning and artificial intelligence to provide advanced automated software platforms with the ability to adjust and thrive in dynamic environments by combining cognitive flexibility, emotional regulation, resilience, and practical problem-solving skills.
    Downloads: 3 This Week
  • 17
    UnBBayes

    Framework & GUI for Bayes Nets and other probabilistic models.

    UnBBayes is a probabilistic network framework written in Java. It has both a GUI and an API with inference, sampling, learning and evaluation. It supports Bayesian networks, influence diagrams, MSBN, OOBN, HBN, MEBN/PR-OWL, and PRM, as well as structure, parameter, and incremental learning. Please visit our wiki (https://sourceforge.net/p/unbbayes/wiki/Home/) for more information. Check out the license section (https://sourceforge.net/p/unbbayes/wiki/License/) for our licensing policy.
    Downloads: 13 This Week
  • 18
    Amoeba

    Linux Command Line Learning Program

    Amoeba is a Linux command-line learning program that observes and adapts to the Linux command line, storing learned strings and their usage data. It enhances command-line proficiency by capturing command outputs, adapting string lengths, and periodically saving knowledge. Sandboxing is essential for security, and optionally running it in a virtual machine would further isolate it from the host system. Contributions and improvements are encouraged via the GitHub repository.
    Downloads: 1 This Week
  • 19
    AIConfig

    AIConfig is a config-based framework to build generative AI apps

    AIConfig is an open-source framework designed to simplify the development and management of generative AI applications by separating AI logic from application code. The framework allows prompts, model configurations, and parameters to be stored as structured configuration files that can be version controlled and managed independently from the rest of the software system. This approach improves collaboration between developers, prompt engineers, and machine learning practitioners by turning...
    Downloads: 0 This Week
  • 20
    PyTextRank

    Python implementation of TextRank algorithms

    PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension, for graph-based natural language work and related knowledge graph practices.
    Downloads: 0 This Week
  • 21
    Bandicoot

    fast C++ library for GPU linear algebra & scientific computing

    * Fast GPU linear algebra library (matrix maths) for the C++ language, aiming towards a good balance between speed and ease of use
    * Provides high-level syntax and functionality deliberately similar to Matlab
    * Provides an API that is aiming to be compatible with Armadillo for easy transition between CPU and GPU linear algebra code
    * Useful for algorithm development directly in C++, or quick conversion of research code into production environments
    * Distributed under the permissive Apache 2.0 license, useful for both open-source and proprietary (closed-source) software
    * Can be used for machine learning, pattern recognition, computer vision, signal processing, bioinformatics, statistics, finance, etc
    * Downloads: http://coot.sourceforge.io/download.html
    * Documentation: http://coot.sourceforge.io/docs.html
    * Bug reports: http://coot.sourceforge.io/faq.html
    * Git repo: https://gitlab.com/conradsnicta/bandicoot-code
    Downloads: 6 This Week
  • 22
    GLM-4-32B-0414

    Open Multilingual Multimodal Chat LMs

    GLM-4-32B-0414 is a powerful open-source large language model featuring 32 billion parameters, designed to deliver performance comparable to leading models like OpenAI’s GPT series. It supports multilingual and multimodal chat capabilities with an extensive 32K token context length, making it ideal for dialogue, reasoning, and complex task completion. The model is pre-trained on 15 trillion tokens of high-quality data, including substantial synthetic reasoning datasets, and further enhanced with reinforcement learning and human preference alignment for improved instruction-following and function calling. ...
    Downloads: 1 This Week
  • 23
    Universal Sentence Encoder

    Encoder of greater-than-word length text trained on a variety of data

    The Universal Sentence Encoder (USE) is a pre-trained deep learning model designed to encode sentences into fixed-length embeddings for use in various natural language processing (NLP) tasks. It leverages Transformer and Deep Averaging Network (DAN) architectures to generate embeddings that capture the semantic meaning of sentences. The model is designed for tasks like sentiment analysis, semantic textual similarity, and clustering, and provides high-quality sentence representations in a computationally efficient manner.
    Downloads: 0 This Week
  • 24
    Wikipedia2Vec

    A tool for learning vector representations of words and entities

    Wikipedia2Vec is an embedding learning tool that creates word and entity vector representations from Wikipedia, enabling NLP models to leverage structured and contextual knowledge.
    Downloads: 0 This Week
  • 25
    Weak-to-Strong

    Implements weak-to-strong learning for training stronger ML models

    Weak-to-Strong is an OpenAI research codebase that implements the concept of weak-to-strong generalization, as described in the accompanying paper. The project provides tools for training larger “strong” models using labels or guidance generated by smaller “weak” models. Its core functionality focuses on binary classification tasks, with support for fine-tuning pretrained language models and experimenting with different loss functions, including confidence-based auxiliary losses. The...
    Downloads: 0 This Week