Showing 52 open source projects for "encoding"

View related business solutions
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • Compliant and Reliable File Transfers Backed by Top Security Certifications Icon
    Compliant and Reliable File Transfers Backed by Top Security Certifications

    Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

    Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.
    Start Free Trial
  • 1
    minbpe

    minbpe

    Minimal, clean code for the Byte Pair Encoding (BPE) algorithm

    minbpe is a minimal, clean implementation of byte-level Byte Pair Encoding (BPE), the tokenization approach widely used in modern language models. It operates on UTF-8 encoded bytes rather than Unicode characters, which makes it robust to arbitrary text inputs and avoids needing a language-specific character vocabulary. The repository is structured as a teaching-oriented implementation that shows how to train a tokenizer by learning merge rules, then apply those merges to encode text into token IDs and decode tokens back into text. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Streamline Analyst

    Streamline Analyst

    AI agent that streamlines the entire process of data analysis

    Streamline Analyst is a cutting-edge, open-source application powered by Large Language Models (LLMs) designed to revolutionize data analysis. This Data Analysis Agent effortlessly automates all the tasks such as data cleaning, preprocessing, and even complex operations like identifying target objects, partitioning test sets, and selecting the best-fit models based on your data. With Streamline Analyst, results visualization and evaluation become seamless.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    GPT-2

    GPT-2

    Code for the paper Language Models are Unsupervised Multitask Learners

    This repository contains the code and model weights for GPT-2, a large-scale unsupervised language model described in the OpenAI paper “Language Models are Unsupervised Multitask Learners.” The intent is to provide a starting point for researchers and engineers to experiment with GPT-2: generate text, fine‐tune on custom datasets, explore model behavior, or study its internal phenomena. The repository includes scripts for sampling, training, downloading pre-trained models, and utilities for...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 4
    Bert-VITS2

    Bert-VITS2

    VITS2 backbone with multilingual-bert

    Bert-VITS2 is a neural text-to-speech project that combines a VITS2 backbone with a multilingual BERT front-end to produce high-quality speech in multiple languages. The core idea is to use BERT-style contextual embeddings for text encoding while relying on a refined VITS2 architecture for acoustic generation and vocoding. The repository includes everything needed to train, fine-tune, and run the model, from configuration files to preprocessing scripts, spectrogram utilities, and training entrypoints for multi-GPU and multi-node setups. It provides emotional modeling through “emo embeddings,” allowing voices to be conditioned on different affective states during synthesis. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • $300 Free Credits to Build on Google Cloud Icon
    $300 Free Credits to Build on Google Cloud

    New to Google Cloud? Get $300 in credits to explore Compute Engine, BigQuery, Cloud Run, Gemini Enterprise Agent Platform, and more.

    Start your next project with $300 in free Google Cloud credit. Spin up VMs, run containers, query petabytes in BigQuery, or build agents with Gemini Enterprise Agent Platform. Once your credits are used, keep building with 20+ always-free tier products including Compute Engine, Cloud Storage, GKE, and Cloud Run functions. No commitment required—just sign up and start building.
    Claim $300 Free
  • 5
    towhee

    towhee

    Framework that is dedicated to making neural data processing

    ...From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to model inference, making your pipeline execution 10x faster. Towhee provides out-of-the-box integration with your favorite libraries, tools, and frameworks, making development quick and easy. Towhee includes a pythonic method-chaining API for describing custom data processing pipelines. We also support schemas, making processing unstructured data as easy as handling tabular data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    ConsistencyDecoder

    ConsistencyDecoder

    Consistency Distilled Diff VAE

    ...Instead of relying solely on the standard GAN or VAE decoder, this approach leverages a Consistency Distilled Diff VAE, designed to produce higher-quality and more stable outputs from encoded latents. The project provides a simple API for encoding with a Stable Diffusion VAE and decoding using the new consistency model, allowing for side-by-side comparisons with traditional decoders. It demonstrates how consistency models can enhance visual fidelity while maintaining efficiency, reducing artifacts common in GAN-decoded outputs. The repository includes installation instructions, usage examples, and visual comparisons to highlight improvements. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Chinese-LLaMA-Alpaca-2 v2.0

    Chinese-LLaMA-Alpaca-2 v2.0

    Chinese LLaMA & Alpaca large language model + local CPU/GPU training

    This project has open-sourced the Chinese LLaMA model and the Alpaca large model with instruction fine-tuning to further promote the open research of large models in the Chinese NLP community. Based on the original LLaMA , these models expand the Chinese vocabulary and use Chinese data for secondary pre-training, which further improves the basic semantic understanding of Chinese. At the same time, the Chinese Alpaca model further uses Chinese instruction data for fine-tuning, which...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    minGPT

    minGPT

    A minimal PyTorch re-implementation of the OpenAI GPT

    minGPT is a minimalist, educational re-implementation of the GPT (Generative Pretrained Transformer) architecture built in PyTorch, designed by Andrej Karpathy to expose the core structure of a transformer-based language model in as few lines of code as possible. It strips away extraneous bells and whistles, aiming to show how a sequence of token indices is fed into a stack of transformer blocks and then decoded into the next token probabilities, with both training and inference supported....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Auto-PyTorch

    Auto-PyTorch

    Automatic architecture search and hyperparameter optimization

    While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed Auto-PyTorch, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL). Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series...
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    Alphafold2

    Alphafold2

    Unofficial Pytorch implementation / replication of Alphafold2

    ...Deepmind has open sourced the official code in Jax, along with the weights! This repository will now be geared towards a straight pytorch translation with some improvements on positional encoding. lhatsk has reported training a modified trunk of this repository, using the same setup as trRosetta, with competitive results. The underlying assumption is that the trunk works on the residue level, and then constitutes to atomic level for the structure module, whether it be SE3 Transformers, E(n)-Transformer, or EGNN doing the refinement.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11

    avio

    Python version of ffplay with built-in AI

    See the Files tab above for installation instructions
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    VideoSrt

    VideoSrt

    Windows-GUI

    ...Recognize video/audio speech to generate subtitle files (support Chinese-English translation, bilingual subtitles) Extract speech text from video/audio. Batch translation, filter processing/encoding SRT subtitle files. Using the Alibaba Cloud speech recognition interface, the accuracy is high, and the standard Mandarin/English recognition rate is over 95%. Video recognition does not need to upload the original video, which is convenient, fast and time-saving.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 13
    OpenPrompt

    OpenPrompt

    An Open-Source Framework for Prompt-Learning

    ...In the future, we will also support PLMs implemented by other libraries. The template is one of the most important modules in prompt learning, which wraps the original input with textual or soft-encoding sequence. Use the implementations of current prompt-learning approaches.* We have implemented various of prompting methods, including templating, verbalizing and optimization strategies under a unified standard. You can easily call and understand these methods.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Kite

    Kite

    Primary Kite repo, private bits replaced with XXXXXXX

    ...However, we do have cloud instances & VMs available for running larger jobs and for testing our cloud services. We bundle a lot of pre-computed datasets & machine learning models into the Kite app through the use of a custom filemap & encoding on top of go-bindata. The data, located in kite-go/client/datadeps, is kept in Git-LFS.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 15
    Feature-engine

    Feature-engine

    Feature engineering package with sklearn like functionality

    Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Feature-engine's transformers follow Scikit-learn's functionality with fit() and transform() methods to learn the transforming parameters from the data and then transform it.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16

    Face recognition with mask

    Face recognition with mask

    戴口罩也變識得出的face recognition 將大頭照放images 下, 用人名命名 主畫面,點選encoding,將人臉特徵編碼 就可以在即時的webcam畫面看到便識結果 Face recognition that can be learned by wearing a mask Put the photo under images and name it Main screen, click encoding to encode facial features You can see the result of the recognition on the real-time webcam screen
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    YouTokenToMe

    YouTokenToMe

    Unsupervised text tokenizer focused on computational efficiency

    YouTokenToMe is a fast and efficient unsupervised text tokenization library designed for training subword embeddings, particularly useful for NLP models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Texar

    Texar

    Toolkit for Machine Learning, Natural Language Processing

    ...Texar-TensorFlow (this repo) and Texar-PyTorch have mostly the same interfaces. Both further combine the best design of TF and PyTorch. Rich Pre-trained Models, Rich Usage with Uniform Interfaces. BERT, GPT2, XLNet, etc, for encoding, classification, generation, and composing complex models with other Texar components!
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    automl-gs

    automl-gs

    Provide an input CSV and a target field to predict, generate a model

    Give an input CSV file and a target field you want to predict to automl-gs, and get a trained high-performing machine learning or deep learning model plus native Python code pipelines allowing you to integrate that model into any prediction workflow. No black box: you can see exactly how the data is processed, and how the model is constructed, and you can make tweaks as necessary. automl-gs is an AutoML tool which, unlike Microsoft's NNI, Uber's Ludwig, and TPOT, offers a zero code/model...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Seq2seq Chatbot for Keras

    Seq2seq Chatbot for Keras

    This repository contains a new generative model of chatbot

    ...The architecture presented here assumes the same prior distributions for input and output words. Therefore, it shares an embedding layer (Glove pre-trained word embedding) between the encoding and decoding processes through the adoption of a new model.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex compounding or character encoding. Hunspell interfaces: Curses, Ispell compatible pipe interface, OpenOffice.org UNO module
    Leader badge
    Downloads: 277 This Week
    Last Update:
    See Project
  • 22

    Darkbot

    The IRC's Talking Robot

    [ Please read https://sourceforge.net/p/darkbot/news/2014/01/darkbots-revitalization/ ] Darkbot is a portable IRC chat robot written in the C language that can be taught responses to user inquiries, and even have conversations with them. Darkbot was originally created by Jason Hamilton as an aid for help channels on Intenet Relay Chat.
    Leader badge
    Downloads: 8 This Week
    Last Update:
    See Project
  • 23
    CRFSharp

    CRFSharp

    CRFSharp is a .NET(C#) implementation of Conditional Random Field

    ...CRF#'s mainly algorithm is the same as CRF++ written by Taku Kudo. It encodes model parameters by L-BFGS. Moreover, it has many significant improvement than CRF++, such as totally parallel encoding, optimizing memory usage and so on. Currently, when training corpus, compared with CRF++, CRF# can make full use of multi-core CPUs and only uses very low memory, and memory grow is very smoothly and slowly while amount of training corpus, tags increase. with multi-threads process, CRF# is more suitable for large data and tags training than CRF++ now. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    EZvolve Foundation Classes

    Data types and utility classes for use with evolutionary algorithms.

    EZvolve Foundation Classes is a set of data types and utility classes for use with evolutionary algorithms. Currently implemented support for bit string encoding, populations, fitnesses, fitness functions, probabilities, and probability vectors.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Concrete Voice is a text to speech program. It can read the time, anounce weather, read text file, save text files to audio files, open any text file (supports all text encoding formats) and many more advance stuff!
    Downloads: 0 This Week
    Last Update:
    See Project
Auth0 Logo