Showing 144 open source projects for "text encoding"

View related business solutions
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 1
    minbpe

    minbpe

    Minimal, clean code for the Byte Pair Encoding (BPE) algorithm

    minbpe is a minimal, clean implementation of byte-level Byte Pair Encoding (BPE), the tokenization approach widely used in modern language models. It operates on UTF-8 encoded bytes rather than Unicode characters, which makes it robust to arbitrary text inputs and avoids needing a language-specific character vocabulary. The repository is structured as a teaching-oriented implementation that shows how to train a tokenizer by learning merge rules, then apply those merges to encode text into token IDs and decode tokens back into text.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Mini QR

    Mini QR

    Create & scan cute qr codes easily

    Mini QR is a web app focused on making QR codes feel friendly and design-forward, combining a polished QR generator with a built-in scanner so you can both create and decode codes in the same place. It emphasizes customization so the QR you generate can match a brand, event theme, or personal style, including color and styling controls, framed layouts with labels, and the ability to add a logo image. Because QR reliability matters as much as looks, it exposes practical settings like error...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 3
    AI YouTube Shorts Generator

    AI YouTube Shorts Generator

    A python tool that uses GPT-4, FFmpeg, and OpenCV

    AI-YouTube-Shorts-Generator is a Python-based tool that automates the creation of short-form vertical video clips (“shorts”) from longer source videos — ideal for adapting content for platforms like YouTube Shorts, Instagram Reels, or TikTok. It analyzes input video (whether a local file or a YouTube URL), transcribes audio (with optional GPU-accelerated speech-to-text), uses an AI model to identify the most compelling or engaging segments, and then crops/resizes the video and applies...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 4
    Step-Audio 2

    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. It...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    Chinese-LLaMA-Alpaca-3

    Chinese-LLaMA-Alpaca-3

    Chinese Llama-3 LLMs) developed from Meta Llama 3

    Chinese-LLaMA-Alpaca-3 is an open-source project that provides Mandarin-focused large language models based on Meta’s LLaMA-3 architecture, with both foundational and instruction-tuned variants to support high-quality Chinese natural language understanding and generation. It extends the original LLaMA models with expanded Chinese vocabularies and additional pretraining on Chinese corpora to improve semantic encoding and decoding specifically for Chinese text. Alongside the base models, the project also releases Chinese Alpaca models that are fine-tuned on instruction datasets so they behave more like conversational and instruction-following AI assistants. It includes scripts and tooling that let researchers or developers run training, fine-tuning, quantization, and deployment on local machines (CPU or GPU), making experimentation and testing accessible without requiring large clusters.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Apiato

    Apiato

    PHP Framework for building scalable API's on top of Laravel

    ...Authentication with OAuth2.0 for first/third-party clients (using Laravel Passport). Role-Based Access Control (RBAC), seeded with a Super Admin, Roles, and Permissions. Query Parameters support (orderBy, sorted, and filter) with full-text search. Useful Endpoints for managing users, roles/permissions, tokens, and more. API Documentations generator, to generate API docs from PHP Docblock using ApiDocJS (provided by Documentation Container). Supports CORS (Cross-Origin Resource Sharing) and JSONP (JSON with padding). Auto encoding/decoding of real IDs, to prevent exposing real ids to the outer world. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    thextedit
    A simple text/hex editor focused on encoding matters. Edit characters, BOM, EOL at raw byte level. Convert files from/to any text encoding.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    biber
    Biber is a sophisticated bibliography processing backend for the LaTeX biblatex package. It supports a unsurpassed feature set for automated conformance to complex bibliography style requirements such as labelling, sorting and name handling. It has comprehensive Unicode support.
    Leader badge
    Downloads: 288 This Week
    Last Update:
    See Project
  • 9
    BinEd Binary/Hex Editor

    BinEd Binary/Hex Editor

    Binary / hex editor and component written in Java

    Free and open source hex editor written in Java. This is standalone desktop app, library for Java applications as well as variants for Java IDEs are also available.
    Downloads: 22 This Week
    Last Update:
    See Project
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10

    multinotes

    Text architecture for music theory.

    ...Furthermore, dynamic interactive documents can be useful for presenting complicated interdependencies to the reader more clearly, far beyond conventional paper publication. The mulitNotes text architecture and processing pipeline is based on d2d and standard technologies (XSLT, ECMAScript. LilyPond, PostScript, etc.) and addresses these issues. An overview about the software architecture and its operation is given in: Journal of the Text Encoding Initiative, Open Issue 18/2024: "Using d2d for Writing XML --- The multiNotes Text Architecture for Musical Analysis" https://doi.org/10.4000/132ex
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Basecy

    Basecy

    A tool for encoding/decoding to basecy

    Encode/Decode files to UTF-8 text filled with printable ASCII and Cyrillic characters.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    jPicEdt

    jPicEdt

    Another drawing editor for LaTeX with PSTricks & TikZ

    jPicEdt is an extensible internationalized vector-based drawing editor for LaTeX and related packages (TikZ, PsTricks,...), written in Java. It is also a library of reusable high-level graphic primitives.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 13

    Secure Protocol Format

    Generic binary protocol library that prevents injection attacks

    ...Guaranteeing equivalence in data interpretation, known as operational congruity, is achieved by separating fields of data on the basis of their length. When the length of the data is known, there is no risk of misinterpreting it on the basis of spaces or text delimiters. The Distinguished Encoding Rules, or DER, of the ASN.1 standard follows this approach but includes numerous constraints and, more importantly, demands that data fields to be described using binary metadata rather than text. The Secure Protocol Format, or SPF, was created as a simplified version of DER. In addition to delimiting data by length, it also affords programmers the ability to use text for describing data, just like tags are used in HTML and XML. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    NextTypes

    NextTypes

    Information storage, processing and transmission system

    NextTypes is a standards based information storage, processing and transmission system that integrates the characteristics of other systems such as databases, programming languages, communication protocols, file systems, document managers, operating systems, frameworks, file formats and hardware in a single tightly integrated system using a common data types system. NextTypes uses PostgreSQL as its data store. Its advanced features as complete SQL support, transactional DDL, deferrable constraints, composite types and full-text search allow building a very complete data system with a high level of abstraction. NextTypes is a relational/network/objects/files hybrid database system with high level SQL interface, extensive primitive types list, JSON/JSON-LD/XML/Smile/WebDAV/CalDAV/iCalendar/RSS data access, REST interface, customizable MVC architecture, optimistic concurrency control, HTML5/CSS3/SVG/Javascript responsive graphical interface, multilanguage and UTF-8 encoding.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Shutter Encoder

    Shutter Encoder

    Free professional video converter Windows|Mac|Linux

    Shutter Encoder is an video, audio and image converter based on FFmpeg and other great tools. It has been designed by video editors in order to be as accessible and efficient as possible. It's a swiss knife tool for any video editor. Link to website & downloads : https://www.shutterencoder.com - Without conversion: Cut without re-encoding, Replace audio, Rewrap, Conform, Merge, Extract, Subtitling, Video inserts - Sound conversions: WAV, AIFF, FLAC, ALAC, MP3, AAC, AC3,...
    Leader badge
    Downloads: 133 This Week
    Last Update:
    See Project
  • 16
    BWTC32Key

    BWTC32Key

    A file compressor with AES256CTR and Base32768 binary-to-text encoding

    BWTC32Key is a program I wrote that compresses data, then optionally encrypts it, and then outputs a Base32768 representation as the final output
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17

    TEI LingSIG

    Production space for the TEI Linguistics SIG

    This used to be the experimentation and production space for the Special Interest Group (SIG) of the Text Encoding Initiative (TEI) called "TEI for Linguists", LingSIG for short. Currently, this is a storage place for documents produced by the SIG. Use https://github.com/LingSIG to access the current production space.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Bert-VITS2

    Bert-VITS2

    VITS2 backbone with multilingual-bert

    Bert-VITS2 is a neural text-to-speech project that combines a VITS2 backbone with a multilingual BERT front-end to produce high-quality speech in multiple languages. The core idea is to use BERT-style contextual embeddings for text encoding while relying on a refined VITS2 architecture for acoustic generation and vocoding. The repository includes everything needed to train, fine-tune, and run the model, from configuration files to preprocessing scripts, spectrogram utilities, and training entrypoints for multi-GPU and multi-node setups. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    GPT-2

    GPT-2

    Code for the paper Language Models are Unsupervised Multitask Learners

    This repository contains the code and model weights for GPT-2, a large-scale unsupervised language model described in the OpenAI paper “Language Models are Unsupervised Multitask Learners.” The intent is to provide a starting point for researchers and engineers to experiment with GPT-2: generate text, fine‐tune on custom datasets, explore model behavior, or study its internal phenomena. The repository includes scripts for sampling, training, downloading pre-trained models, and utilities for...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 20

    Change File Encoding

    Change encoding of text files.

    Change File Encoding is a utility that allows you to change the encoding of text files. For example, files saved in US-ASCII can be converted to UTF-8. Over 170 encodings are supported. Requires Java 1.8 or higher.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    towhee

    towhee

    Framework that is dedicated to making neural data processing

    Towhee is an open-source machine-learning pipeline that helps you encode your unstructured data into embeddings. You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments. From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to model inference, making your pipeline execution 10x faster. Towhee provides out-of-the-box integration with your favorite libraries, tools, and frameworks, making development quick and easy. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    wxMEdit

    wxMEdit

    wxMEdit, Cross-platform Text/Hex Editor, Improved Version of MadEdit

    •Added automatically checking for updates •Added bookmark support •Added right-click context menu for each tab •Added purging histories support •Added selecting a line by triple click •Added FreeBASIC syntax file •Added an option to place configuration files into %APPDATA% directory under Windows •Improved support for Find/Replace •Improved Mac OS X support •Improved system integration under Windows •Improved encoding detection result •Improved Hex editing support •Added more...
    Leader badge
    Downloads: 158 This Week
    Last Update:
    See Project
  • 23
    jbig2enc

    jbig2enc

    JBIG2 Encoder

    ...JBIG2 encodes bi-level (1 bpp) images using a number of clever tricks to get better compression than G4. This encoder can: Generate JBIG2 files, or fragments for embedding in PDFs Generic region encoding Perform symbol extraction, classification and text region coding Perform refinement coding and, Compress multipage documents It uses the (Apache-ish licensed) Leptonica library: http://leptonica.com/
    Downloads: 21 This Week
    Last Update:
    See Project
  • 24
    MATTA

    MATTA

    Morse Code Utilitiies to convert text messages to & from sound files.

    This is a commandline utility that converts a WAV sound file containing morse code to English text. Pre-built binaries run on OSX, MsWindows, & GNU/linux. It is written in Ada, so can be rebuilt on any platform with an Ada compiler. The input wav file must be monaural, with a 16-bit signed integer encoding, and a sample rate of 8000 Hz. Either sox or audacity can easily transform to this format. The wav file is expected to be international morse code, preferrably clean and properly spaced. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    UUE

    UUE

    UUE is a tool that converts files to and from uuencoding

    uuencoding is a form of binary-to-text encoding that originated in the Unix programs uuencode and uudecode written by Mary Ann Horton at UC Berkeley in 1980, for encoding binary data for transmission in email systems. This project is licensed under the ISC License. Copyright © 2020-2022 ALBANESE Research Lab Source code: https://github.com/pedroalbanese/uuencode Visit: http://albanese.atwebpages.com
    Downloads: 3 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB