Showing 36 open source projects for "tokenizer"

  • 1
    Tokenizer

    A small library for converting tokenized PHP source code into XML

    A small library for converting tokenized PHP source code into XML. You can add this library as a local, per-project dependency to your project using Composer. If you only need this library during development, for instance to run your project's test suite, then you should add it as a development-time dependency.
    Downloads: 0 This Week
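    The library's own XML format is not described in this listing. As a rough analogue only, the Python sketch below performs the same kind of conversion on Python source using the standard-library `tokenize` module. The `<source>`/`<token>` element names and the `name`/`line` attributes are hypothetical choices, not the PHP library's schema.

```python
import io
import tokenize
import xml.etree.ElementTree as ET


def tokens_to_xml(source: str) -> str:
    """Serialize Python tokens as XML: one <token> element per token,
    tagged with its token-type name and starting line number."""
    root = ET.Element("source")  # hypothetical root element name
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        element = ET.SubElement(
            root,
            "token",
            name=tokenize.tok_name[tok.type],  # e.g. NAME, OP, NUMBER
            line=str(tok.start[0]),            # 1-based line where the token starts
        )
        element.text = tok.string
    return ET.tostring(root, encoding="unicode")


print(tokens_to_xml("x = 1\n"))
```

    Keeping the token type as an attribute and the token text as element content means the XML can be parsed back and the original token stream recovered in order.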
  • 2
    llm

    An ecosystem of Rust libraries for working with large language models

    llm is an ecosystem of Rust libraries for working with large language models, built on top of the fast, efficient GGML library for machine learning. The primary entry point for developers is the llm crate, which wraps llm-base and the supported model crates. Documentation for the released version is available on Docs.rs. For end users, there is a CLI application, llm-cli, which provides a convenient interface for interacting with supported models. Text generation can be done as a...
    Downloads: 1 This Week
  • 3
    IK Analysis for Elasticsearch

    A plugin that integrates the Lucene IK analyzer into Elasticsearch

    ..., independent of the Lucene project, while also providing a default optimized implementation for Lucene. In the 2012 version, IK implemented a simple algorithm for eliminating word-segmentation ambiguity, marking the IK tokenizer's evolution from pure dictionary-based word segmentation to simulated semantic word segmentation.
    Downloads: 1 This Week
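    To show the kind of ambiguity the 2012 algorithm addressed, here is a minimal sketch of plain dictionary segmentation using forward maximum matching, a standard technique assumed here for illustration (this is not IK's actual code). The four-word dictionary is hypothetical.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word that fits, falling back to a single character."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted so the loop makes progress.
        for j in range(min(len(text), i + max_len), i, -1):
            if j == i + 1 or text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


# Hypothetical toy dictionary.
vocab = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", vocab, max_len=3))
# → ['研究生', '命', '起源']
```

    The greedy match takes 研究生 ("graduate student") first and strands 命, whereas the intended reading of 研究生命起源 is 研究 / 生命 / 起源 ("study the origin of life"). Pure dictionary matching cannot tell these apart, which is the gap an ambiguity-elimination step fills.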
  • 4
    Bumblebee

    Pre-trained Neural Network models in Axon

    Bumblebee provides pre-trained Neural Network models on top of Axon. It includes integration with Hugging Face Models, allowing anyone to download models and perform Machine Learning tasks with a few lines of code. The best way to get started with Bumblebee is with Livebook. Our announcement video shows how to use Livebook's Smart Cells to perform different Neural Network tasks with a few clicks. You can then tweak the code and deploy it. First, add Bumblebee and EXLA as dependencies in your mix.exs. EXLA is an...
    Downloads: 0 This Week
  • 5
    torchtext

    Data loaders and abstractions for text and NLP

    We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. LTS versions are distributed through a different channel than the other versioned releases. Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK); you have to install SacreMoses. To build torchtext from source, you need git, CMake, and a C++11 compiler such as g++. When building from source, make sure that you have the same C...
    Downloads: 0 This Week
  • 6
    LaMDA-pytorch

    Open-source pre-training implementation of Google's LaMDA in PyTorch

    Open-source pre-training implementation of Google's LaMDA research paper in PyTorch. The totally not sentient AI. This repository will cover the 2B-parameter implementation of the pre-training architecture, as that is likely what most people can afford to train. You can review Google's latest blog post from 2022, which details LaMDA. You can also view their previous blog post from 2021 on the model.
    Downloads: 0 This Week
  • 7
    Tensorflow Transformers

    State-of-the-art, faster Transformers with TensorFlow 2.0

    ... speech recognition and audio classification. Faster autoregressive decoding, TFLite support, and simple TFRecord creation. Auto-batching of tf.data.Dataset or tf.ragged tensors. Everything (inputs and outputs) is a dictionary. Multiple mask modes, such as causal, user-defined, and prefix. tensorflow-text tokenizer support. Supports GPU, TPU, and a multi-GPU trainer with wandb, multiple callbacks, and auto TensorBoard.
    Downloads: 0 This Week
  • 8
    RE/flex lexical analyzer generator

    The regex-centric, fast lexical analyzer generator for C++

    RE/flex is the fast lexical analyzer generator (faster than Flex) with full Unicode support, indent/nodent/dedent anchors, lazy quantifiers, and many other modern features. Accepts Flex lexer specification syntax and is compatible with Bison/Yacc parsers. Generates reusable source code that is easy to understand. Supports fast scanning of UTF-8/16/32 files, strings, and streams. The reflex scanner generator tool generates clean lexer class code that is thread-safe. Generates Graphviz files...
    Downloads: 2 This Week
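    RE/flex itself generates C++ scanner classes from Flex-style specifications. The Python sketch below only illustrates the matching convention such specifications follow: at each position the longest match wins, and ties go to the rule listed first. The rule table is hypothetical, and none of this is RE/flex's API.

```python
import re

# Hypothetical rule table: (token name, regex). Order only matters for ties.
RULES = [
    ("IF", r"if"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("OP", r"==|=|<|>"),
    ("WS", r"\s+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def scan(text: str) -> list[tuple[str, str]]:
    """Flex-style scanning: at each position pick the longest match across
    all rules; on a tie, the rule listed first wins."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if best_name != "WS":  # discard whitespace, like an empty action
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(scan("if iffy == 42"))
# → [('IF', 'if'), ('IDENT', 'iffy'), ('OP', '=='), ('NUMBER', '42')]
```

    Because of longest match, `iffy` becomes one identifier instead of the keyword `if` followed by `fy`, while a bare `if` ties between two rules and goes to `IF`, the rule listed first.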
  • 9
    GPT Neo

    An implementation of model parallel GPT-2 and GPT-3-style models

    An implementation of model- and data-parallel GPT-3-like models using the mesh-tensorflow library. If you're just here to play with our pre-trained models, we strongly recommend you try out the Hugging Face Transformers integration. Training and inference are officially supported on TPU and should work on GPU as well. This repository will be (mostly) archived as we move our focus to our GPU-specific repo, GPT-NeoX. NB, while neo can technically run a training step at 200B+ parameters, it is very...
    Downloads: 10 This Week
  • 10
    GPT2 for Multiple Languages

    GPT2 for Multiple Languages, including pretrained models

    With just 2 clicks (not including the Colab authentication process), the 1.5B pretrained Chinese model demo is ready to go. The contents of this repository are for academic research purposes, and we do not provide any conclusive remarks. Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC). Simplified GPT2 training scripts (based on Grover, supporting TPUs). Ported BERT tokenizer, compatible with multilingual corpora. 1.5B GPT2 pretrained Chinese model (~15G corpus, 100,000 steps). Batteries...
    Downloads: 0 This Week
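    The entry mentions a ported BERT tokenizer. BERT splits words into subwords with WordPiece, using a greedy longest-match-first search in which non-initial pieces carry a `##` prefix. A minimal sketch of that step, assuming a hypothetical toy vocabulary (this is not the repository's port), might look like this:

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece split of a single word.
    Continuation pieces carry the '##' prefix, as in BERT vocabularies."""
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no valid split: the whole word maps to the unknown token
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary.
print(wordpiece("unaffable", {"un", "##aff", "##able"}))
# → ['un', '##aff', '##able']
```

    Mapping the whole word to the unknown token when any piece fails, rather than emitting a partial split, mirrors the behaviour of BERT's reference WordPiece tokenizer.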
  • 11
    javalang

    Pure Python Java parser and tools

    javalang is a pure Python library for working with Java source code. javalang provides a lexer and parser targeting Java 8. The implementation is based on the Java language spec.
    Downloads: 0 This Week
  • 12
    Ganja.js

    JavaScript Geometric Algebra Generator for JavaScript, C++

    ... is a code generator producing classes that reify algebraic literals and expressions by using reflection, a built-in tokenizer, and a simple AST translator to rewrite functions containing algebraic constructs into their procedural counterparts. ganja.js now has a Node.js-based templated source generator that allows the creation of arbitrary algebras for C++, C#, Python, and Rust. The generated code uses a flat multivector format and provides operator overloading.
    Downloads: 1 This Week
  • 13
    cyqlite

    enhanced SQLite

    A 100% upward-compatible variant of SQLite. Provides win32/win64 builds of sqlite3.dll that work better (smaller, faster, longer paths) than the DLLs provided by sqlite.org.
    Downloads: 4 This Week
  • 14
    WikiSQL

    A large annotated semantic parsing corpus for developing NL interfaces

    A large crowd-sourced dataset for developing natural language interfaces for relational databases. WikiSQL is the dataset released along with our work Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. Regarding tokenization and Stanza: when WikiSQL was written three years ago, it relied on Stanza, a CoreNLP Python wrapper that has since been deprecated. If you'd still like to use the tokenizer, please use the Docker image. We do not anticipate switching...
    Downloads: 2 This Week
  • 15
    We have implemented a core summarizer of scientific articles written in Spanish, with the following components: a tokenizer, a grammar checker, a clarity checker, a cohesion-coherence checker, a common-topic extractor and an output formatter.
    Downloads: 0 This Week
  • 16
    Persianp Toolbox

    A toolbox for preprocessing Persian texts

    A toolbox for preprocessing Persian texts, including: normalizer, tokenizer, sentencizer, POS tagger, lemmatizer, and stopword detector. For more information, please visit www.persianp.ir/toolbox.html
    Downloads: 1 This Week
  • 17
    tokenizer

    Transforms arithmetic expressions (C strings) into a sequence of tokens

    A C string representing an arithmetic expression is transformed into a sequence of tokens (functions, constants, variables, operators, brackets, commas), which is stored on a stack.
    Downloads: 0 This Week
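    A minimal Python sketch of the described transformation, assuming the token classes listed above, with a Python list serving as the stack. The function and constant tables are hypothetical, and treating numeric literals as constants is an assumption; the project itself works on C strings.

```python
import re

# Hypothetical name tables: the project does not list its built-in functions
# or named constants, so these are placeholders.
FUNCTIONS = {"sin", "cos", "max"}
CONSTANTS = {"pi", "e"}

# One alternative per token class; optional leading whitespace is skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(\d+\.?\d*)"   # group 1: numeric literal
    r"|([A-Za-z_]\w*)"     # group 2: identifier (function, constant or variable)
    r"|([-+*/^])"          # group 3: operator
    r"|([()])"             # group 4: bracket
    r"|(,))"               # group 5: comma
)


def tokenize_arith(expr: str) -> list[tuple[str, str]]:
    """Split an arithmetic expression into (kind, text) tokens, pushing each
    onto a list used as a stack (the last token ends up on top)."""
    stack: list[tuple[str, str]] = []
    expr = expr.rstrip()
    pos = 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"cannot tokenize input at position {pos}")
        text = m.group(m.lastindex)  # exactly one inner group participates
        if m.lastindex == 2:
            if text in FUNCTIONS:
                kind = "function"
            elif text in CONSTANTS:
                kind = "constant"
            else:
                kind = "variable"
        else:
            # Assumption: numeric literals count as constants.
            kind = {1: "constant", 3: "operator", 4: "bracket", 5: "comma"}[m.lastindex]
        stack.append((kind, text))
        pos = m.end()
    return stack


print(tokenize_arith("max(x, 2.5) * pi"))
```

    Classifying identifiers after matching them, rather than with separate regex alternatives, keeps the name tables easy to extend without touching the pattern.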
  • 18
    slug_dev

    Development platform for SLUG projects

    This is the platform for the developers of SLUG (Solr and Lucene User Group), where we host our projects related to Solr and Lucene: log4jSolr, a log4j appender (and more) for indexing all log events; and solr_core, Solr analysis extensions such as filters or tokenizers.
    Downloads: 0 This Week
  • 19
    BASIC interpreter for the 16-bit PIC microcontroller 24FJ64GA002. The interpreter runs on the chip alone; no compiler or tokenizer is needed. Communication with the PC is done via a USB-to-serial converter cable.
    Downloads: 1 This Week
  • 20
    Similar to StringTokenizer in the Java standard library, but capable of discerning different types of data as tokens (e.g., "a=1" yields 3 tokens), because it doesn't rely on whitespace or a single delimiter (as StringTokenizer does).
    Downloads: 0 This Week
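    The idea of discerning token types by character class, rather than splitting on a delimiter, can be sketched in Python with a regular expression whose named groups identify each token's type. The three token classes here are assumptions for illustration, not the project's actual categories, and the project itself is written for Java.

```python
import re

# Named groups let each match report its own token type via m.lastgroup.
TOKEN_RE = re.compile(
    r"(?P<word>[A-Za-z_]\w*)"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<symbol>\S)"  # any other single non-space character
)


def typed_tokens(text: str) -> list[tuple[str, str]]:
    """Tokenize by character class rather than by a delimiter; whitespace
    is skipped because no alternative matches it."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


print(typed_tokens("a=1"))
# → [('word', 'a'), ('symbol', '='), ('number', '1')]
```

    For comparison, `"a=1".split()` returns `['a=1']`, a single token. This sketch emits multi-character operators such as `>=` as two separate symbol tokens; handling them would need an extra alternative placed before `symbol`.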
  • 21
    EWBTokenFactory for PHP is a whitespace-inclusive tokenizer. Most tokenizers take each word as a token and drop the whitespace; this tokenizer also turns the spaces in between into tokens of their own. It also supports associated token data.
    Downloads: 0 This Week
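    A minimal Python sketch of a whitespace-inclusive tokenizer: runs of whitespace become tokens of their own, so the original text can be rebuilt exactly. The `Token` class and its `data` field (standing in for "associated token data") are hypothetical, not EWBTokenFactory's PHP API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    text: str
    offset: int        # character offset of the token in the input
    is_space: bool
    data: dict = field(default_factory=dict)  # hypothetical slot for caller data


def tokenize_keep_whitespace(text: str) -> list[Token]:
    """Alternate runs of whitespace and non-whitespace; nothing is dropped,
    so joining the token texts reproduces the input exactly."""
    return [
        Token(m.group(), m.start(), m.group().isspace())
        for m in re.finditer(r"\s+|\S+", text)
    ]


print([t.text for t in tokenize_keep_whitespace("Hello  world\n")])
# → ['Hello', '  ', 'world', '\n']
```

    Keeping whitespace as tokens makes the round trip lossless, which matters for tools that rewrite text in place (formatters, highlighters) and must preserve the original spacing.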
  • 22
    A Stamp Basic compiler for the Linux platform, based on the tokenizer shared library provided by Parallax Inc. (Rocklin, CA, USA).
    Downloads: 0 This Week
  • 23
    This is my attempt to write a very simple parser in C++ in my (very infrequent) free time. Please ignore the tokenizer as I cheated a bit.
    Downloads: 0 This Week
  • 24
    NLPTools-ES is a Spanish plugin for GATE (General Architecture for Text Engineering). It includes a tokenizer, sentence splitter, gazetteer, and POS tagger.
    Downloads: 0 This Week
  • 25
    Tokenization and segmentation for the Czech language.
    Downloads: 1 This Week