Showing 8 open source projects for "tokenizer"

  • 1
    SentencePiece

    Unsupervised text tokenizer for Neural Network-based text generation

    SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for neural network-based text generation systems where the vocabulary size is fixed before the neural model is trained. SentencePiece implements subword units (e.g., byte-pair encoding (BPE) [Sennrich et al.] and the unigram language model [Kudo]) with the extension of direct training from raw sentences. SentencePiece allows us to make a purely end-to-end system that does not depend on language-specific pre...
    Downloads: 0 This Week
  • 2
    RE/flex lexical analyzer generator

    The regex-centric, fast lexical analyzer generator for C++

    RE/flex is the fast lexical analyzer generator (faster than Flex) with full Unicode support, indent/nodent/dedent anchors, lazy quantifiers, and many other modern features. Accepts Flex lexer specification syntax and is compatible with Bison/Yacc parsers. Generates reusable source code that is easy to understand. Supports fast scanning of UTF-8/16/32 files, strings, and streams. The reflex scanner generator tool generates clean lexer class code that is thread-safe. Generates Graphviz files...
    Downloads: 2 This Week
  • 3
    The C++ String Toolkit Library (StrTk) consists of robust, optimized and portable string processing algorithms for the C++ language. StrTk is designed to be easy to use and integrate within existing code bases. http://strtk.partow.net
    Downloads: 0 This Week
  • 4
    flex: the fast lexical analyser

    flex is a tool for generating scanners

    flex is a tool for generating scanners. A scanner, sometimes called a tokenizer, is a program that recognizes lexical patterns in text. The flex program reads user-specified input files, or its standard input if no file names are given, for a description of a scanner to generate. The description takes the form of pairs of regular expressions and C code, called rules. Flex generates a C source file named "lex.yy.c", which defines the function yylex(). The file "lex.yy.c" can be compiled...
    Downloads: 2,961 This Week
  • 5
    This is my attempt to write a very simple parser in C++ in my (very infrequent) free time. Please ignore the tokenizer as I cheated a bit.
    Downloads: 0 This Week
  • 6
    Tokenization and segmentation for the Czech language.
    Downloads: 1 This Week
  • 7
    A general-purpose, multi-language tokenizer. It will provide consistent usage across languages while tying into each language's own common usage types.
    Downloads: 0 This Week
  • 8
    A run-time configurable character stream tokenizer that allows the user to define token classes via regular expressions. The developer is not limited to predefined notions of whitespace, commenting, or word modalities.
    Downloads: 0 This Week
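
Several of the projects listed above (flex, RE/flex, and the run-time configurable tokenizer in entry 8) are built around the same idea: token classes are defined by regular expressions, and the scanner reports which class matched. The following is only a minimal Python sketch of that idea, not the code or API of any listed project. The names `TOKEN_CLASSES`, `build_scanner`, and `tokenize` are hypothetical and chosen for illustration.

```python
import re

# Hypothetical token classes as (name, regex) pairs. Because this is plain
# data, a caller can replace it at run time with different definitions.
TOKEN_CLASSES = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]


def build_scanner(token_classes):
    """Compile (name, regex) pairs into one alternation of named groups."""
    pattern = "|".join(f"(?P<{name}>{regex})" for name, regex in token_classes)
    return re.compile(pattern)


def tokenize(text, token_classes=TOKEN_CLASSES, skip=("SKIP",)):
    """Return a list of (class_name, lexeme) pairs for the whole input."""
    scanner = build_scanner(token_classes)
    tokens = []
    pos = 0
    while pos < len(text):
        match = scanner.match(text, pos)
        # Reject input no class matches, and empty matches that would
        # otherwise leave the loop stuck at the same offset.
        if match is None or match.end() == pos:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup not in skip:
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize("x = 3.14 * (y + 2)")` yields `IDENT`, `OP`, and `NUMBER` tokens with whitespace dropped, and passing a different list of pairs changes what counts as a token without touching the scanning loop. One difference to keep in mind: Python's alternation takes the first alternative that matches, whereas flex picks the longest match and breaks ties by rule order, so rule ordering matters more in this sketch than in a flex specification.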