Showing 46 open source projects for "natural language processing"

  • 1
    Ruby

    Ruby programming language

    A dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write. Ruby is a language of careful balance. Its creator, Yukihiro “Matz” Matsumoto, blended parts of his favorite languages (Perl, Smalltalk, Eiffel, Ada, and Lisp) to form a new language that balanced functional programming with imperative programming.
    Downloads: 4 This Week
  • 2
    Crystal

    The Crystal programming language

    Crystal’s syntax is heavily inspired by Ruby’s, so it feels natural to read and easy to write, and has the added benefit of a lower learning curve for experienced Ruby devs. Crystal is statically type checked, so any type errors will be caught early by the compiler rather than failing at runtime. Moreover, to keep the language clean, Crystal has built-in type inference, so most type annotations are unnecessary.
    Downloads: 2 This Week
  • 3
    Miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data

    Miller is like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed. With Miller, you get to use named fields without needing to count positional indices. Then, on the fly, you can add new fields which are functions of existing fields, drop fields, sort, aggregate statistically, pretty-print, and more. Miller operates on key-value-pair data while the...
    Downloads: 12 This Week
  • 4
    Elixir

    Dynamic, functional language designed for building scalable apps

    Elixir is a dynamic, functional language for building scalable and maintainable applications. Elixir leverages the Erlang VM, known for running low-latency, distributed, and fault-tolerant systems. Elixir is successfully used in web development, embedded software, data ingestion, and multimedia processing, across a wide range of industries. All Elixir code runs inside lightweight threads of execution (called processes) that are isolated and exchange information via messages. ...
    Downloads: 7 This Week
  • 5
    League CSV

    CSV data manipulation made easy in PHP

    The PHP League CSV is a PHP library for reading, writing, and manipulating CSV files. It offers a straightforward API for handling common CSV operations, including parsing data, writing rows, and formatting output. The library is designed to handle large datasets efficiently, making it a reliable choice for data processing tasks in web applications.
    Downloads: 2 This Week
  • 6
    Cocur Slugify

    Converts a string to a slug. Includes integrations for Symfony

    Slugify is a PHP library that converts strings into URL-friendly slugs. It replaces spaces and special characters with hyphens or other specified separators, making it ideal for generating SEO-friendly URLs. Slugify is lightweight, fast, and highly configurable, supporting custom rules and language-specific transliterations for accurate slug creation.
    Downloads: 1 This Week
  • 7
    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 9 This Week
  • 8
    HCL

    HCL is the HashiCorp configuration language

    ...It includes an expression syntax that allows basic inline computation and, with support from the calling application, the use of variables and functions for more dynamic configuration languages. HCL provides a set of constructs that can be used by a calling application to construct a configuration language. The application defines which attribute names and nested block types are expected, and HCL parses the configuration file, verifies that it conforms to the expected structure, and returns high-level objects that the application can use for further processing.
    Downloads: 4 This Week
  • 9
    OpenDataLoader PDF

    PDF Parser for AI-ready data. Automate PDF accessibility

    OpenDataLoader PDF is an open-source document processing system designed to convert complex PDF files into structured, AI-ready formats such as Markdown, JSON, and HTML while preserving layout, hierarchy, and semantic meaning. It focuses on enabling downstream use cases like retrieval-augmented generation (RAG), knowledge extraction, and document intelligence pipelines by maintaining accurate reading order and spatial metadata through bounding boxes. The tool combines deterministic parsing...
    Downloads: 13 This Week
  • 10
    ant4docbook

    ANT4DOCBOOK is an ANT task for DOCBOOK

    ANT4DOCBOOK is an ANT task for DOCBOOK, a semantic markup language for technical documentation.
    Downloads: 0 This Week
  • 11
    TextureAtlas Toolbox

    A powerful, free and open-source tool for TextureAtlases/Spritesheets

    TextureAtlas Toolbox is an all-in-one solution for working with texture atlases and sprite sheets. Extract sprites into organized frame collections and GIF/WebP/APNG animations, generate optimized atlases from individual frames, or convert between 15+ atlas formats. Perfect for game developers, modders, and anyone creating showcases of game sprites. Formerly known as TextureAtlas to GIFs and Frames. Licensed under AGPL-3.0. Third-party licenses: See...
    Downloads: 34 This Week
  • 12
    SPyQL

    Query data on the command line with SQL-like SELECTs powered by Python

    SQL with Python in the middle. SPyQL is a query language that combines the simplicity and structure of SQL with the power and readability of Python. SPyQL offers a command-line interface that allows running SPyQL queries on top of text data (e.g. CSV, JSON). Data can come from files but also from data streams, such as Kafka, or from databases such as PostgreSQL. Basically, data can come from any command that outputs text :-). More, data can be generated by a Python expression! And since...
    Downloads: 0 This Week
  • 13
    Budou

    Budou is an auto organizer tool for beautiful line breaking in CJK

    ...These spans can be styled with CSS to ensure smooth, visually coherent line breaks without splitting words or phrases. The tool supports multiple segmentation backends, including Google Cloud Natural Language API, MeCab, and TinySegmenter, enabling flexibility for both cloud-based and offline processing. Budou can be used via command line, in Python scripts, or integrated into web applications, and it provides advanced options such as caching and entity recognition for improved segmentation accuracy.
    Downloads: 1 This Week
  • 14
    ModularAdmin

    Free Dashboard Theme Built On Bootstrap 4 | HTML Version

    ModularAdmin is an open source dashboard theme built in a modular way, which makes it easy to scale, modify, and maintain. We use SASS as the CSS preprocessor language. Main variables are defined in the src/_variables.scss file. To make life easier, we broke the styles down into components; on build, all .scss files are merged together and processed into the dist/css/app.css file. There are also different theme variations in the src/_themes/ folder, where you can change the main variables to get different themes. ...
    Downloads: 0 This Week
  • 15
    XSH is a powerful command-line XML editing tool and programming language in the manner of Unix shell interpreters and line-oriented text editors like ed. It can be used either interactively or for batch-mode XML processing.
    Downloads: 1 This Week
  • 16
    unfluff

    Automatically extract body content (and other cool stuff) from HTML

    unfluff is a Node.js library designed to automatically extract the main content from an HTML document, stripping away navigation bars, ads, footers and other boilerplate to leave you with the “body content”, metadata (title, author, date) and other useful fields. It’s a tool aimed squarely at content analysis, web scraping, building datasets, or repurposing article text for downstream processing (like machine learning or summarization). The API is simple: you feed in raw HTML and it returns a structured object with the extracted text and other fields. It supports caching internal representations to speed up repeated extractions. While its language support is best for English, it is still widely used in web-content-processing pipelines. ...
    Downloads: 0 This Week
  • 17
    iText®, a JAVA PDF library

    PDF Library for Developers

    iText is an open-source PDF library available for Java and .NET (C#). iText allows you to effortlessly generate and manipulate standards-compliant PDF documents with a powerful and feature-rich SDK. With iText, you can create archivable and accessible PDFs, split and merge documents, fill and flatten forms, digitally sign documents, and more. iText add-ons enable additional functionality, such as PDF creation from HTML templates, secure redaction, OCR, and much more. The latest...
    Downloads: 102 This Week
  • 18
    heml

    HEML is an open source markup language for building responsive email

    ...With dozens of popular email clients, each with its own quirks, it can be overwhelming to build an email that looks good and works well. Add in the challenge of getting your email to the inbox quickly, and it's enough to make anyone give up. HEML is an XML-based markup language designed for building emails. The goal is to make building emails feel as natural as building websites.
    Downloads: 0 This Week
  • 19
    WikiSQL

    A large annotated semantic parsing corpus for developing NL interfaces

    A large crowd-sourced dataset for developing natural language interfaces for relational databases. WikiSQL is the dataset released along with our work Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. Regarding tokenization and Stanza: when WikiSQL was written three years ago, it relied on Stanza, a CoreNLP Python wrapper that has since been deprecated.
    Downloads: 0 This Week
  • 20
    Dictionary

    A JSON representation of Webster's Unabridged Dictionary

    ...The repository also includes usage examples that demonstrate how to incorporate the module into JavaScript projects. As an open source utility, dictionary can be extended or customized to suit different natural language processing or educational applications.
    Downloads: 4 This Week
  • 21
    Yes, finally. The author decided to set aside some of his busy time to release the "Chinese Language Formula" - Step By Step. It will take a while, but eventually Chinese language processing will no longer be a problem.
    Downloads: 0 This Week
  • 22
    A Java-based parser for parsing/grabbing web sites and other text or XML documents, based on a nondeterministic parser language, creating XML output. Also contains a few utility classes for HTML, CSV and text parsing, and additional character sets.
    Downloads: 0 This Week
  • 23
    writeup
    Programming language for converting source documents into HTML or XML. Writeup is a combination of a markup language (similar to markdown) and a macro pre-processing language that enables a formal production system to be set up for documents.
    Downloads: 0 This Week
  • 24
    IDL-specified API for manipulating and processing CellML 1.0 and 1.1. Includes a C++ implementation. Accessible from a C++ program, or from any language for which a CORBA language mapping is available. A Java wrapper of the API is also available.
    Downloads: 1 This Week
  • 25
    ...ROO makes the ontology building process easier, as it provides guidance on the steps involved and allows users to enter knowledge using an easy-to-learn controlled natural language.
    Downloads: 0 This Week
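Miller (item 3) is described as letting you address fields by name, derive new fields from existing ones, and sort on the fly. The following is a minimal Python sketch of that name-indexed idea using only the standard `csv` module; it is not Miller's syntax or implementation, and the sample data and the `add_total` helper are hypothetical.

```python
import csv
import io

# Hypothetical sample data; the header row supplies the field names
SAMPLE = """name,price,qty
apple,0.5,4
pear,0.75,2
"""


def add_total(csv_text: str) -> list:
    """Add a derived 'total' field to each record, then sort by it (descending)."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Fields are looked up by header name, not by column position
        record["total"] = float(record["price"]) * int(record["qty"])
        rows.append(record)
    rows.sort(key=lambda r: r["total"], reverse=True)
    return rows


for row in add_total(SAMPLE):
    print(row["name"], row["total"])
```

Because each record is a dict keyed by header name, reordering the input columns does not change the result, which is the advantage the Miller description contrasts with counting positional indices.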
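Cocur Slugify (item 6) is described as replacing spaces and special characters with hyphens or another separator to produce URL-friendly slugs. Slugify itself is a PHP library; the following is only a minimal Python sketch of the general technique, not its API or rule set. It assumes ASCII-only output obtained by Unicode NFKD decomposition, which is much simpler than the language-specific transliteration rules the library advertises; the `slugify` function here is hypothetical.

```python
import re
import unicodedata


def slugify(text: str, separator: str = "-") -> str:
    """Hypothetical helper: turn arbitrary text into a lowercase ASCII slug."""
    # Split accented letters into base letter + combining mark (e.g. "Ç" -> "C" + cedilla)
    normalized = unicodedata.normalize("NFKD", text)
    # Drop anything that is not ASCII, which removes the combining marks
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one separator
    slug = re.sub(r"[^A-Za-z0-9]+", separator, ascii_text)
    # Trim separators left at either end and lowercase the result
    return slug.strip(separator).lower()


print(slugify("Hello World! Ça va?"))  # hello-world-ca-va
```

Characters without an ASCII decomposition (for example Cyrillic or CJK text) are simply dropped by this sketch, which is why a production library needs per-language transliteration rules.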