Showing 3 open source projects for "corpora"

  • 1
    LLM Datasets

    Curated list of datasets and tools for post-training

    ...The repository aims to make datasets easy to inspect and transform, with scripts for downloading, deduping, cleaning, and converting to formats like JSONL that slot into training pipelines. It highlights instruction-tuning and conversation-style corpora while also pointing to code, math, or domain-specific sets for targeted capabilities. Quality is a recurring theme: examples and utilities help filter low-value samples, enforce length limits, and split train/validation consistently so results are comparable. Licensing and provenance are surfaced to encourage compliant usage and to guide dataset selection in commercial settings. ...
    Downloads: 2 This Week
  • 2
    Chinese Poetry

    The most comprehensive database of Chinese poetry

    This repository is a curated collection of Chinese poems and poets organized into catalogs, metadata, and text representations suitable for research, creative and cultural use. It includes major dynastic corpora, such as Tang and Song poems, as well as biographical and categorization data. Each poem entry is structured with fields like author, dynasty, title, content, and sometimes annotations or alternate versions. Developers and scholars can build tools that query by author, era, keyword, or poetic form using the standardized data structure. ...
    Downloads: 0 This Week
  • 3
    TextBlob

    TextBlob is a Python library for processing textual data

    ...It also includes WordNet integration. If you only intend to use TextBlob’s default models (no model overrides), you can pass the lite argument when downloading corpora; this downloads only the corpora needed for basic functionality. TextBlob is also available as a conda package.
    Downloads: 0 This Week
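The LLM Datasets entry mentions deduping, cleaning, length limits, consistent train/validation splits, and conversion to JSONL. Below is a minimal Python sketch of such a preparation step. It is not the repository's own tooling: the single `text` field, the length bounds, and the helper names (`normalize`, `dedupe`, `within_length`, `split_bucket`, `to_jsonl`, `prepare`) are hypothetical choices for illustration.

```python
import hashlib
import json
from typing import Iterable


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivially different copies compare equal.
    return " ".join(text.split()).lower()


def _digest(rec: dict) -> bytes:
    # Hypothetical schema: every record carries its sample in a "text" field.
    return hashlib.sha256(normalize(rec["text"]).encode("utf-8")).digest()


def dedupe(records: Iterable[dict]) -> list:
    """Keep the first occurrence of each sample, comparing normalized text."""
    seen = set()
    kept = []
    for rec in records:
        key = _digest(rec)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def within_length(rec: dict, min_chars: int = 10, max_chars: int = 4000) -> bool:
    # Assumed character-based limits; real pipelines often count tokens instead.
    return min_chars <= len(rec["text"]) <= max_chars


def split_bucket(rec: dict, val_fraction: float = 0.1) -> str:
    # Hash-based assignment: a sample always lands in the same split regardless
    # of input order, which keeps results comparable across runs.
    bucket = int.from_bytes(_digest(rec)[:8], "big") / 2**64
    return "validation" if bucket < val_fraction else "train"


def to_jsonl(records: Iterable[dict]) -> str:
    # One JSON object per line; ensure_ascii=False keeps non-ASCII text readable.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def prepare(records: Iterable[dict], val_fraction: float = 0.1) -> dict:
    """Dedupe, filter by length, split, and return one JSONL string per split."""
    clean = [r for r in dedupe(records) if within_length(r)]
    splits = {"train": [], "validation": []}
    for r in clean:
        splits[split_bucket(r, val_fraction)].append(r)
    return {name: to_jsonl(rs) for name, rs in splits.items()}


if __name__ == "__main__":
    raw = [
        {"text": "What is 2 + 2? Answer: 4."},
        {"text": "what is 2 + 2?   answer: 4."},  # duplicate after normalization
        {"text": "Hi"},  # shorter than min_chars, filtered out
        {"text": "Explain recursion in one sentence. A function calling itself."},
    ]
    for name, jsonl in prepare(raw).items():
        print(name, jsonl.count("\n"), "records")
```

Hashing the normalized text, rather than shuffling with a random seed, is one way to make the split stable when the source data is re-downloaded or reordered; near-duplicate detection (for example MinHash) would need a different technique than the exact matching shown here.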
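The Chinese Poetry entry describes poem records with fields such as author, dynasty, title, and content that tools can query by author, era, or keyword. A minimal sketch of such a query layer follows, assuming records with exactly those four keys; the repository's actual JSON keys and file layout may differ, and the `Poem`, `PoemIndex`, and `load` names are hypothetical. The two sample records are well-known verses typed in for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Poem:
    # Field names follow the listing's description; real JSON keys may differ.
    author: str
    dynasty: str
    title: str
    content: str


def load(records):
    """Convert dicts (e.g. parsed with json.load) into Poem objects."""
    return [Poem(r["author"], r["dynasty"], r["title"], r["content"]) for r in records]


class PoemIndex:
    def __init__(self, poems):
        self.poems = list(poems)
        # Precomputed author -> poems map for constant-time author lookups.
        self.by_author = defaultdict(list)
        for p in self.poems:
            self.by_author[p.author].append(p)

    def by_poet(self, name):
        # .get avoids inserting empty entries for unknown authors.
        return list(self.by_author.get(name, []))

    def search(self, keyword, dynasty=None):
        # Linear substring scan over content, optionally restricted to one dynasty.
        return [
            p for p in self.poems
            if keyword in p.content and (dynasty is None or p.dynasty == dynasty)
        ]


if __name__ == "__main__":
    sample = [
        {"author": "李白", "dynasty": "唐", "title": "静夜思",
         "content": "床前明月光，疑是地上霜。举头望明月，低头思故乡。"},
        {"author": "苏轼", "dynasty": "宋", "title": "水调歌头",
         "content": "明月几时有？把酒问青天。"},
    ]
    index = PoemIndex(load(sample))
    print([p.title for p in index.search("明月")])
    print([p.author for p in index.search("明月", dynasty="宋")])
    print([p.title for p in index.by_poet("李白")])
```

The per-author dictionary makes author queries cheap, while keyword search here is a plain scan; a tool working over the full corpus might swap in an inverted index or a full-text search engine for that part.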