3 projects for "corpora" with 2 filters applied:

  • 1
    LLM Datasets

    Curated list of datasets and tools for post-training

    ...The repository aims to make datasets easy to inspect and transform, with scripts for downloading, deduping, cleaning, and converting to formats like JSONL that slot into training pipelines. It highlights instruction-tuning and conversation-style corpora while also pointing to code, math, or domain-specific sets for targeted capabilities. Quality is a recurring theme: examples and utilities help filter low-value samples, enforce length limits, and split train/validation consistently so results are comparable. Licensing and provenance are surfaced to encourage compliant usage and to guide dataset selection in commercial settings. ...
    Downloads: 3 This Week
  • 2
    concordia

    Powerful search library, best suited for computer-aided translation

    ...This project now contains a fully functional Concordia search library. In the near future, it will be extended by concordia-server: a lightweight, robust web server providing corpus search functionality.
    Downloads: 0 This Week
  • 3
    Chinese Poetry

    The most comprehensive database of Chinese poetry

    This repository is a curated collection of Chinese poems and poets organized into catalogs, metadata, and text representations suitable for research, creative and cultural use. It includes major dynastic corpora, such as Tang and Song poems, as well as biographical and categorization data. Each poem entry is structured with fields like author, dynasty, title, content, and sometimes annotations or alternate versions. Developers and scholars can build tools that query by author, era, keyword, or poetic form using the standardized data structure. ...
    Downloads: 0 This Week