Showing 164 open source projects for "transformers"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Find Hidden Risks in Windows Task Scheduler Icon
    Find Hidden Risks in Windows Task Scheduler

    Free diagnostic script reveals configuration issues, error patterns, and security risks. Instant HTML report.

    Windows Task Scheduler might be hiding critical failures. Download the free JAMS diagnostic tool to uncover problems before they impact production—get a color-coded risk report with clear remediation steps in minutes.
    Download Free Tool
  • 1
    GLM-4.5-Air

    GLM-4.5-Air

    Compact hybrid reasoning language model for intelligent responses

    ...GLM-4.5-Air supports both English and Chinese, and is suitable for tasks involving text generation, coding, reasoning, and tool calling. Open-sourced under the MIT license, it is commercially usable and integrates with transformers, vLLM, and SGLang inference frameworks. It includes FP8 variants for faster inference and reduced memory requirements. Despite its smaller size compared to full GLM-4.5, GLM-4.5-Air maintains high performance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    mms-300m-1130-forced-aligner

    mms-300m-1130-forced-aligner

    CTC-based forced aligner for audio-text in 158 languages

    mms-300m-1130-forced-aligner is a multilingual forced alignment model based on Meta’s MMS-300M wav2vec2 checkpoint, adapted for Hugging Face’s Transformers library. It supports forced alignment between audio and corresponding text across 158 languages, offering broad multilingual coverage. The model enables accurate word- or phoneme-level timestamping using Connectionist Temporal Classification (CTC) emissions. Unlike other tools, it provides significant memory efficiency compared to the TorchAudio forced alignment API. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    wav2vec2-large-xlsr-53-portuguese

    wav2vec2-large-xlsr-53-portuguese

    Portuguese ASR model fine-tuned on XLSR-53 for 16kHz audio input

    ...It achieves a WER of 11.3% (or 9.01% with LM) on Common Voice test data, demonstrating high accuracy for a single-language ASR model. Inference can be done using HuggingSound or via a custom PyTorch script using Hugging Face Transformers and Librosa. Training scripts and evaluation methods are open source and available on GitHub. It is released under the Apache 2.0 license and intended for ASR tasks in Brazilian Portuguese.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Bio_ClinicalBERT

    Bio_ClinicalBERT

    ClinicalBERT model trained on MIMIC notes for clinical NLP tasks

    ...Training was done for 150,000 steps using a batch size of 32, max sequence length of 128, and a masked language modeling objective with a 0.15 mask probability. Bio_ClinicalBERT is available through Hugging Face's Transformers library for easy integration. It supports medical AI research and applications involving electronic health record understanding, clinical decision support, and biomedical information extraction.
    Downloads: 0 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 5
    wav2vec2-large-xlsr-53-russian

    wav2vec2-large-xlsr-53-russian

    Russian ASR model fine-tuned on Common Voice and CSS10 datasets

    ...It achieves a Word Error Rate (WER) of 13.3% and Character Error Rate (CER) of 2.88% on the Common Voice test set, with even better results when used with a language model. The model supports both PyTorch and JAX and is compatible with the Hugging Face Transformers and HuggingSound libraries. It is ideal for Russian voice transcription tasks in research, accessibility, and interface development. The training was made possible with compute support from OVHcloud, and the training scripts are publicly available for replication.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    BLEURT-20-D12

    BLEURT-20-D12

    Custom BLEURT model for evaluating text similarity using PyTorch

    BLEURT-20-D12 is a PyTorch implementation of BLEURT, a model designed to assess the semantic similarity between two text sequences. It serves as an automatic evaluation metric for natural language generation tasks like summarization and translation. The model predicts a score indicating how similar a candidate sentence is to a reference sentence, with higher scores indicating greater semantic overlap. Unlike standard BLEURT models from TensorFlow, this version is built from a custom PyTorch...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    layoutlm-base-uncased

    layoutlm-base-uncased

    Multimodal Transformer for document image understanding and layout

    layoutlm-base-uncased is a multimodal transformer model developed by Microsoft for document image understanding tasks. It incorporates both text and layout (position) features to effectively process structured documents like forms, invoices, and receipts. This base version has 113 million parameters and is pre-trained on 11 million documents from the IIT-CDIP dataset. LayoutLM enables better performance in tasks where the spatial arrangement of text plays a crucial role. The model uses a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    t5-base

    t5-base

    Flexible text-to-text transformer model for multilingual NLP tasks

    t5-base is a pre-trained transformer model from Google’s T5 (Text-To-Text Transfer Transformer) family that reframes all NLP tasks into a unified text-to-text format. With 220 million parameters, it can handle a wide range of tasks, including translation, summarization, question answering, and classification. Unlike traditional models like BERT, which output class labels or spans, T5 always generates text outputs. It was trained on the C4 dataset, along with a variety of supervised NLP...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Qwen2.5-VL-3B-Instruct

    Qwen2.5-VL-3B-Instruct

    Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video

    Qwen2.5-VL-3B-Instruct is a 3.75 billion parameter multimodal model by Qwen, designed to handle complex vision-language tasks in both image and video formats. As part of the Qwen2.5 series, it supports image-text-to-text generation with capabilities like chart reading, object localization, and structured data extraction. The model can serve as an intelligent visual agent capable of interacting with digital interfaces and understanding long-form videos by dynamically sampling resolution and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Zendesk: The Complete Customer Service Solution Icon
    Zendesk: The Complete Customer Service Solution

    Discover AI-powered, award-winning customer service software trusted by 200k customers

    Equip your agents with powerful AI tools and workflows that boost efficiency and elevate customer experiences across every channel.
    Learn More
  • 10
    Qwen2.5-VL-7B-Instruct

    Qwen2.5-VL-7B-Instruct

    Multimodal 7B model for image, video, and text understanding tasks

    Qwen2.5-VL-7B-Instruct is a multimodal vision-language model developed by the Qwen team, designed to handle text, images, and long videos with high precision. Fine-tuned from Qwen2.5-VL, this 7-billion-parameter model can interpret visual content such as charts, documents, and user interfaces, as well as recognize common objects. It supports complex tasks like visual question answering, localization with bounding boxes, and structured output generation from documents. The model is also...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    bge-base-en-v1.5

    bge-base-en-v1.5

    Efficient English embedding model for semantic search and retrieval

    ...With 768 embedding dimensions and a maximum sequence length of 512 tokens, it achieves strong performance across multiple MTEB benchmarks, nearly matching larger models while maintaining efficiency. It supports use via SentenceTransformers, Hugging Face Transformers, FlagEmbedding, and ONNX for various deployment scenarios. Typical usage includes normalizing output embeddings and calculating cosine similarity via dot product for ranking.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Qwen2.5-14B-Instruct

    Qwen2.5-14B-Instruct

    Powerful 14B LLM with strong instruction and long-text handling

    Qwen2.5-14B-Instruct is a powerful instruction-tuned language model developed by the Qwen team, based on the Qwen2.5 architecture. It features 14.7 billion parameters and is optimized for tasks like dialogue, long-form generation, and structured output. The model supports context lengths up to 128K tokens and can generate up to 8K tokens, making it suitable for long-context applications. It demonstrates improved performance in coding, mathematics, and multilingual understanding across over...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Devstral 2

    Devstral 2

    Agentic 123B coding model optimized for large-scale engineering

    Devstral 2 is a large-scale agentic language model purpose-built for software engineering tasks, excelling at codebase exploration, multi-file editing, and tool-driven automation. With 123B parameters and FP8 instruct tuning, it delivers strong instruction following for chat-based workflows, coding assistants, and autonomous developer agents. The model demonstrates outstanding performance on SWE-bench, validating its effectiveness in real-world engineering scenarios. It generalizes well...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    fashion-clip

    fashion-clip

    CLIP model fine-tuned for zero-shot fashion product classification

    FashionCLIP is a domain-adapted CLIP model fine-tuned specifically for the fashion industry, enabling zero-shot classification and retrieval of fashion products. Developed by Patrick John Chia and collaborators, it builds on the CLIP ViT-B/32 architecture and was trained on over 800K image-text pairs from the Farfetch dataset. The model learns to align product images and descriptive text using contrastive learning, enabling it to perform well across various fashion-related tasks without...
    Downloads: 0 This Week
    Last Update:
    See Project