CLIPSeg-RD64-Refined is an image segmentation model released by CIDAS and built on the CLIP architecture. It performs zero-shot and one-shot segmentation from text or image prompts, letting users segment objects described in natural language or indicated by a visual example. The "rd64" in the name refers to the decoder's reduced embedding dimensionality of 64, and the "refined" variant adds a more complex convolutional head to improve segmentation accuracy. The model was introduced in the paper Image Segmentation Using Text and Image Prompts by Lüddecke and Ecker, and is released under the Apache-2.0 license.

At roughly 151 million parameters, the model is small enough for efficient deployment, and its weights are published in both I64 and F32 tensor types. CLIPSeg-RD64-Refined is designed for use with PyTorch and integrates directly into workflows built on Hugging Face Transformers. It can be applied across diverse domains such as medical imaging, robotics, and visual search, wherever precise, prompt-based segmentation is needed.
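As a minimal sketch of the zero-shot, text-prompted workflow described above, the snippet below loads the model through Hugging Face Transformers (assuming the hub id `CIDAS/clipseg-rd64-refined`) and produces one segmentation logit map per text prompt; the sample image URL and prompt strings are illustrative choices, not part of the model card.

```python
# Zero-shot segmentation with text prompts via Hugging Face Transformers.
# Model id and sample image are assumptions for illustration.
import requests
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompts = ["a cat", "a remote control"]

# One copy of the image per text prompt; the model returns one logit map each.
inputs = processor(text=prompts, images=[image] * len(prompts),
                   padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits holds one 352x352 logit map per prompt;
# a sigmoid turns them into per-pixel mask probabilities.
masks = torch.sigmoid(outputs.logits)
```

Thresholding or argmax over `masks` then yields binary masks for each prompted object.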
Features
- Zero-shot image segmentation using text prompts
- One-shot segmentation with image-based reference
- Refined architecture with dimensionality reduced to 64
- More complex convolutional refinement head for improved accuracy
- CLIP-based multimodal encoder for language-image understanding
- Lightweight model with only 151M parameters
- PyTorch support and Hugging Face compatibility
- Apache-2.0 license for permissive research and commercial use
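The one-shot mode from the feature list can be sketched in the same Transformers API: instead of a text prompt, a reference image of the target object conditions the decoder via `conditional_pixel_values`. The crop coordinates below are arbitrary illustration values standing in for a real reference image.

```python
# One-shot ("visual prompt") segmentation: an image, not text, is the prompt.
# Model id, sample image, and crop region are assumptions for illustration.
import requests
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
# Arbitrary crop used as the reference example of the object to segment.
prompt_image = image.crop((0, 0, 300, 300))

query = processor(images=[image], return_tensors="pt")
visual_prompt = processor(images=[prompt_image], return_tensors="pt")

with torch.no_grad():
    outputs = model(pixel_values=query.pixel_values,
                    conditional_pixel_values=visual_prompt.pixel_values)

# One 352x352 logit map for the query image; sigmoid gives mask probabilities.
mask = torch.sigmoid(outputs.logits)
```

In practice the reference image would be a separate photo or crop of the object class to be segmented, which is what makes this a one-shot rather than zero-shot setup.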