The nsfw_image_detection model by Falconsai is a fine-tuned Vision Transformer (ViT) that classifies images as either "normal" or "nsfw" (not safe for work). It is based on the vit-base-patch16-224-in21k architecture, pre-trained on the ImageNet-21k dataset and then fine-tuned on a curated proprietary dataset of 80,000 diverse images. With carefully tuned hyperparameters (batch size 16, learning rate 5e-5), the model reached 98% evaluation accuracy. It is intended for ethical content moderation and image safety filtering on digital platforms. The model can be used via the Hugging Face pipeline or loaded directly with PyTorch and Transformers for manual control, and an optional YOLOv9-based ONNX Runtime script is provided for deployment scenarios. It is released under the Apache 2.0 license, which permits commercial use, with a strong emphasis on responsible implementation.
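The pipeline route mentioned above can be sketched as follows. This is a minimal example, assuming `transformers`, `torch`, and `Pillow` are installed; the model id `Falconsai/nsfw_image_detection` comes from the card, while the `classify`/`top_label` helper names and the example image path are illustrative choices, not part of the model's API.

```python
def classify(image_path):
    """Run the NSFW classifier via the Hugging Face pipeline.

    Returns a list like [{"label": "normal", "score": ...},
    {"label": "nsfw", "score": ...}] (one entry per label).
    """
    # Deferred import: transformers/torch are heavy, optional dependencies.
    from transformers import pipeline

    clf = pipeline("image-classification",
                   model="Falconsai/nsfw_image_detection")
    return clf(image_path)


def top_label(scores):
    """Pick the highest-scoring label from pipeline-style output."""
    return max(scores, key=lambda d: d["score"])["label"]


if __name__ == "__main__":
    # Hypothetical local file; replace with a real image path.
    print(top_label(classify("example.jpg")))
```

Because the pipeline already applies softmax and label mapping, `top_label` is all the post-processing a caller needs for the binary normal/nsfw decision.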
Features
- Fine-tuned ViT model for binary NSFW image classification
- Pre-trained on ImageNet-21k and fine-tuned on 80,000 proprietary images
- 98% evaluation accuracy and low eval loss (0.0746)
- Supports both Hugging Face pipeline and direct PyTorch use
- Optional ONNX + YOLOv9 workflow for scalable deployment
- Two-label classification: "normal" and "nsfw"
- Suitable for content moderation and adult content detection
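For callers who want the "direct PyTorch" control mentioned in the feature list, a sketch along these lines should work. It assumes `transformers`, `torch`, and `Pillow` are available; `ViTImageProcessor` and `AutoModelForImageClassification` are standard Transformers classes, while `predict` and the plain-Python `softmax` helper are illustrative names introduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a plain list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image_path):
    """Classify one image as "normal" or "nsfw" with manual control
    over preprocessing and the forward pass."""
    # Deferred imports: heavy, optional dependencies.
    import torch
    from PIL import Image
    from transformers import AutoModelForImageClassification, ViTImageProcessor

    model_id = "Falconsai/nsfw_image_detection"
    processor = ViTImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, 2) for the two labels

    probs = softmax(logits[0].tolist())
    # id2label maps class indices to "normal" / "nsfw".
    return model.config.id2label[probs.index(max(probs))]
```

Keeping the softmax/argmax step explicit makes it easy to expose the raw probabilities instead of just the top label, which is useful when a platform wants a tunable moderation threshold rather than a hard binary decision.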