DeiT (Data-efficient Image Transformers) shows that Vision Transformers can be trained competitively on ImageNet-1k without external data by using strong training recipes and knowledge distillation. Its key idea is a specialized distillation strategy—including a learnable “distillation token”—that lets a transformer learn effectively from a CNN or transformer teacher on modest-scale datasets. The project provides compact ViT variants (Tiny/Small/Base) that achieve excellent...
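To make the distillation-token idea concrete, here is a minimal sketch of one way the two outputs could be combined into a training loss. It assumes a hard-label variant: the class-token head is trained against the ground-truth label, and the distillation-token head is trained against the teacher's top prediction, with equal weighting. The description above does not spell out the exact loss, so the 0.5/0.5 weighting and all function and parameter names (`hard_distillation_loss`, `cls_logits`, `dist_logits`, `teacher_logits`) are illustrative assumptions, not the project's API.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target):
    """Cross-entropy of one example against an integer class index."""
    return -log_softmax(logits)[target]


def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    """Assumed hard-label distillation objective (illustrative only).

    cls_logits:     student output read from the class token
    dist_logits:    student output read from the distillation token
    teacher_logits: output of the frozen teacher for the same image
    label:          ground-truth class index
    """
    # The teacher's argmax acts as a second "label" for the distillation head.
    teacher_label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    loss_cls = cross_entropy(cls_logits, label)
    loss_dist = cross_entropy(dist_logits, teacher_label)
    # Equal weighting of the two terms is an assumption of this sketch.
    return 0.5 * loss_cls + 0.5 * loss_dist


if __name__ == "__main__":
    loss = hard_distillation_loss(
        cls_logits=[2.0, 0.5, -1.0],
        dist_logits=[0.1, 1.5, 0.3],
        teacher_logits=[0.2, 3.0, 0.1],
        label=0,
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch the two heads receive different supervision signals, so the distillation token can learn to agree with the teacher even where the teacher and the ground-truth label differ.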