img2dataset
Easily turn large sets of image URLs into an image dataset.
Can download, resize and package 100M URLs in 20h on one machine.
Also supports saving captions for URL+caption datasets.
Opt-out directives:
Websites can send the HTTP headers X-Robots-Tag: noai, X-Robots-Tag: noindex, X-Robots-Tag: noimageai and X-Robots-Tag: noimageindex. By default, img2dataset ignores images served with such headers.
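
As a rough illustration of what honoring these directives can look like, here is a minimal sketch in Python. It is not img2dataset's own code: the function name `is_opted_out` and the constant `DISALLOWED` are hypothetical, and the sketch assumes the X-Robots-Tag values have already been read from the response. It only handles plain comma-separated directives and does not interpret values scoped to a specific user agent.

```python
# Hypothetical helper, not part of img2dataset's API.
# The directive names are the four opt-out values listed above.
DISALLOWED = frozenset({"noai", "noindex", "noimageai", "noimageindex"})


def is_opted_out(x_robots_tag_values, disallowed=DISALLOWED):
    """Return True if any X-Robots-Tag value carries an opt-out directive.

    x_robots_tag_values: iterable of raw header values, one string per
    X-Robots-Tag header found in the response.
    """
    for value in x_robots_tag_values:
        # A single header may carry several comma-separated directives.
        directives = {part.strip().lower() for part in value.split(",")}
        if directives & disallowed:
            return True
    return False


if __name__ == "__main__":
    print(is_opted_out(["noai"]))
    print(is_opted_out(["nofollow"]))
```

A downloader following this approach would call such a check on each response and skip the image when it returns True.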