Easily turn large sets of image URLs into an image dataset
img2dataset can download, resize, and package 100M URLs in 20 hours on one machine.
It also supports saving captions for URL+caption datasets.
Opt-out directives:
Websites can send the HTTP headers X-Robots-Tag: noai, X-Robots-Tag: noindex, X-Robots-Tag: noimageai, and X-Robots-Tag: noimageindex. By default, img2dataset ignores images served with such headers.
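The opt-out check described above can be sketched in a few lines of Python. This is a minimal illustration, not img2dataset's actual implementation: the function name `is_opted_out` and the constant `BLOCKING_DIRECTIVES` are hypothetical, and the sketch assumes response headers arrive as a plain dictionary.

```python
# Directives listed in the text above; the constant name is hypothetical.
BLOCKING_DIRECTIVES = {"noai", "noindex", "noimageai", "noimageindex"}


def is_opted_out(headers):
    """Return True if any X-Robots-Tag header carries an opt-out directive."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() != "x-robots-tag":
            continue
        # One header value may list several comma-separated directives.
        directives = {d.strip().lower() for d in value.split(",")}
        if directives & BLOCKING_DIRECTIVES:
            return True
    return False
```

A downloader built this way would call `is_opted_out(response_headers)` before saving an image and skip the image when it returns `True`.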
Dude (uncomplicated data extraction): a simple framework
Dude is a very simple framework for writing web scrapers using Python decorators. Its design, inspired by Flask, aims to let you build a web scraper in just a few lines of code, with an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run your scraper from the terminal by passing URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
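To illustrate the Flask-inspired decorator pattern in general terms, here is a minimal standard-library sketch. It is not Dude's API: the names `on_tag`, `scrape`, and `_Collector` are hypothetical, and matching is done on bare tag names rather than full selectors.

```python
from html.parser import HTMLParser

# Hypothetical registry: tag name -> list of handler functions.
_HANDLERS = {}


def on_tag(tag):
    """Register the decorated function as a handler for <tag> elements."""
    def decorator(func):
        _HANDLERS.setdefault(tag, []).append(func)
        return func
    return decorator


class _Collector(HTMLParser):
    """Feeds each start tag to its registered handlers and keeps the results."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        for handler in _HANDLERS.get(tag, []):
            item = handler(dict(attrs))
            if item is not None:
                self.results.append(item)


def scrape(html):
    """Run every registered handler over the given HTML string."""
    parser = _Collector()
    parser.feed(html)
    parser.close()
    return parser.results


@on_tag("a")
def link(attrs):
    # Emit a record only for anchors that actually carry an href.
    href = attrs.get("href")
    return {"url": href} if href else None
```

Calling `scrape(html_text)` returns one dictionary per matching anchor. The scraper logic lives entirely in small decorated functions, which is the same idea the description above attributes to Dude.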
A simple library for crawling the web.
This library lets you create macros for crawling websites and performing simple actions, such as logging in.
Nomad is a tiny but efficient search engine and web crawler. It works very well for searching within a set of corporate websites on the internet and/or an intranet's HTML documents or knowledge repositories.