The Objectron dataset is a collection of short, object-centric video clips accompanied by AR session metadata: camera poses, sparse point clouds, and a characterization of the planar surfaces in the surrounding environment. In each video, the camera moves around the object, capturing it from different angles. The data also contain manually annotated 3D bounding boxes for each object, which describe the object’s position, orientation, and dimensions. The dataset consists of 15K annotated video clips supplemented with over 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes. To ensure geo-diversity, the dataset was collected from 10 countries across five continents. Along with the dataset, we are also sharing a 3D object detection solution for four categories of objects: shoes, chairs, mugs, and cameras.
Features
- 15K annotated video clips and over 4M annotated images
- All samples include high-res images, object pose, camera pose, point-cloud, and surface planes
- Ready-to-use examples in various tf.record formats, which can be used in TensorFlow and PyTorch
- Object-centric multi-views, observing the same object from different angles
- Accurate evaluation metrics, such as 3D IoU for oriented 3D bounding boxes
- The data is stored in the objectron bucket on Google Cloud Storage
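
To give a feel for what 3D IoU over oriented boxes measures, here is a minimal sketch in plain Python. It is not the evaluation code shipped with the dataset: it approximates the intersection volume by sampling a regular grid of points inside one box and counting how many fall inside the other, whereas an exact metric would compute the intersection polytope. The box representation `(center, rotation, size)` and all function names are hypothetical, chosen only for this illustration.

```python
import math

# Hypothetical box format for this sketch: (center, rotation, size), where
# center is (x, y, z), rotation is a 3x3 matrix mapping box-local axes to
# world axes, and size is the full extent along each local axis.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def local_to_world(box, p):
    """Map a point from box-local coordinates to world coordinates."""
    center, rot, _ = box
    return tuple(center[i] + sum(rot[i][j] * p[j] for j in range(3))
                 for i in range(3))


def world_to_local(box, p):
    """Map a world point into box-local coordinates (rotation inverse = transpose)."""
    center, rot, _ = box
    d = [p[i] - center[i] for i in range(3)]
    return tuple(sum(rot[i][j] * d[i] for i in range(3)) for j in range(3))


def contains(box, p, eps=1e-9):
    """True if world point p lies inside the oriented box."""
    size = box[2]
    q = world_to_local(box, p)
    return all(abs(q[k]) <= size[k] / 2.0 + eps for k in range(3))


def volume(box):
    """Volume of the box from its three extents."""
    w, h, d = box[2]
    return w * h * d


def approx_iou_3d(box_a, box_b, n=20):
    """Approximate 3D IoU by testing n^3 grid-cell centers of box_a against box_b."""
    size = box_a[2]
    inside = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Center of grid cell (i, j, k) in box_a's local frame.
                p_local = tuple(((idx + 0.5) / n - 0.5) * size[axis]
                                for axis, idx in enumerate((i, j, k)))
                if contains(box_b, local_to_world(box_a, p_local)):
                    inside += 1
    intersection = volume(box_a) * inside / n ** 3
    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union


if __name__ == "__main__":
    a = ((0.0, 0.0, 0.0), IDENTITY, (2.0, 2.0, 2.0))
    b = ((1.0, 0.0, 0.0), IDENTITY, (2.0, 2.0, 2.0))
    print(round(approx_iou_3d(a, b), 3))  # 0.333
```

The grid estimate trades accuracy for simplicity: its error shrinks as `n` grows, at a cost of `n^3` point tests, so it is suitable for sanity checks rather than for benchmark reporting.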