ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes

Cited: 2587
Authors
Dai, Angela [1]
Chang, Angel X. [2]
Savva, Manolis [2]
Halber, Maciej [2]
Funkhouser, Thomas [2]
Niessner, Matthias [1, 3]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Princeton Univ, Princeton, NJ 08544 USA
[3] Tech Univ Munich, Munich, Germany
Source
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017) | 2017
Funding
U.S. National Science Foundation;
Keywords
OBJECT DETECTION; DATABASE;
DOI
10.1109/CVPR.2017.261
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available - current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowd-sourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval.
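As a concrete illustration of the semantic voxel labeling task mentioned in the abstract, the following is a minimal Python sketch that quantizes labeled surface points from a reconstructed scene into a semantic voxel grid by per-voxel majority vote. The function names, file-free synthetic input, and label conventions are illustrative assumptions for this sketch, not the dataset's documented format or the authors' method.

```python
# Hypothetical sketch of semantic voxel labeling over a ScanNet-style
# scene: quantize labeled 3D surface points into a voxel grid and assign
# each voxel the majority semantic label of the points inside it.
import numpy as np

def voxelize(points, labels, voxel_size=0.05):
    """points: (N, 3) float array of surface samples in meters.
    labels: (N,) non-negative int array of per-point class ids.
    Returns a dict mapping integer voxel coordinates to the majority label."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    grid = {}
    for c, l in zip(map(tuple, coords), labels):
        grid.setdefault(c, []).append(l)
    # Majority vote per voxel; ties resolve to the lowest class id.
    return {c: int(np.bincount(ls).argmax()) for c, ls in grid.items()}

if __name__ == "__main__":
    # Synthetic stand-in for a reconstructed scene's labeled surface points.
    rng = np.random.default_rng(0)
    pts = rng.uniform(0.0, 2.0, size=(1000, 3))
    lbl = rng.integers(0, 5, size=1000)
    vox = voxelize(pts, lbl)
    print(f"{len(vox)} occupied voxels at 5 cm resolution")
```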
Pages: 2432-2443
Page count: 12