- [7] VL-BERT: Pre-training of Generic Visual-Linguistic Representations [J]. Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai. CoRR, 2019.
- [8] ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks [J]. Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee. CoRR, 2019.
- [10] Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations [J]. Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Li Fei-Fei. International Journal of Computer Vision, 2017(1).