Notice
This is not the latest version of this item. The latest version can be found at: https://repository.kaust.edu.sa/handle/10754/626720
Type
Preprint
Authors
Newell, Alejandro; Deng, Jia
KAUST Grant Number
OSR-2015-CRG4-2639
Date
2017-06-22
Permanent link to this record
http://hdl.handle.net/10754/626720
Abstract
Graphs are a useful abstraction of image content. Not only can graphs represent details about individual objects in a scene but they can capture the interactions between pairs of objects. We present a method for training a convolutional neural network such that it takes in an input image and produces a full graph. This is done end-to-end in a single stage with the use of associative embeddings. The network learns to simultaneously identify all of the elements that make up a graph and piece them together. We benchmark on the Visual Genome dataset, and report a Recall@50 of 9.7% compared to the prior state-of-the-art at 3.4%, a nearly threefold improvement on the challenging task of scene graph generation.
Publisher
arXiv
arXiv
1706.07365
Additional Links
http://arxiv.org/abs/1706.07365v1
http://arxiv.org/pdf/1706.07365v1
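The abstract describes grouping graph elements with associative embeddings: the network tags each detected vertex and each edge endpoint with an embedding, and elements are pieced together by matching tags. A minimal sketch of the general pull/push grouping loss used in associative-embedding work is shown below. This is an illustrative reconstruction, not the paper's exact formulation; the scalar tags, `margin` hinge, and function name are assumptions for the example.

```python
import numpy as np

def associative_embedding_loss(vertex_embeds, edge_embeds, edge_index, margin=1.0):
    """Illustrative associative-embedding grouping loss (assumed form).

    vertex_embeds: (V,)   scalar tag predicted for each detected vertex.
    edge_embeds:   (E, 2) tags predicted at each edge for its two endpoints.
    edge_index:    (E, 2) indices of the ground-truth endpoint vertices.

    Pull: an edge's endpoint tags should match the tags of its vertices.
    Push: tags of distinct vertices should be separated by at least `margin`.
    """
    # Pull term: squared distance between edge tags and matched vertex tags.
    matched = vertex_embeds[edge_index]               # (E, 2)
    pull = np.mean((edge_embeds - matched) ** 2)

    # Push term: hinge penalty when two distinct vertex tags are too close.
    diff = vertex_embeds[:, None] - vertex_embeds[None, :]
    penalty = np.maximum(0.0, margin - np.abs(diff))
    V = len(vertex_embeds)
    # Subtract the diagonal (each tag vs. itself contributes `margin`),
    # then average over the V*(V-1) distinct ordered pairs.
    push = (penalty.sum() - V * margin) / max(V * (V - 1), 1)

    return pull + push
```

When edge tags coincide with their endpoint vertex tags and all vertex tags are at least `margin` apart, both terms vanish, which is the grouping the network is trained toward.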