Untitled Presentation - Hacettepe Üniversitesi, aykut/classes/spring...
Slide Credits: Agrawal
Kolmogorov-Smirnov Test
Captions vs. (Q+A): p < 0.001
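The slide reports only the outcome of the test. To show what a two-sample Kolmogorov-Smirnov test computes, here is a generic standard-library sketch: the statistic D is the largest vertical gap between the two empirical CDFs, and the asymptotic Kolmogorov distribution turns D into an approximate p-value. The function names are hypothetical, and the samples actually compared for captions and Q+A are not given on the slide, so none are reproduced here.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every sample equal to x (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted the gap can only shrink, so d is final.
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # Q(lam) is indistinguishable from 1 here, and the series converges slowly.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))
```

For real analyses, `scipy.stats.ks_2samp` computes the same statistic and can use exact p-values for small samples; the hand-rolled version only makes the arithmetic visible.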
LSTM: one hidden layer; output size 1024; each word embedded as a 300-dim vector
MLP: 2 hidden fully-connected layers, 1000 units each, dropout (0.5), tanh
Trained end-to-end with a cross-entropy loss
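A minimal pure-Python sketch of a one-hidden-layer LSTM question encoder, with sizes shrunk so it runs instantly. The slide gives only the 1024-dim output size; this sketch assumes, as one plausible reading, that the output is the final cell state concatenated with the final hidden state, so `hidden_size=512` would yield 1024 values. The random initialization and the names `LSTMEncoder` and `encode` are illustrative, not the original implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Matrix (list of rows) times vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMEncoder:
    """One-hidden-layer LSTM that reads a question word vector by word vector."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                   for _ in range(hidden_size)]
            for gate in self.GATES
        }
        self.b = {gate: [0.0] * hidden_size for gate in self.GATES}

    def step(self, x, h, c):
        z = list(x) + h  # [x_t, h_{t-1}]
        pre = {gate: [u + v for u, v in zip(matvec(self.W[gate], z), self.b[gate])]
               for gate in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h, c

    def encode(self, word_vectors):
        """Return final cell state + final hidden state, concatenated (assumed readout)."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in word_vectors:
            h, c = self.step(x, h, c)
        return c + h
```

With 300-dim word vectors this would be `LSTMEncoder(input_size=300, hidden_size=512)`. Under the same readout assumption, a two-layer LSTM contributes two (cell, hidden) pairs, 4 x 512 = 2048 values, which lines up with the 2048-dim vector the deeper variant projects to 1024 with fc + tanh.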
Deeper LSTM: two hidden layers; output: 2048 -> fc + tanh -> 1024
Input vocabulary: all question words
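"All question words" means every word that occurs in the training questions gets its own index. A small sketch, assuming lowercase alphanumeric tokenization and a reserved `<unk>` index for words never seen during training (both are assumptions, and the helper names are hypothetical):

```python
import re


def tokenize(question):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", question.lower())


def build_vocab(questions, unk="<unk>"):
    """Give every word in the training questions an index; 0 is reserved for unknown words."""
    words = {w for q in questions for w in tokenize(q)}
    vocab = {unk: 0}
    for word in sorted(words):  # sorted so the index order is deterministic
        vocab[word] = len(vocab)
    return vocab


def question_to_ids(question, vocab, unk="<unk>"):
    """Map a question to word indices; unseen words fall back to <unk>."""
    return [vocab.get(w, vocab[unk]) for w in tokenize(question)]
```

For example, building the vocabulary from "How many horses are in this image?" and "What color is the horse?" gives 12 words plus `<unk>`, and "How many zebras?" maps its last word to index 0.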
2-Channel VQA Model
Image -> [Convolution Layer + Non-Linearity -> Pooling Layer] -> [Convolution Layer + Non-Linearity -> Pooling Layer] -> Fully-Connected MLP -> 4096-dim -> Embedding
Question ("How many horses are in this image?") -> Embedding (1024-dim)
Both embeddings -> Neural Network -> Softmax over top K answers
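A forward-pass sketch of the two-channel model in pure Python, with toy sizes. Only the overall data flow comes from the slide (image features -> embedding, question embedding, neural network, softmax over top K answers). Embedding the image with fc + tanh, fusing the two channels by element-wise product, and using two tanh hidden layers from the MLP description are assumptions; the class and function names are hypothetical.

```python
import math
import random


def dense(W, b, v, act=None):
    """Fully-connected layer: act(W v + b)."""
    out = [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]
    return [act(o) for o in out] if act else out


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(probs, target):
    """Loss for the index of the correct answer among the top K."""
    return -math.log(probs[target])


class TwoChannelVQA:
    def __init__(self, img_dim, emb_dim, hidden, num_answers, seed=0):
        rng = random.Random(seed)

        def layer(n_out, n_in):
            W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
            return W, [0.0] * n_out

        self.img_embed = layer(emb_dim, img_dim)  # e.g. 4096 -> 1024 (assumed)
        self.fc1 = layer(hidden, emb_dim)         # MLP hidden layer 1
        self.fc2 = layer(hidden, hidden)          # MLP hidden layer 2
        self.out = layer(num_answers, hidden)     # one unit per top-K answer

    def forward(self, image_features, question_embedding):
        img = dense(*self.img_embed, image_features, math.tanh)
        # Element-wise product fusion: an assumption, the slide does not specify it.
        fused = [a * q for a, q in zip(img, question_embedding)]
        h1 = dense(*self.fc1, fused, math.tanh)
        # Dropout (0.5) on h1 and h2 would apply only at training time; omitted here.
        h2 = dense(*self.fc2, h1, math.tanh)
        return softmax(dense(*self.out, h2))
```

End-to-end training would minimize `cross_entropy` over (image, question, answer) triples with backpropagation, which this forward-only sketch leaves out.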
Ablation #1: Language-alone
Question ("How many horses are in this image?") -> Question Embedding (1024-dim) -> Neural Network -> Softmax over top K answers (1k output units)
Image channel (Convolution Layer + Non-Linearity, Pooling Layer, Fully-Connected MLP) not used
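The 1k output units are the softmax over the top K answers with K = 1000. A sketch of building that answer set, assuming it is simply the K most frequent answer strings in the training data; dropping training pairs whose answer falls outside the set is also an assumption, and all names are hypothetical:

```python
from collections import Counter


def top_k_answers(training_answers, k):
    """Return the k most frequent answer strings, most frequent first."""
    return [answer for answer, _ in Counter(training_answers).most_common(k)]


def answer_index(answers):
    """Map each kept answer to its softmax output unit."""
    return {answer: i for i, answer in enumerate(answers)}


def keep_answerable(pairs, answer_to_index):
    """Keep (question, answer) pairs whose answer is in the top-K set (assumed policy)."""
    return [(q, answer_to_index[a]) for q, a in pairs if a in answer_to_index]
```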
Ablation #2: Vision-alone
Image -> [Convolution Layer + Non-Linearity -> Pooling Layer] -> [Convolution Layer + Non-Linearity -> Pooling Layer] -> Fully-Connected MLP -> 4096-dim -> Embedding -> Neural Network -> Softmax over top K answers
Question channel ("How many horses are in this image?" -> Question Embedding) not used
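The two ablations keep the answer classifier and change only its input. A hypothetical helper that makes the three variants explicit; combining the channels by element-wise product in the full model is an assumption, since the slides do not say how the channels are fused:

```python
def classifier_input(variant, image_repr=None, question_emb=None):
    """Build the vector fed to the answer classifier for each model variant."""
    if variant == "full":
        # Full 2-channel model: fuse both channels (element-wise product assumed).
        if len(image_repr) != len(question_emb):
            raise ValueError("fused channels must have the same size")
        return [a * q for a, q in zip(image_repr, question_emb)]
    if variant == "language":  # Ablation #1: image channel dropped
        return list(question_emb)
    if variant == "vision":    # Ablation #2: question channel dropped
        return list(image_repr)
    raise ValueError(f"unknown variant: {variant!r}")
```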
Current Leaderboard
Questions & Discussion & Demo