AndroidDevelopersTools/MASK


MASK implementation

MASK leverages word embeddings as bridges to associate words with their corresponding prototypes, thereby enabling semantic knowledge alignment between the image and text modalities.


Datasets and Metrics

We evaluate MASK on two standard benchmark datasets: Flickr30k and MSCOCO. Image-text matching usually comprises two sub-tasks: 1) image annotation: retrieving related texts given an image, and 2) image retrieval: retrieving related images given a text. The commonly used evaluation criteria are R@1, R@5 and R@10, i.e., recall rates at the top-1, top-5 and top-10 results. Following existing works, we also report an additional criterion, Rs, which sums all the recall rates to measure overall performance.
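The recall criteria can be illustrated with a minimal sketch. It is not the repository's evaluation script: the function names `recall_at_k` and `rsum`, the toy similarity matrix, and the one-caption-per-image ground truth are all assumptions made for illustration (the real benchmarks pair each image with several captions, which the set-based ground truth below also accommodates).

```python
import numpy as np

def recall_at_k(sim, gt, ks=(1, 5, 10)):
    """Percentage of queries with at least one relevant candidate in the top-k.

    sim: (n_queries, n_candidates) similarity scores, higher is better.
    gt:  list of sets; gt[q] holds the candidate indices relevant to query q.
    """
    order = np.argsort(-sim, axis=1)  # candidates sorted by descending score
    recalls = {}
    for k in ks:
        hits = [
            len(gt[q].intersection(order[q, :k].tolist())) > 0
            for q in range(sim.shape[0])
        ]
        recalls[k] = 100.0 * float(np.mean(hits))
    return recalls

def rsum(i2t, t2i):
    """Sum of all recall rates over both retrieval directions."""
    return sum(i2t.values()) + sum(t2i.values())

# Hypothetical toy data: 3 images (rows) x 3 texts (columns),
# where text i is the only caption of image i.
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.2, 0.8],
                [0.1, 0.7, 0.4]])
gt = [{0}, {1}, {2}]

i2t = recall_at_k(sim, gt, ks=(1, 2, 3))    # image annotation
t2i = recall_at_k(sim.T, gt, ks=(1, 2, 3))  # image retrieval
total = rsum(i2t, t2i)
```

The toy example uses cutoffs 1, 2 and 3 only because it has three candidates; on the benchmarks the cutoffs would be 1, 5 and 10.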

Implementation Details

To build the multimodal aligned semantic knowledge, we collect all words from the VG dataset and filter out special characters and rare words, resulting in a total of $K$=12,385 semantic concepts. For each image, we employ the pre-trained object detection model Bottom-Up Top-Down (https://github.com/MILVLG/bottom-up-attention.pytorch) to extract raw region representations, setting the number of detected regions to $I$=36 and the dimensionality of each region representation to $M$=2048. For each word, we obtain its word embedding from the pre-trained word vectors glove-twitter-50 (https://nlp.stanford.edu/projects/glove/). The batch size is 4096 for the first 200 epochs and 2048 for the next 200 epochs. The trade-off factors $\lambda_1$ and $\lambda_2$ are both set to 3. We use Adam to optimize the loss with a learning rate of 1e-4.
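For reference, the settings above can be gathered into one configuration object. This is a sketch only: the class `MaskConfig`, its field names, and the helper `batch_size_for_epoch` are hypothetical and do not come from the repository's code, and the 0-based epoch indexing is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MaskConfig:
    # Values taken from the text above; field names are illustrative.
    num_concepts: int = 12385     # K, semantic concepts kept after filtering
    num_regions: int = 36         # I, detected regions per image
    region_dim: int = 2048        # M, dimensionality of each region feature
    word_dim: int = 50            # glove-twitter-50 vectors are 50-dimensional
    lambda_1: float = 3.0         # trade-off factor
    lambda_2: float = 3.0         # trade-off factor
    learning_rate: float = 1e-4   # used with the Adam optimizer
    epochs_per_stage: int = 200   # two stages, 400 epochs in total
    batch_size_stage_1: int = 4096
    batch_size_stage_2: int = 2048

def batch_size_for_epoch(epoch: int, cfg: MaskConfig) -> int:
    """Return the batch size for a 0-based epoch index (indexing is assumed)."""
    if epoch < 0 or epoch >= 2 * cfg.epochs_per_stage:
        raise ValueError(f"epoch {epoch} is outside the 400-epoch schedule")
    if epoch < cfg.epochs_per_stage:
        return cfg.batch_size_stage_1
    return cfg.batch_size_stage_2

cfg = MaskConfig()
schedule = [batch_size_for_epoch(e, cfg) for e in (0, 199, 200, 399)]
```

Keeping the schedule in a small function makes the switch from 4096 to 2048 explicit and easy to check at the stage boundary.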

Result

[Figure: results (image not included)]
