Background

OpenAI's CLIP is a neural network trained on a wide variety of (image, text) pairs, using the natural language supervision that is abundantly available on the internet. It is capable of mapping text and image embeddings into the same semantic space, making them directly comparable.
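Because text and image embeddings live in one space, "comparable" concretely means computing a similarity score between vectors, typically cosine similarity. The sketch below uses small hand-made vectors as stand-ins for CLIP outputs (an assumption for illustration; real embeddings would come from CLIP's text and image encoders and have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for CLIP outputs: in practice these would be produced by
# CLIP's text and image encoders and share the same dimensionality.
text_emb = np.array([0.9, 0.1, 0.3])       # e.g. embedding of "a photo of a dog"
image_emb = np.array([0.8, 0.2, 0.25])     # e.g. embedding of a dog photo
unrelated_emb = np.array([-0.7, 0.6, -0.1])  # e.g. embedding of an unrelated image

print(cosine_similarity(text_emb, image_emb))      # high: matching concept
print(cosine_similarity(text_emb, unrelated_emb))  # low: mismatched concept
```

A caption and its matching image score high, while an unrelated image scores low; this single score is the primitive that search, clustering, and classification over CLIP embeddings are built on.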
Many real-world problems can be tackled effectively with search, clustering, recommendation, or classification - all domains where embeddings excel. For instance, locating research papers by keyword becomes difficult when many synonymous terms exist; embeddings simplify this, because semantically similar texts map to nearby vectors regardless of the exact words used.
Applying deep learning to Information Retrieval is non-trivial. Many practitioners with an Information Retrieval background in industry lack deep learning expertise, while deep learning researchers tend to focus on classification and segmentation rather than search.
Instance-level image retrieval is an important component of image retrieval tasks. Given an image as query Q, an instance retrieval system aims to find the images D that contain the same object as Q. For instance, given an image of the Great Wall, instance retrieval should find other Great Wall images taken under different conditions.
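With embeddings, instance retrieval reduces to nearest-neighbor search: embed the query and every database image, then rank the database by cosine similarity. A minimal sketch, assuming the embeddings have already been computed by some image encoder (the toy 3-dimensional vectors below are illustrative stand-ins):

```python
import numpy as np

def retrieve(query_emb: np.ndarray, database_embs: np.ndarray, k: int = 3):
    """Return indices and scores of the top-k database embeddings
    ranked by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    db = database_embs / np.linalg.norm(database_embs, axis=1, keepdims=True)
    scores = db @ q                      # cosine similarity via dot product of unit vectors
    order = np.argsort(-scores)[:k]      # indices of the k highest scores
    return order, scores[order]

# Toy database: rows 0 and 1 stand in for two Great Wall photos taken under
# different conditions; rows 2 and 3 stand in for unrelated images.
database = np.array([
    [0.90, 0.10, 0.00],
    [0.85, 0.20, 0.05],
    [0.00, 0.90, 0.40],
    [0.10, 0.00, 0.95],
])
query = np.array([0.95, 0.15, 0.00])  # stand-in for a new Great Wall photo

indices, scores = retrieve(query, database, k=2)
print(indices, scores)  # both Great Wall rows rank above the unrelated ones
```

For large databases an exact scan like this is replaced by an approximate nearest-neighbor index, but the ranking criterion stays the same.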