Emergence of Text Semantics in CLIP Image Encoders
Published in UniReps: 2nd Edition of the Workshop on Unifying Representations in Neural Models (NeurIPS workshop 2024), 2024
Humans process text visually; our work studies the semantics of text rendered in images. We show that the semantic information captured by image representations can decisively classify the sentiment of sentences, is robust to visual attributes such as font, and does not rely on simple character-frequency associations.
Authors: Sreeram Vennam*, Shashwat Singh*, Anirudh Govil, Ponnurangam Kumaraguru