Job Description
Position Overview
We are seeking a research intern to explore fundamental challenges in geometry, design understanding, and relative spatial reasoning for vision-language models (VLMs). Modern VLMs perform well on captioning, semantic understanding, and segmentation, but they continue to struggle with geometric reasoning, layout understanding, and precise relative positioning. These capabilities are critical for design, engineering, and creation workflows.
During this internship, you will work closely with research mentors to investigate new modeling and training paradigms that move beyond one-shot visual reasoning. The project will focus on approaches such as reinforcement learning, test-time computation, and “thinking with images,” where models iteratively attend to visual evidence, reason over intermediate representations, and verify hypotheses through visual feedback. The goal is to advance state-of-the-art methods for sp...