Enable natural language prompting for the Segment Anything Model (SAM) using CLIP embeddings.
Use natural language prompting to interact with Meta's Segment Anything Model! Meta trained and studied text prompting but chose not to release it as part of their demo, so we at Brilliantly figured we would try our hand at it. This extension does not use the text encoding Meta used; instead, it is a layer built on top of SAM, using OpenAI's CLIP and kevinzakka's implementation of Gradient-weighted Class Activation Mapping (Grad-CAM).
This is just for fun, and the quality of the results varies. Some failure modes are explainable (the gradient mapping is imprecise, so the hat in the screenshots is a near-miss), while others are way off.
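To give a feel for why an imprecise gradient map can produce a near-miss, here is a minimal sketch of one way a coarse activation heatmap could be turned into a single point for SAM. It assumes the heatmap is reduced to its hottest cell; the function name `peak_to_point` and the toy heatmap are hypothetical, and the actual backend may do this differently.

```python
from typing import List, Tuple


def peak_to_point(
    heatmap: List[List[float]], image_width: int, image_height: int
) -> Tuple[int, int]:
    """Map the hottest heatmap cell to an (x, y) pixel in the full image.

    Hypothetical helper for illustration; not taken from the backend.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    # Find the (row, col) of the largest activation.
    best_row, best_col = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    # Use the centre of that cell, scaled up to image pixels.
    x = int((best_col + 0.5) * image_width / cols)
    y = int((best_row + 0.5) * image_height / rows)
    return x, y


# Toy 3x3 heatmap standing in for a Grad-CAM output (made-up values).
heatmap = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.3],
    [0.1, 0.3, 0.2],
]
point = peak_to_point(heatmap, image_width=300, image_height=150)
print(point)
```

Because each heatmap cell covers a large block of pixels, the chosen point can land just outside a small object, and a segmentation prompted from that point would then pick up a neighbouring region instead.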
Visit https://github.com/suvansh/say-anything-backend to learn how to set up the backend server locally in 2 minutes to use this extension.