GitHub - openai/CLIP: CLIP (Contrastive Language-Image …
CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3.
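As a rough illustration of what "predict the most relevant text snippet, given an image" can look like, here is a minimal sketch that ranks candidate captions by cosine similarity to an image embedding. The embeddings are made-up toy vectors rather than outputs of the real CLIP encoders, and the function names and the `scale` value are assumptions chosen for the example, not the repository's API.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_predict(image_embedding, text_embeddings, labels, scale=100.0):
    """Return the best-matching label and the probability of each label.

    `scale` is an assumed temperature-like factor that sharpens the softmax.
    """
    img = normalize(image_embedding)
    sims = [sum(a * b for a, b in zip(img, normalize(t))) for t in text_embeddings]
    probs = softmax([scale * s for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Toy data: hypothetical 3-dimensional embeddings standing in for encoder outputs.
labels = ["a photo of a dog", "a photo of a cat", "a diagram"]
text_embeddings = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_embedding = [0.9, 0.2, 0.1]

best_label, probs = zero_shot_predict(image_embedding, text_embeddings, labels)
print(best_label)
```

No classifier is trained for these labels; swapping in a different list of captions changes the set of classes, which is the sense in which the prediction is zero-shot.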
CLIP: Connecting text and images - OpenAI
January 5, 2021 · CLIP (Contrastive Language–Image Pre-training) builds on a large body of work on zero-shot transfer, natural language supervision, and multimodal learning. The idea of zero-data learning dates back over a decade but until recently was mostly studied in computer vision as a way of generalizing to unseen object categories. A critical insight was to leverage natural language as a ...