Elon Musk’s artificial intelligence (AI) company, xAI, has unveiled Grok 1.5 Vision, an upgrade to its existing Grok 1.5 model. This new iteration adds computer vision capabilities, allowing it not only to process text but also to understand and respond to visual information.

Grok 1.5 Vision can handle a wide range of visual input, including photographs, charts, diagrams, documents, and screenshots. This opens the door to new use cases, such as generating code from a flowchart, estimating calorie counts from pictures of food, or writing stories inspired by children’s drawings.

One of the key selling points of Grok 1.5 Vision is its claimed prowess in understanding the physical world. xAI says the model outperforms competitors on its newly designed “RealWorldQA” benchmark, which measures an AI’s ability to grasp real-world spatial relationships. It’s worth noting, however, that the advantage doesn’t extend across the board: Grok 1.5 Vision reportedly scored lower than OpenAI’s GPT-4 with Vision and Google’s Gemini 1.5 Pro on benchmarks such as MMMU (Massive Multi-discipline Multimodal Understanding) and ChartQA.

This launch positions Grok 1.5 Vision as a direct competitor to other large language models with multimodal capabilities, such as GPT-4 and Gemini 1.5 Pro. While xAI touts Grok’s edge in real-world understanding, which model ultimately comes out on top will depend on how each performs across a wider range of tasks and how accessible they become to users.