Abstract: Vision-language models (VLMs) show promise for autonomous driving but often lack transparent reasoning capabilities that are critical for safety. We investigate whether explicitly modeling ...
(*) Work done during the internship at Xiaomi EV and AIR. (†) Corresponding authors.
In autonomous driving, Vision Language Models (VLMs) excel at high-level reasoning, whereas semantic occupancy ...
Abstract: The safety and reliability of Automated Driving Systems (ADSs) must be validated prior to large-scale deployment. Among existing validation approaches, scenario-based testing has been ...
1 Khalifa University Center for Autonomous Robotic Systems, Khalifa University, Abu Dhabi, United Arab Emirates 2 College of Information Technology, United Arab Emirates University, Al-Ain, Abu Dhabi, ...
git clone https://github.com/wzh506/CoT4AD.git
cd ./cot
conda create -n cot python=3.8 -y
conda activate cot
pip install torch==2.4.1+cu118 torchvision==0.19.1+cu118 ...
Tesla CEO Elon Musk, who turned an upstart electric vehicle maker into an industry-changing powerhouse, is pulling the plug on the two models that helped get him there, as he struggles with another ...
Agentic Vision combines visual reasoning with code execution to ground answers in visual evidence, delivering a 5% to 10% quality boost across most vision benchmarks, Google said. Google has added an ...
While popular AI models such as ChatGPT are trained on language or photographs, new models created by researchers from the Polymathic AI collaboration are trained using real scientific datasets. The ...
Cory Benfield discusses the evolution of ...