Allowing vision-enabled AI systems such as smart glasses to capture and analyse video of medical appointments could save time and improve doctors' notes compared to audio-only AI scribes, an Australian study has found.
Audio-based AI scribes, which record and process medical appointments to create detailed summaries, are becoming more common in healthcare settings, but researchers are beginning to examine potential uses for multimodal AI, which can also analyse images and video.
The Flinders University study, published in npj Digital Medicine in February, found a vision-enabled AI scribe “substantially improved documentation accuracy” for visual tasks, and reduced errors by making sure more important information was captured.
The study saw 10 clinical pharmacists record video of 100 simulated medical history interviews using Meta Ray-Ban smart glasses, before the footage was examined by an AI scribe developed using Google’s Gemini 2.5 Pro model.
Pharmacists standing in as interviewees were asked to state personal information such as names and dates of birth, as well as details of their mock medications, while medication packaging and labels were shown to the AI system.
The vision-enabled scribe achieved 98 per cent accuracy in capturing more than 2,000 points of data, while the audio-only version of the same system was 81 per cent accurate, researchers found.
The AI scribe with multimodal capabilities also significantly outperformed the audio-only instance in capturing information about medication dosing, with 97 per cent accuracy compared to 28 per cent for the audio-only version.
Capturing ‘more of the details that matter’
Study author and academic pharmacist Bradley Menz said vision-enabled AI scribes could be helpful for clinicians because “a lot of clinically important information is visual”.
“Important visual cues during consultations include patients’ medicine containers, prescriptions and devices, as well as their body language,” he said.
“When an AI system can use both what it hears and what it sees in these consultations, it captures more of the details that matter for patient care.”
Fellow author and associate professor Ashley Hopkins said enabling AI scribes to analyse video would mean “less time editing AI-documentation and even more time focusing on patient care”.
“These findings suggest the next step may be that all scribe systems can interpret visual information as well as speech, which could open the door to wider clinical uses,” he said.

The Flinders University study found a vision-enabled AI scribe was more accurate than an audio-only one. Image: Flinders University study
Privacy and accuracy concerns remain
The adoption of vision-enabled AI scribes in healthcare would require “robust safeguards to protect patient privacy and data security, particularly given the sensitive nature of video recordings captured during clinical encounters”, the researchers wrote.
Using such technology to document a person’s visible health concerns, appearance, or behaviour could make them feel uncomfortable and may cause them to withhold sensitive information, they said.
“These concerns underscore the importance of engaging patients and stakeholders in implementation planning, particularly regarding informed consent processes,” they wrote.
The researchers suggested it was also worth exploring alternative approaches such as capturing only still images for AI analysis, instead of the continuous video recording of medical appointments.
Despite achieving 98 per cent accuracy in the study, the vision-enabled AI scribe still made 46 errors, which researchers said showed “the critical importance of clinician oversight in completing the medication history process”.
“This is an augmented tool, not a replacement for clinical judgement,” said Menz.
“The clinician still needs to review and sign off the document.”
“Robust human-in-the-loop processes” were also needed to mitigate emerging risks such as healthcare practitioners becoming complacent and over-relying on AI scribes, the researchers wrote.
Australia’s medical regulator, the Therapeutic Goods Administration (TGA), announced in September 2025 that it was “stepping up its efforts” to regulate AI scribes in healthcare, following calls for greater oversight of the increasingly popular technology.