Specialist Language Courses

AI standardized patients: a five-agent approach

Standardized patients — trained actors who simulate clinical encounters — have long been considered the gold standard for teaching doctor-patient communication. But they are expensive, hard to scale, and inconsistent in quality from session to session. The rise of large language models has prompted a wave of interest in AI-powered alternatives, yet single-model virtual patients have struggled with a familiar problem: they tend to lose realism or contradict themselves once a conversation runs beyond a few exchanges.

A study published in the Journal of Medical Internet Research in June 2026 takes a markedly different approach. Researchers from Zhejiang University and the Chinese Academy of Medical Sciences built a multiagent system that divides the work of “being a patient” across five specialized AI components, each loosely modelled on a different function of the brain. One agent manages the patient’s personality and emotional state, another stores and retrieves clinical details from the case file, a third interprets the student’s questions, a fourth generates the patient’s actual responses, and a fifth checks those responses for consistency before they reach the student.
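
The division of labour described above can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the researchers' implementation: every function name, the keyword matching, and the sample case file are hypothetical, and simple rule-based stand-ins take the place of the language models the real system would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientState:
    """Hypothetical shared state: the case file plus the current mood."""
    case_file: dict
    mood: str = "anxious"


def interpreter_agent(question: str) -> Optional[str]:
    """Interpret the student's question as a case-file topic (toy keyword match)."""
    cues = {"where": "location", "how long": "duration", "allerg": "allergies"}
    lowered = question.lower()
    for cue, topic in cues.items():
        if cue in lowered:
            return topic
    return None


def persona_agent(state: PatientState, topic: Optional[str]) -> str:
    """Manage emotional state: confused by unrecognised questions, anxious otherwise."""
    state.mood = "confused" if topic is None else "anxious"
    return state.mood


def memory_agent(state: PatientState, topic: Optional[str]) -> Optional[str]:
    """Retrieve the relevant clinical detail from the case file, if any."""
    return state.case_file.get(topic) if topic else None


def responder_agent(fact: Optional[str], mood: str) -> str:
    """Generate the patient's reply from the retrieved fact and current mood."""
    if fact is None:
        return "Sorry, I don't understand the question."
    opener = "I'm quite worried. " if mood == "anxious" else ""
    return opener + fact


def checker_agent(draft: str, fact: Optional[str]) -> bool:
    """Check the draft before it reaches the student: it must contain the case-file fact."""
    return fact is None or fact in draft


def patient_turn(state: PatientState, question: str, responder=responder_agent) -> str:
    """Run one student question through all five agents in turn."""
    topic = interpreter_agent(question)
    mood = persona_agent(state, topic)
    fact = memory_agent(state, topic)
    draft = responder(fact, mood)
    if not checker_agent(draft, fact):
        draft = fact  # fall back to the case-file wording if the draft drifts
    return draft


if __name__ == "__main__":
    demo = PatientState(case_file={"duration": "It started two days ago."})
    print(patient_turn(demo, "How long has this been going on?"))
```

The point of the sketch is the separation of concerns: because the checker sees both the draft and the case-file fact, a response that drifts from the record can be caught before the student reads it, which a single model answering in one pass has no separate step to do.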

The team tested the system across 650 simulated consultations spanning five specialties, including internal medicine, surgery, paediatrics and dentistry, and compared it against a single-model baseline. Every configuration of the multiagent system maintained clinical accuracy above 94%, and the best-performing version produced misleading information in just 1.28% of responses, compared with 4.72% for the single-model approach. Physicians evaluating the dialogues also rated the multiagent patients as more convincing and emotionally consistent, and gave several configurations near-perfect scores for educational usefulness.
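
To put the two misleading-information rates side by side, the short calculation below works out the gap in absolute and relative terms. It uses only the 1.28% and 4.72% figures quoted above; the function name is hypothetical, and the result describes the best-performing configuration only, not the system as a whole.

```python
def rate_reduction(baseline_pct: float, new_pct: float) -> tuple:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = baseline_pct - new_pct
    relative = absolute / baseline_pct * 100
    return absolute, relative


# Figures quoted in the article: single-model baseline vs. best multiagent version.
absolute_drop, relative_drop = rate_reduction(baseline_pct=4.72, new_pct=1.28)
print(f"{absolute_drop:.2f} percentage points lower")  # 3.44 percentage points lower
print(f"{relative_drop:.1f}% relative reduction")      # 72.9% relative reduction
```

In other words, the best configuration produced misleading responses a little over a quarter as often as the single-model baseline.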

The system was not flawless. It struggled to push back convincingly against leading questions, and performance dipped when simulating paediatric cases, likely because child patients communicate quite differently from adults. The researchers also note that their evaluation relied on Chinese-language clinical data, so further testing would be needed before similar systems could be confidently deployed for English-language clinical communication training.

Even so, the findings add weight to a growing body of evidence that breaking a complex simulation task into specialized, cooperating AI components produces more reliable results than asking a single model to do everything at once, a principle with implications well beyond medical education.
