Abstract Details

Using Large Language Models to Identify Candidates for Pediatric Epilepsy Surgery
Child Neurology and Developmental Neurology
P3 - Poster Session 3 (5:30 PM-6:30 PM)
8-001
Only one-third of eligible children receive epilepsy surgery, with low rates in underserved populations. Surgical procedures include definitive procedures targeting seizure freedom (resection, ablation, disconnection) and palliative procedures aiming to reduce seizure frequency and/or severity. Large language models (LLMs) may be able to extract information from clinical notes and recommend surgical treatments.
To evaluate the ability of two large language models (LLMs) to identify pediatric candidates for epilepsy surgery and to recommend palliative or definitive procedures.
We conducted a retrospective observational cohort study. We compiled surgical eligibility criteria via literature review and rapid qualitative analysis with input from 7 pediatric epileptologists. Notes and vignettes of children with refractory epilepsy were manually classified as “surgical” or “not surgical”; “surgical” cases were then subclassified as “definitive” or “palliative”. Two LLMs, PaLM 2 Bison (Google, Mountain View, CA) and GPT-4 (OpenAI, San Francisco, CA), were prompted using zero- and few-shot approaches. Performance was evaluated using sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), with the manual classification as the reference.
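The four performance measures named above follow directly from a 2x2 comparison of model output against the manual classification. The sketch below is a minimal illustration of that calculation, assuming “surgical” is treated as the positive class. The names `ConfusionCounts`, `count_outcomes`, and `diagnostic_metrics` are hypothetical, and the labels in the usage example are invented for illustration; they are not the study's data or code.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts with "surgical" as the positive class (an assumption)."""
    tp: int  # reference surgical, model surgical
    fp: int  # reference not surgical, model surgical
    tn: int  # reference not surgical, model not surgical
    fn: int  # reference surgical, model not surgical


def count_outcomes(reference: Sequence[str], predicted: Sequence[str],
                   positive: str = "surgical") -> ConfusionCounts:
    """Tally model labels against the manual (reference) labels."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive and pred == positive:
            tp += 1
        elif ref != positive and pred == positive:
            fp += 1
        elif ref != positive and pred != positive:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return sensitivity, specificity, PPV, and NPV as fractions in [0, 1]."""
    def ratio(numerator: int, denominator: int) -> float:
        # An empty denominator means the metric is undefined for this sample.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),  # surgical cases found
        "specificity": ratio(c.tn, c.tn + c.fp),  # non-surgical cases excluded
        "ppv": ratio(c.tp, c.tp + c.fp),          # model "surgical" calls that are correct
        "npv": ratio(c.tn, c.tn + c.fn),          # model "not surgical" calls that are correct
    }


if __name__ == "__main__":
    # Invented labels for illustration only.
    reference = ["surgical", "surgical", "not surgical", "surgical", "not surgical"]
    predicted = ["surgical", "not surgical", "not surgical", "surgical", "surgical"]
    counts = count_outcomes(reference, predicted)
    for name, value in diagnostic_metrics(counts).items():
        print(f"{name}: {value:.1%}")
```

With small samples, each metric moves in coarse steps (one misclassified case out of nine changes specificity by about 11 percentage points), so point estimates from a cohort of a few dozen cases are best read alongside the underlying counts.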
Literature review and interviews indicated that seizures refractory to 2 or more anti-seizure medications would make a child eligible for epilepsy surgery. Factors favoring definitive surgery included concordant imaging, neuropsychological testing, and semiology; preservation of eloquent areas; and certain genetic mutations. Across 24 cases, both LLMs identified surgical candidates with >90% sensitivity using all prompting approaches. The few-shot approach had the best overall performance, with specificity of 88.9% for both LLMs; PPV of 93.3% for Bison and 93.8% for GPT-4; and NPV of 88.9% for Bison and 100% for GPT-4. Neither LLM effectively distinguished candidates for definitive procedures from candidates for palliative procedures. Bison often memorized prompts and presented fabricated data (“hallucinations”), which reduced specificity.
Preliminary data suggest that LLMs can identify candidates for epilepsy surgery with >90% sensitivity. LLM performance was highest with a few-shot approach. The LLMs performed less well in distinguishing candidacy for definitive versus palliative procedures, and their output was affected by hallucination and memorization.
Authors/Disclosures
Sarah Chowdhury (Weill Cornell Medical College), PRESENTER: Miss Chowdhury has nothing to disclose.
Nuran Golbasi: No disclosure on file.
Alexander Zhao: Mr. Zhao has nothing to disclose.
Carson Gundlach: No disclosure on file.
Ashwin Mahesh: No disclosure on file.
Natasha Basma: No disclosure on file.
Zachary Grinspan, MD: Dr. Grinspan has received personal compensation for serving as an employee of Weill Cornell Medicine. Dr. Grinspan has received personal compensation in the range of $500-$4,999 for serving as a Consultant for Alpha Insights. Dr. Grinspan has received personal compensation in the range of $500-$4,999 for serving as a Consultant for Biopharma LTD (South Korea). Dr. Grinspan has received personal compensation in the range of $500-$4,999 for serving as a Consultant for Encoded Therapeutics. The institution of Dr. Grinspan has received research support from SLC6A1 Connect. The institution of Dr. Grinspan has received research support from STXBP1 Foundation. The institution of Dr. Grinspan has received research support from Clara Inspired. The institution of Dr. Grinspan has received research support from Horizon Therapeutics. The institution of Dr. Grinspan has received research support from NINDS. Dr. Grinspan has received intellectual property interests from a discovery or technology relating to health care.