Zero-PIMA: Zero-shot Pill-Prescription Matching with Graph Convolutional Network and Contrastive Learning

1 Graduate School of Informatics, Nagoya University, Japan; 2 School of Information and Communication Technology, HUST, Vietnam; 3 Guardian Robot Project, Information R&D and Strategy Headquarters, RIKEN, Japan; 4 Mathematical and Data Science Center, Nagoya University, Japan

Figure 1. The proposed Zero-PIMA method leverages the pill features obtained through object localization and employs a Graph Convolutional Network (GCN) to extract pill name features from prescriptions. The matching process is achieved through contrastive learning.

Abstract

Patients' safety is paramount in the healthcare industry, and reducing medication errors is essential to improving it. A promising solution is the development of automated systems that help patients avoid pill intake mistakes. This paper investigates the Pill-Prescription matching task, which seeks to associate each pill in a multi-pill photo with its corresponding name in the prescription. We specifically aim to overcome the limitations of existing pill detection methods when faced with unseen pills, a situation characteristic of zero-shot learning. We propose a novel method named Zero-PIMA (Zero-shot Pill-Prescription Matching), designed to match pill images with prescription names effectively, even for pills not included in the training dataset. Zero-PIMA is an end-to-end model comprising an object localization module that detects pills and extracts their visual features, and a graph convolutional network that captures the spatial relationships among the text boxes in the prescription. We then leverage the contrastive learning paradigm to increase the distance between mismatched pill image-name pairs while minimizing the distance between matched pairs. In addition, to deal with the zero-shot pill detection problem, we leverage pills' metadata retrieved from the DrugBank database to fine-tune a pre-trained text encoder, thereby incorporating visual information about pills (e.g., shape, color) into their names, making them more informative and ultimately enhancing the pill image-name matching accuracy. Extensive experiments on our real-world VAIPEPP dataset of multi-pill photos and prescriptions show that the proposed method outperforms competing methods in terms of mean average precision, for both seen and unseen pills. These results indicate that the proposed method could reduce medication errors and improve patients' safety.

An End-to-end Pill-Prescription Matching Framework

Zero-PIMA architecture consists of three modules: Pill Detector, Prescription Recognizer, and Learning Objectives.
  • Pill Detector is responsible for localizing and extracting visual information from a multi-pill photo.
  • Prescription Recognizer utilizes a Graph Convolutional Network to highlight the text boxes likely to be pill names and a pill-enhanced text embedding to learn representations of the pill names.
  • Textual and visual features are fed into the Pill-Prescription alignment in the Learning Objectives module to produce a text-image retrieval result.
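As a rough illustration of the graph-convolution step in the Prescription Recognizer, a single GCN layer with symmetric normalization (the standard Kipf-Welling form) can be sketched as below. The graph construction shown in the comment is an assumption for illustration; the paper's exact adjacency rule and layer details may differ.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step with symmetric normalization.

    A: (N, N) adjacency over prescription text boxes (e.g., an edge when
       two boxes are spatial neighbors -- an illustrative choice here).
    X: (N, F) per-text-box features.
    W: (F, F') learnable weight matrix.
    """
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))       # D^{-1/2}
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)              # propagate + ReLU
```

Stacking a few such layers lets each text box aggregate context from its spatial neighbors, which is what allows the model to distinguish pill-name boxes from other prescription text.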

Figure 2. Overview of Zero-PIMA. (a) The Zero-PIMA architecture, consisting of three modules: Pill Detector, Prescription Recognizer, and Learning Objectives. (b) A semantic contrastive loss integrates pills' metadata into the pill names' embeddings.
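Contrastive alignment of this kind is commonly implemented as a symmetric InfoNCE-style objective over a batch of matched pairs. A minimal sketch, assuming L2-normalized dot-product similarity and a temperature hyperparameter (the paper's exact loss formulation may differ):

```python
import numpy as np

def contrastive_matching_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over pill-image / pill-name pairs.

    Row i of img_emb and row i of txt_emb are assumed to be a matched
    pair; all other rows act as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N) similarity matrix

    def xent(l):
        # cross-entropy with the diagonal (matched pair) as the target
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_p))

    # average the image->name and name->image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls matched image-name pairs together and pushes mismatched pairs apart, which is the behavior described for the Pill-Prescription alignment above.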

VAIPEPP Dataset

The VAIPEPP dataset was collected in real-world scenarios, with samples taken in unconstrained environments. It consists of 2,156 multi-pill photos matched with 1,527 prescriptions across 4 different templates, collected from anonymous patients at leading hospitals in Vietnam between 2021 and 2022.

Figure 3. Representative examples from our VAIPEPP dataset.

Visualization

Visualization of some predictions for unseen pill detection. Each column presents the prescription, the ground-truth pill images, and the predictions.

Figure 4: Illustration of some accurate predictions.

Figure 5: Illustration of some incorrect predictions.

Acknowledgment

The computation was carried out using the General Projects on the supercomputer "Flow" at the Information Technology Center, Nagoya University. This work was funded by Vingroup Joint Stock Company (Vingroup JSC) and supported by the Vingroup Innovation Foundation (VINIF) under project code VINIF.2021.DA00128. This work was also partly supported by JSPS KAKENHI JP21H0355.

BibTeX

@article{nguyen2024zeropima,
      author={Nguyen, Trung Thanh and Nguyen, Phi Le and Kawanishi, Yasutomo and Komamizu, Takahiro and Ide, Ichiro},
      journal={IEEE Access}, 
      title={Zero-shot Pill-Prescription Matching with Graph Convolutional Network and Contrastive Learning}, 
      year={2024},
      doi={10.1109/ACCESS.2024.3390153},
}