Research on multi-task learning methods for traditional Chinese medicine visual question answering
Abstract
With the rapid advancement of artificial intelligence, visual question answering (VQA) has gained increasing attention in the field of traditional Chinese medicine (TCM). In a TCM VQA setting, a user provides an image containing medicinal herbs and the model answers questions about the herbs depicted. This technology offers a more intuitive and accessible way to learn about TCM, while also promoting the dissemination and popularization of TCM culture. However, the lack of specialized datasets and the limited adaptability of existing VQA models to TCM-specific tasks pose significant challenges. To address these issues, this study constructs a dedicated TCM VQA dataset and proposes a novel visual question answering method for traditional Chinese medicine based on multi-task learning (TCMML). The proposed model uses Faster R-CNN and Chinese BERT to extract image and text features, respectively, and fuses them with an end-to-end joint attention network built on self-attention and cross-attention mechanisms. It further adopts a multi-task learning strategy in which a shared task layer aligns the two modalities and captures inter-task dependencies, while five task-specific expert modules handle the five sub-tasks of TCM VQA, enabling the model to generate accurate answers for each task. Experimental results show that TCMML achieves higher accuracy on TCM visual question answering tasks than existing state-of-the-art models, validating the effectiveness of the multi-task learning strategy in this domain.
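To make the described architecture concrete, the following is a minimal PyTorch sketch of the fusion-and-expert design the abstract outlines: self-attention within each modality, cross-attention from text to image regions, a shared layer, and one expert head per sub-task. All module names, feature dimensions, and per-task class counts are illustrative assumptions, not the paper's actual implementation; Faster R-CNN and Chinese BERT are treated as external feature extractors whose outputs arrive as tensors.

```python
import torch
import torch.nn as nn

class JointAttentionBlock(nn.Module):
    """Self-attention over each modality, then cross-attention from text to image."""
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.self_attn_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, txt, img):
        txt = txt + self.self_attn_txt(txt, txt, txt)[0]
        img = img + self.self_attn_img(img, img, img)[0]
        # Text tokens act as queries attending over image region features.
        fused = txt + self.cross_attn(txt, img, img)[0]
        return self.norm(fused)

class MultiTaskVQAHead(nn.Module):
    """Shared task layer followed by one expert classifier per sub-task."""
    def __init__(self, dim: int = 768, num_classes=(50, 30, 20, 40, 10)):  # assumed class counts
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
        self.experts = nn.ModuleList(nn.Linear(dim, c) for c in num_classes)

    def forward(self, fused):
        pooled = self.shared(fused.mean(dim=1))          # mean-pool fused tokens
        return [expert(pooled) for expert in self.experts]  # one logit set per task

# Usage: 36 region features (Faster R-CNN style) and 32 text tokens (BERT style).
img_feats = torch.randn(2, 36, 768)
txt_feats = torch.randn(2, 32, 768)
fusion, head = JointAttentionBlock(), MultiTaskVQAHead()
task_logits = head(fusion(txt_feats, img_feats))
print([t.shape for t in task_logits])  # five per-task logit tensors
```

In this sketch the five expert heads share a single pooled representation, so gradients from every sub-task flow through the shared layer, which is one common way a shared task layer can capture inter-task dependencies.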