Utilizing Language Models for the Attribution of Chinese Buddhist Translations: A Case Study of An Shigao’s Works
Lu Lu (1, 2), Zheng Yi (2), Fang Yixin (1)
1. Center for Studies of History of Chinese Language, Zhejiang University, Hangzhou 310058, China
2. Research Institute for Ancient Books, Zhejiang University, Hangzhou 310058, China
Abstract: Attributing ancient texts to their authors or translators often requires inferring the unknown from the characteristics of texts whose attribution is already established. In recent decades, questions about the translators and dating of early Chinese Buddhist translations have attracted wide attention in the academic community. This paper uses deep learning language models, specifically BERT and RBT6, to extract features from translated texts and to examine comprehensively the ascription problems surrounding the translations attributed to An Shigao. The study also tests how effectively language models can identify the translators of Chinese Buddhist texts.
The study uses 14 translations widely accepted as An Shigao’s as positive samples and randomly selected translations by other translators as negative samples. The experimental results show that the RBT6 model outperforms both BERT and a traditional support vector machine (SVM) model in precision, recall, and other metrics, demonstrating superior classification performance. As validation, the trained model is applied to 35 translations that are attributed to An Shigao but widely regarded by scholars as unreliable. The model’s evaluations agree fully with the conclusions established through textual criticism, confirming its effectiveness in distinguishing authentic translations. In addition, to examine whether factors such as textual variants, punctuation and segmentation, and text length affect the detection results, the study applies masking, random punctuation insertion, and random segment extraction to the same set of texts. The results of the two experiments are consistent, confirming that these factors have no significant effect on the model’s detection outcomes.
This study then applies the three trained models to the disputed or newly discovered translations attributed to An Shigao.
The models identify the following texts as translations by An Shigao: T101 Za ahan jing 杂阿含经 (excluding sutras 9 and 10), T1557 Apitan wufaxing jing 阿毗昙五法行经, T735 Siyuan jing 四愿经 (only the portion at 17/537b17-c27), the Kongō-ji manuscript of the Anban shouyi jing 安般守意经, the Foshuo shi’ermen jing 佛说十二门经, and the Foshuo jie shi’ermen jing 佛说解十二门经. By contrast, the models classify the following texts as not translated by An Shigao: T105 Wuyin piyu jing 五阴譬喻经, T109 Zhuan falun jing 转法轮经, the Wushi jiaoji jing 五十校计经 (volumes 59 and 60 of T397 Da fangdeng daji jing 大方等大集经), T605 Chanxing faxiang jing 禅行法想经, T792 Fa shouchen jing 法受尘经, the Dunhuang version of the Sanshiqi pin jing 三十七品经, and sutras 9 and 10 of T101 Za ahan jing. These verdicts largely agree with recent conclusions, drawn from a linguistic perspective, about the suspect translations attributed to An Shigao.
This study also offers a practical comparison between traditional identification methods and language-model-based detection, and reflects on potential problems with both. Traditional methods may involve selective use of data, excessive reliance on documentary evidence, and an overemphasis on the uniqueness of individual linguistic features at the expense of overall usage tendencies. Conversely, when language models are used for identification, the influence of a text’s content and format on the detection results must also be taken into account.
By applying deep learning language models to the identification of translated Buddhist texts, this study substantially improves the efficiency and rigor of determining translator attributions. In the era of big data, language-model-based detection not only supports author identification and dating of ancient texts but also greatly speeds up the processing of questionable documents, particularly for large corpora with complex transmission histories.
These methods offer scientific, rapid, and quantifiable analytical tools for related research. The approach opens promising prospects for philology and linguistics, and offers useful insights for scholars exploring further applications of deep learning technology in other fields.
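The training and evaluation setup described above (14 accepted An Shigao translations as positives, randomly drawn non-An Shigao translations as negatives, models compared on precision and recall) can be illustrated with a minimal sketch. It assumes a balanced design by default (one random negative per positive) and treats "translated by An Shigao" as the positive class. The function names `build_dataset` and `precision_recall_f1` and the `ratio` parameter are hypothetical; how texts are cut into model-sized inputs and the BERT, RBT6, and SVM classifiers themselves are not shown.

```python
import random
from typing import List, Sequence, Tuple

def build_dataset(positives: Sequence[str], negative_pool: Sequence[str],
                  rng: random.Random, ratio: float = 1.0) -> List[Tuple[str, int]]:
    """Label accepted An Shigao texts 1 and randomly drawn other texts 0.

    `ratio` (an assumption) sets how many negatives are drawn per positive.
    """
    k = min(len(negative_pool), round(len(positives) * ratio))
    negatives = rng.sample(list(negative_pool), k)
    data = [(t, 1) for t in positives] + [(t, 0) for t in negatives]
    rng.shuffle(data)  # avoid ordering effects when splitting train/test
    return data

def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1 with label 1 ("An Shigao") as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Undefined ratios (zero denominators) are reported as 0.0 here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For the validation on the 35 doubtful attributions, every gold label is 0, so recall is undefined on that set; the meaningful figure there is the share of texts predicted 0, and full agreement with textual criticism corresponds to a share of 1.0.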
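The abstract pairs each potential confound with a perturbation: masking for textual variants, random punctuation insertion for punctuation and segmentation, and random segment extraction for text length. A minimal sketch of the three perturbations follows. The mask string is BERT’s `[MASK]` token, chosen on the assumption that the tokenizer recognizes the literal string; the punctuation set, the decision to strip existing punctuation before re-inserting marks, and the function names are all illustrative assumptions rather than the paper’s settings.

```python
import random

MASK = "[MASK]"            # BERT's mask token string
PUNCTUATION = "，。；：、"  # assumed set of marks to strip and re-insert

def mask_chars(text: str, ratio: float, rng: random.Random) -> str:
    """Replace a random fraction of characters with MASK, imitating textual variants."""
    chars = list(text)
    for i in rng.sample(range(len(chars)), int(len(chars) * ratio)):
        chars[i] = MASK
    return "".join(chars)

def insert_punctuation(text: str, n: int, rng: random.Random) -> str:
    """Drop existing punctuation, then insert n marks at random positions."""
    chars = [c for c in text if c not in PUNCTUATION]
    for _ in range(n):
        chars.insert(rng.randint(0, len(chars)), rng.choice(PUNCTUATION))
    return "".join(chars)

def extract_segment(text: str, length: int, rng: random.Random) -> str:
    """Return a random contiguous window of `length` characters (whole text if shorter)."""
    if len(text) <= length:
        return text
    start = rng.randint(0, len(text) - length)
    return text[start:start + length]
```

Each perturbed version would then be scored by the same trained classifier and its verdict compared with the verdict on the unaltered text. Seeding `random.Random` keeps the perturbations reproducible across runs.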
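The abstract reports text-level verdicts from the three trained models, including a split verdict on T101 (the collection accepted except sutras 9 and 10), but does not say here how segment scores or model outputs are combined. The sketch below shows one plausible scheme, purely as an assumption. Each model scores fixed-size windows of a sutra, the window probabilities are averaged against a 0.5 threshold, and the models’ verdicts are combined by majority vote. The classifiers are stand-in functions. `split_into_windows`, `model_verdict`, `ensemble_verdict`, and `rejected_sutras` are hypothetical names, and the window size and threshold are illustrative.

```python
from statistics import mean
from typing import Callable, Dict, List

# Any function mapping a segment to P(translated by An Shigao); stands in for a trained model.
Classifier = Callable[[str], float]

def split_into_windows(text: str, size: int = 510) -> List[str]:
    """Cut a non-empty text into consecutive windows of at most `size` characters.

    510 leaves room for [CLS] and [SEP] under BERT's 512-token limit,
    assuming one token per Chinese character.
    """
    return [text[i:i + size] for i in range(0, len(text), size)]

def model_verdict(clf: Classifier, text: str, threshold: float = 0.5) -> bool:
    """True if the mean window probability reaches the (assumed) threshold."""
    return mean(clf(w) for w in split_into_windows(text)) >= threshold

def ensemble_verdict(models: Dict[str, Classifier], text: str) -> Dict[str, object]:
    """Majority vote over several models; also reports whether they all agree."""
    votes = {name: model_verdict(clf, text) for name, clf in models.items()}
    yes = sum(votes.values())
    return {"votes": votes,
            "an_shigao": yes > len(votes) / 2,
            "unanimous": yes in (0, len(votes))}

def rejected_sutras(models: Dict[str, Classifier], sutras: List[str]) -> List[int]:
    """1-based positions of sutras in a collection not attributed to An Shigao."""
    return [i for i, s in enumerate(sutras, start=1)
            if not ensemble_verdict(models, s)["an_shigao"]]
```

Scoring each sutra of a collection separately, as `rejected_sutras` does, is one way a partial verdict such as "T101 excluding sutras 9 and 10" could arise. Reporting `unanimous` next to the majority keeps disagreements between models visible instead of hiding them.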
Lu Lu (卢鹭), Zheng Yi (郑伊), Fang Yixin (方一新). Utilizing Language Models for the Attribution of Chinese Buddhist Translations: A Case Study of An Shigao’s Works [Chinese title: 语言模型:汉译佛经考辨的新方法, "Language Models: A New Method for the Examination and Authentication of Chinese Buddhist Translations"]. Journal of Zhejiang University (Humanities and Social Sciences), 2025, 55(2): 82-101.