Abstract
The development of selective kinase inhibitors remains a key objective in cancer drug discovery, where predictive computational models can substantially accelerate lead identification. In this study, we investigate fine-tuning strategies for the transformer-based ChemBERTa model in quantitative structure–activity relationship (QSAR) modeling of inhibitors of AXL receptor tyrosine kinase, an important therapeutic target implicated in tumor progression and metastasis. A dataset of AXL inhibitors was curated from the ChEMBL database. Three fine-tuning configurations (baseline, full fine-tune, and aggressive) were implemented to examine how learning rate, weight decay, and the number of frozen transformer layers influence model performance. Models were evaluated using accuracy, precision, recall, F1-score, and calibration metrics, including expected calibration error. Results showed that both the full fine-tune and aggressive configurations outperformed the baseline model, achieving higher precision and F1-scores while maintaining high recall. The aggressive configuration achieved the most balanced performance, with improved calibration and the lowest expected calibration error, indicating more reliable probabilistic predictions. Overall, this study highlights that controlled fine-tuning of ChemBERTa substantially improves predictive performance and confidence estimation in QSAR modeling, offering practical guidance for optimizing transformer-based chemical language models in kinase-targeted drug discovery.
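The abstract names the evaluation metrics without defining them. The sketch below shows one common way to compute them for a binary active/inactive classifier; it is not the study's evaluation code. It assumes labels where 1 means active, a 0.5 decision threshold on the predicted probability of activity, and the widely used equal-width-bin, top-label-confidence form of expected calibration error (ECE) with 10 bins. The binning scheme actually used in the study is not stated here, and the function names are hypothetical.

```python
from collections.abc import Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def expected_calibration_error(
    y_true: Sequence[int], p_active: Sequence[float], n_bins: int = 10
) -> float:
    """Binned ECE: bin-size-weighted mean of |accuracy - mean confidence| per bin.

    Assumption: confidence is the probability of the predicted class
    (top-label confidence) and bins split [0, 1] into equal-width intervals.
    """
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    correct = [0] * n_bins
    for t, p in zip(y_true, p_active):
        pred = 1 if p >= 0.5 else 0  # assumed 0.5 decision threshold
        conf = p if pred == 1 else 1.0 - p
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        counts[b] += 1
        conf_sums[b] += conf
        correct[b] += int(pred == t)
    n = sum(counts)
    ece = 0.0
    for count, conf_sum, hits in zip(counts, conf_sums, correct):
        if count:
            ece += (count / n) * abs(hits / count - conf_sum / count)
    return ece


if __name__ == "__main__":
    # Toy data for illustration only; not results from the study.
    y_true = [1, 0, 1, 1, 0, 0]
    p_active = [0.95, 0.15, 0.35, 0.85, 0.65, 0.05]
    y_pred = [1 if p >= 0.5 else 0 for p in p_active]
    print(classification_metrics(y_true, y_pred))
    print(round(expected_calibration_error(y_true, p_active), 4))  # 0.2833
```

A lower ECE means the predicted confidences track the observed fraction of correct predictions more closely, which is the sense in which a configuration with the lowest ECE gives more reliable probabilistic predictions.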
Copyright (c) 2025 Teuku Rizky Noviandy, Ghazi Mauer Idroes, Mohsina Patwekar, Rinaldi Idroes

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.