Mining the UniProtKB/Swiss-Prot database for antimicrobial peptides

Published in Protein Science, 2025

Recommended citation: Li, C., Sutherland, D., Salehi, A., Richter, A., Lin, D., Aninta, S.I., Ebrahimikondori, H., Yanai, A., Coombe, L., Warren, R.L., Kotkoff, M., Hoang, L.M.N., Helbing, C.C., & Birol, I. (2025). "Mining the UniProtKB/Swiss-Prot database for antimicrobial peptides." Protein Science 34(4):e70083. https://doi.org/10.1002/pro.70083

Abstract: The ever-growing global health threat of antibiotic resistance is compelling researchers to explore alternatives to conventional antibiotics. Antimicrobial peptides (AMPs) are emerging as a promising solution to fill this need. Naturally occurring AMPs are produced by all forms of life as part of the innate immune system. High-throughput bioinformatics tools have enabled fast and large-scale discovery of AMPs from genomic, transcriptomic, and proteomic resources of selected organisms. Public protein sequence databases, comprising over 200 million records and growing, serve as comprehensive compendia of sequences from a broad range of source organisms. Yet, large-scale in silico probing of those databases for novel AMP discovery using modern deep learning techniques has rarely been reported. In the present study, we propose an AMP mining workflow to predict novel AMPs from the UniProtKB/Swiss-Prot database, using the AMP prediction tool AMPlify as its discovery engine. Using this workflow, we identified 8008 novel putative AMPs from all eukaryotic sequences in the database. With a view to practical use as antimicrobial agents in the poultry industry, we prioritized 40 of those AMPs based on the similarity of their predicted structures to those of known chicken AMPs. In our tests, 13 of the 38 successfully synthesized peptides showed antimicrobial activity against Escherichia coli and/or Staphylococcus aureus. AMPlify and the companion scripts supporting the AMP mining workflow presented herein are publicly available on GitHub.