High throughput in silico discovery of antimicrobial peptides in amphibian and insect transcriptomes

Published in University of British Columbia, 2020

Recommended citation: Lin, D. (2021). "High throughput in silico discovery of antimicrobial peptides in amphibian and insect transcriptomes." University of British Columbia. https://dx.doi.org/10.14288/1.0402476

Abstract: Antimicrobial peptides (AMPs) are a family of short defence proteins produced naturally by all multicellular organisms, ranging from microorganisms to humans. Since resistance to AMPs is less frequent than resistance to antibiotics, they may serve as a potential alternative. Past research has shown that amphibians have the richest known AMP diversity; in particular, the North American bullfrog has demonstrated potential in aiding the discovery of novel putative AMPs. Antibiotic resistance is becoming more prevalent each day, requiring agricultural practices to reduce the use of antibiotics to protect human health, animal health, and food safety. To help reduce the use of antibiotics, the goal of my thesis is to develop and execute an AMP discovery pipeline to discover AMPs suitable for pharmaceutical development. In this thesis, I have developed rAMPage: Rapid Antimicrobial Peptide Annotation and Gene Estimation. rAMPage is a scalable, high throughput bioinformatics-based discovery platform for mining AMP sequences in publicly available genomic resources. It takes as input amphibian and insect RNA-seq reads from the Sequence Read Archive (SRA). After trimming, reads are assembled with RNA-Bloom into transcripts, which are filtered and translated in silico. The translated protein sequences are then compared, via homology search, to known AMP sequences from the NCBI protein database and the AMP-specific databases APD3 and DADP. The matching sequences are cleaved into their mature (bioactive) form. Next, the machine learning tool AMPlify is employed to classify and prioritize the candidate AMPs based on their AMP probability score. Finally, the candidate AMPs are annotated and characterized. Across 84 datasets, rAMPage detected more than 1,000 putative AMPs, of which 90 sequences have been selected for downstream validation.
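
To make two of the steps above more concrete, the following is a minimal Python sketch of in silico translation of a transcript and of ranking candidates by a probability score. It is an illustration only and not the rAMPage implementation: the function names, the `min_len` and `threshold` parameters, and the example scores are all hypothetical, and the scores stand in for output that a classifier such as AMPlify would produce. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def open_reading_frames(transcript, min_len=10):
    """Yield peptides (first M up to a stop codon) from all six reading frames.

    min_len is a hypothetical length cutoff, not a value taken from the thesis.
    """
    for strand in (transcript, reverse_complement(transcript)):
        for frame in range(3):
            segments = translate(strand[frame:]).split("*")
            # The last segment is not terminated by a stop codon, so skip it.
            for segment in segments[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    yield segment[start:]


def prioritize(scored_peptides, threshold=0.5):
    """Keep (peptide, score) pairs at or above threshold, highest score first.

    The scores are placeholders for classifier output; the threshold is assumed.
    """
    kept = [(pep, score) for pep, score in scored_peptides if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    peptides = list(open_reading_frames("ATGAAATTTTGA", min_len=3))
    print(peptides)
    print(prioritize([("A", 0.2), ("B", 0.9), ("C", 0.6)]))
```

The sketch leaves out read trimming, assembly, homology search, and cleavage into mature peptides, since each of those steps relies on external tools and databases.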