Codon Usage-based SARS-CoV-2 protein classification

Veljković, Aleksandar; Stojanović, Biljana; Malkov, Saša; Beljanski, Miloš; Pavlović-Lažetić, Gordana; Mitić, Nenad

DC Field	Value	Language
dc.contributor.author	Veljković, Aleksandar	en_US
dc.contributor.author	Stojanović, Biljana	en_US
dc.contributor.author	Malkov, Saša	en_US
dc.contributor.author	Beljanski, Miloš	en_US
dc.contributor.author	Pavlović-Lažetić, Gordana	en_US
dc.contributor.author	Mitić, Nenad	en_US
dc.date.accessioned	2021-11-29T12:53:58Z	-
dc.date.available	2021-11-29T12:53:58Z	-
dc.date.issued	2021	-
dc.identifier.issn	2334-6590	-
dc.identifier.uri	http://researchrepository.mi.sanu.ac.rs/handle/123456789/4713	-
dc.description.abstract	Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) appeared in late 2019 and spread across the world, causing a pandemic in humans. Since viruses differ in their specificity toward host organisms, analysis of viral genome organization contributes to a better understanding of their evolution and adaptation in the host. Polymorphism in genomic composition is reflected in codon and amino acid usage patterns, as well as in translation rate (where rare codons are assumed to be translated more slowly than common codons). The same holds for specific coding sequences and the corresponding types of proteins. The goal of the current research is to build a model for classification of proteins (or parts thereof) based on codon usage patterns. As a dataset we used the NCBI dataset of all SARS-CoV-2 isolates and their coding sequences, preprocessed to eliminate those with missing values, ambiguous letters and full duplicates, ending up with around 66000 isolates and around 770000 coding sequences (ORFs). We performed cluster analysis of all the coding sequences in the dataset based on codon usage (CU) as a means of identifying the number and “profile” of protein classes. The approach is sound, since both the external and the internal measures of clustering quality are high. Clustering with the TwoStep algorithm (in the IBM SPSS Modeler tool) yields 12 clusters containing almost perfectly separated types of proteins. This may be used as an argument that specific types of proteins have specific codon usage patterns, which may then be used in a protein classification model. In addition to the classification model based on protein clustering, we experiment with clustering virus isolates by following the dynamics of CU patterns as a function of time during the pandemic.	en_US
dc.publisher	University of Novi Sad, Faculty of Sciences. Department of Biology and Ecology, Novi Sad	en_US
dc.relation.ispartof	Biologica Serbica	en_US
dc.subject	SARS-CoV-2 | codon usage | CU | protein classification | protein clustering	en_US
dc.title	Codon Usage-based SARS-CoV-2 protein classification	en_US
dc.type	Conference Paper	en_US
dc.relation.conference	Belgrade BioInformatics Conference 2021, Virtual Conference, June 21-25, 2021, Vinča, Serbia	en_US
dc.identifier.url	https://belbi.bg.ac.rs/wp-content/uploads/2021/06/Book_of_Abstracts_2021-1.pdf	-
dc.contributor.affiliation	Computer Science	en_US
dc.contributor.affiliation	Mathematical Institute of the Serbian Academy of Sciences and Arts	-
dc.relation.firstpage	92	-
dc.relation.issue	1 (Special Edition)	-
dc.relation.volume	43	-
dc.description.rank	M34	-
item.fulltext	No Fulltext	-
item.cerifentitytype	Publications	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.openairetype	Conference Paper	-
item.grantfulltext	none	-
crisitem.author.orcid	0000-0003-2618-754X	-
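
The abstract describes clustering coding sequences by their codon usage (CU) but does not spell out how a CU profile is computed. The sketch below shows one common way to do it: count in-frame codons and normalize to relative frequencies over the 64 possible codons, giving a fixed-length vector that a clustering algorithm could take as input. This is an illustration under assumptions, not the authors' pipeline: the function name codon_usage, the relative-frequency definition of CU, and the choice to skip codons containing letters other than A, C, G, T are all hypothetical (the abstract states that sequences with ambiguous letters were removed during preprocessing).

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to a vector
# with the same layout (AAA, AAC, AAG, AAT, ACA, ...).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_usage(cds: str) -> list:
    """Return the relative frequency of each of the 64 codons in a coding sequence.

    Hypothetical helper: CU is taken here as plain relative codon frequency,
    which is an assumption; the record does not state the exact definition used.
    """
    cds = cds.upper()
    # Read in-frame triplets from the start; drop a trailing partial codon.
    usable = len(cds) - len(cds) % 3
    triplets = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Assumption: codons with non-ACGT letters are skipped rather than
    # rejecting the whole sequence.
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


if __name__ == "__main__":
    # Toy sequence with four codons: ATG, GCT, GCT, TAA.
    vec = codon_usage("ATGGCTGCTTAA")
    for codon, freq in zip(CODONS, vec):
        if freq > 0:
            print(codon, freq)
```

Each coding sequence then becomes one 64-dimensional row, and a table of such rows is the kind of input a clustering tool would consume. The TwoStep algorithm named in the abstract belongs to IBM SPSS Modeler and is not reproduced here.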
