DC Field                    Value  Language
dc.contributor.author       Veljković, Aleksandar  en_US
dc.contributor.author       Stojanović, Biljana  en_US
dc.contributor.author       Malkov, Saša  en_US
dc.contributor.author       Beljanski, Miloš  en_US
dc.contributor.author       Pavlović-Lažetić, Gordana  en_US
dc.contributor.author       Mitić, Nenad  en_US
dc.description.abstract     Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) appeared in late 2019 and spread across the world, causing a pandemic in humans. Since viruses differ in their specificity toward host organisms, analysis of the viral genome organization contributes to a better understanding of their evolution and adaptation to the host. Polymorphism in genomic composition is reflected in its codon and amino acid usage patterns, as well as in translation rate (where rare codons are assumed to be translated more slowly than common codons). The same holds for specific coding sequences and the corresponding types of proteins. The goal of the current research is to build a model for classification of proteins (or parts thereof) based on codon usage patterns. As a dataset we used the NCBI dataset of all the SARS-CoV-2 isolates and their coding sequences, preprocessed to eliminate those with missing values, ambiguous letters and full duplicates, ending up with around 66,000 isolates and around 770,000 coding sequences (ORFs). We performed cluster analysis of all the coding sequences from the dataset based on codon usage (CU) as a means of identifying the number and “profile” of protein classes. The approach is sound since the external as well as internal measures of clustering quality are high. Results of clustering using the TwoStep algorithm (in the IBM SPSS Modeler tool) include 12 clusters containing almost perfectly separated types of proteins. This may be used as an argument that specific types of proteins have their specific codon usage patterns, which may then be used for a protein classification model. In addition to the classification model based on protein clustering, we experiment with clustering virus isolates by following the dynamics of CU patterns as a function of time during the pandemic.  en_US
dc.publisher                University of Novi Sad, Faculty of Sciences. Department of Biology and Ecology, Novi Sad  en_US
dc.relation.ispartof        Biologica Serbica  en_US
dc.subject                  SARS-CoV-2 | codon usage | CU | protein classification | protein clustering  en_US
dc.title                    Codon Usage-based SARS-CoV-2 protein classification  en_US
dc.type                     Conference Paper  en_US
dc.relation.conference      Belgrade BioInformatics Conference 2021, Virtual Conference, June 21-25, 2021, Vinča, Serbia  en_US
dc.contributor.affiliation  Computer Science  en_US
dc.contributor.affiliation  Mathematical Institute of the Serbian Academy of Sciences and Arts  -
dc.relation.issue           1 (Special Edition)  -
item.openairetype           Conference Paper  -
item.fulltext               No Fulltext  -