DC Field | Value | Language
dc.contributor.author | Stojanović, Biljana T. | en_US
dc.contributor.author | Malkov, Saša N. | en_US
dc.contributor.author | Biljanski, Miloš V. | en_US
dc.contributor.author | Pavlović Lažetić, Gordana M. | en_US
dc.contributor.author | Maljković Ružičić, Mirjana M. | en_US
dc.contributor.author | Čukić, Ivan Lj. | en_US
dc.contributor.author | Mitić, Nenad S. | en_US
dc.date.accessioned | 2024-11-20T10:49:47Z | -
dc.date.available | 2024-11-20T10:49:47Z | -
dc.date.issued | 2024 | -
dc.identifier.isbn | 978-86-82679-16-5 | -
dc.identifier.uri | http://researchrepository.mi.sanu.ac.rs/handle/123456789/5389 | -
dc.description.abstract | This paper presents an approach for clustering particular SARS-CoV-2 protein types based on Codon Usage (CU) bias measures. Our previous research has shown that clustering based on CU bias measures is very close to the natural clustering by protein type, regardless of virus affiliation. Relative Synonymous Codon Usage (RSCU), Effective Number of Codons (ENC), Effective Number of Codons for individual amino acids (ENCAA), and Relative Codon Bias Strength (RCBS) were calculated to measure the CU bias in the coding sequences of different proteins. The dataset contains 928,850 complete SARS-CoV-2 virus isolates with non-ambiguous nucleotide sequences. It contains 1,145,168 unique protein nucleotide sequences (out of a total of 15,564,504) and the corresponding amino acid sequences. Protein coding sequences are associated with metadata, including the collection date and the WHO virus strain annotation. Protein coding sequences of the same type were clustered, for each of the 12 most abundant types. Different clustering algorithms (BIRCH, Kohonen Neural Network, fuzzy and probabilistic clustering) were applied to cluster proteins based on RSCU, ENC and RCBS with a variable number of clusters. WHO group annotations were used for additional cluster description. Most clusters in all results are homogeneous (with a maximum size of about 19-35% of the input material) and are almost purely associated with a specific WHO group. Each result contains one or two small-cardinality heterogeneous clusters with mixed WHO groups. These heterogeneous clusters likely denote proteins (isolates) that were present at the transition between two WHO groups. By combining results from different clustering algorithms, the WHO group membership of SARS-CoV-2 proteins can be described with very high accuracy using protein clustering based on CU bias measures. | en_US
dc.subject | SARS-CoV-2 WHO groups; codon usage; clustering; data mining | en_US
dc.title | A novel approach to SARS Cov-2 classification | en_US
dc.type | Conference Paper | en_US
dc.relation.conference | 5th Belgrade Bioinformatics Conference - BelBI2024. 17-20 June 2024, Belgrade, Serbia | en_US
dc.relation.publication | Book of Abstracts | en_US
dc.contributor.affiliation | Computer Science | en_US
dc.contributor.affiliation | Mathematical Institute of the Serbian Academy of Sciences and Arts | en_US
dc.relation.firstpage | 60 | -
dc.description.rank | M34 | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
item.openairetype | Conference Paper | -
item.cerifentitytype | Publications | -
item.fulltext | No Fulltext | -
item.grantfulltext | none | -
crisitem.author.orcid | 0000-0003-2618-754X | -
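
The abstract names Relative Synonymous Codon Usage (RSCU) as one of the CU bias measures. As an illustration only, the following minimal sketch computes RSCU for a single coding sequence using the commonly cited definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is not the authors' implementation. The function name `rscu`, the use of the standard genetic code, and the choice to skip stop codons, unrecognized codons, and amino acids absent from the sequence are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Standard genetic code in the conventional TCAG ordering (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def rscu(seq):
    """Return {codon: RSCU} for one in-frame coding sequence (hypothetical helper)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons by amino acid, leaving out stop codons.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    result = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not present: RSCU left undefined
        for c in codons:
            # observed count / (family total / number of synonymous codons)
            result[c] = counts[c] * len(codons) / total
    return result


if __name__ == "__main__":
    values = rscu("GCTGCTGCCGCA")  # four alanine codons
    for codon in sorted(values):
        print(codon, round(values[codon], 3))
```

A value of 1.0 means the codon is used as often as expected under uniform synonymous usage; values above or below 1.0 indicate over- or under-representation. Vectors of such values per sequence are the kind of input a clustering algorithm could operate on.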