Authors: Stojanović, Biljana T. 
Malkov, Saša N.
Biljanski, Miloš V.
Pavlović Lažetić, Gordana M.
Maljković Ružičić, Mirjana M.
Čukić, Ivan Lj.
Mitić, Nenad S.
Affiliations: Computer Science 
Mathematical Institute of the Serbian Academy of Sciences and Arts 
Title: A novel approach to SARS Cov-2 classification
First page: 60
Related Publication(s): Book of Abstracts
Conference: 5th Belgrade Bioinformatics Conference - BelBI2024. 17-20 June 2024 Belgrade, Serbia
Issue Date: 2024
Rank: M34
ISBN: 978-86-82679-16-5
Abstract: 
This paper presents an approach to clustering specific SARS-CoV-2
protein types based on Codon Usage (CU) bias measures. Our previous
research has shown that clustering based on CU bias measures is very close
to the natural clustering by protein type, regardless of virus
affiliation. Relative Synonymous Codon Usage (RSCU), the Effective Number of
Codons (ENC), the per-amino-acid (AAc) Effective Number of Codons (ENCAA)
and the Relative Codon Bias Strength (RCBS) were calculated to measure
the CU bias in the coding sequences of different proteins. The dataset contains
928,850 SARS-CoV-2 complete virus isolates with non-ambiguous nucleotide
sequences. It contains 1,145,168 unique (out of a total of 15,564,504)
protein nucleotide sequences and the corresponding AAc sequences. Protein
coding sequences are associated with metadata, including the collection
date and the WHO virus strain annotation. Protein coding sequences were
clustered separately within each of the 12 most abundant protein types.
Different clustering algorithms (BIRCH, Kohonen Neural Network, fuzzy and
probabilistic clustering) were applied to cluster the proteins based on
RSCU, ENC and RCBS, with a variable number of clusters. WHO group
annotations were used to further describe the clusters. In all results, most
clusters are homogeneous (the largest holding about 19-35% of the
input material) and are almost exclusively associated with a single WHO group.
Each result contains one or two small heterogeneous clusters with
mixed WHO groups. These heterogeneous clusters likely denote proteins
(isolates) that were present at the transition between two WHO groups.
By combining the results of different clustering algorithms, the membership
of SARS-CoV-2 proteins in WHO groups can be described with very high accuracy
using protein clustering based on CU bias measures.
Keywords: SARS-CoV-2 WHO groups | codon usage | clustering | data mining
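The abstract names the CU bias measures but does not give their formulas. The following sketch is a minimal illustration that assumes commonly used definitions, not the authors' implementation. It computes RSCU as the observed codon count divided by the mean count of its synonymous family. ENC follows Wright's effective number of codons, built from the per-amino-acid homozygosity F. RCBS is taken as the geometric mean of 1 + d over all codons, minus 1, where d compares each codon's frequency with the product of its per-position base frequencies. The sketch also assumes the standard genetic code. The function names (`rscu`, `enc`, `rcbs`, `family_homozygosity`) and the handling of stop codons, absent families and very short sequences are hypothetical choices. ENCAA is not computed, because the abstract does not define it. One plausible reading is 1/F per amino acid, which is left as an assumption.

```python
"""Sketch of codon usage (CU) bias measures (assumed textbook definitions)."""
from collections import Counter, defaultdict
from itertools import product
from math import exp, log

# Standard genetic code; codons ordered TTT, TTC, TTA, ... ("*" marks stops).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous codon families, e.g. SYN["F"] == ["TTT", "TTC"].
SYN = defaultdict(list)
for codon, aa in CODE.items():
    SYN[aa].append(codon)


def codons(seq):
    """Split a coding sequence into complete codons made only of A, C, G, T."""
    seq = seq.upper()
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in triplets if c in CODE]


def rscu(seq):
    """RSCU per sense codon: observed count / mean count of its family."""
    counts = Counter(codons(seq))
    values = {}
    for aa, family in SYN.items():
        if aa == "*":
            continue  # stop codons are skipped in this sketch
        total = sum(counts[c] for c in family)
        for c in family:
            # Absent families get 0.0 (assumption: keeps vectors fixed-length).
            values[c] = counts[c] * len(family) / total if total else 0.0
    return values


def family_homozygosity(counts, family):
    """Wright's F = (n * sum(p_i^2) - 1) / (n - 1); None when n < 2."""
    n = sum(counts[c] for c in family)
    if n < 2:
        return None
    sum_p2 = sum((counts[c] / n) ** 2 for c in family)
    return (n * sum_p2 - 1) / (n - 1)


def enc(seq):
    """Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6 (mean F per degeneracy class), capped at 61."""
    counts = Counter(codons(seq))
    by_degeneracy = defaultdict(list)
    for aa, family in SYN.items():
        if aa == "*" or len(family) == 1:
            continue  # stops and single-codon amino acids (Met, Trp) are excluded
        f = family_homozygosity(counts, family)
        # Simplification: skip F <= 0 (possible for tiny samples) to avoid division by zero.
        if f is not None and f > 0:
            by_degeneracy[len(family)].append(f)
    fbar = {k: sum(v) / len(v) for k, v in by_degeneracy.items()}
    if 3 not in fbar and 2 in fbar and 4 in fbar:
        fbar[3] = (fbar[2] + fbar[4]) / 2  # substitute when Ile is absent
    if any(k not in fbar for k in (2, 3, 4, 6)):
        return None  # too little data for this sketch
    nc = 2 + 9 / fbar[2] + 1 / fbar[3] + 5 / fbar[4] + 3 / fbar[6]
    return min(nc, 61.0)


def rcbs(seq):
    """RCBS = (prod_i (1 + d_i))^(1/L) - 1 over all L complete codons (stops included),
    with 1 + d = f(xyz) / (f1(x) * f2(y) * f3(z))."""
    cods = codons(seq)
    L = len(cods)
    if L == 0:
        return None
    freq = Counter(cods)
    pos = [Counter(c[k] for c in cods) for k in range(3)]  # base counts per codon position
    log_sum = 0.0
    for c in cods:
        expected = (pos[0][c[0]] / L) * (pos[1][c[1]] / L) * (pos[2][c[2]] / L)
        log_sum += log((freq[c] / L) / expected)  # log(1 + d), summed for a geometric mean
    return exp(log_sum / L) - 1
```

Under these definitions, a sequence that uses a single codon per amino acid gives the minimum ENC of 20. Uniform use of all synonymous codons approaches the maximum of 61. The RSCU values of the 61 sense codons form one fixed-length feature vector per coding sequence, which could then be passed to a clustering algorithm.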

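The abstract describes clusters as homogeneous or heterogeneous with respect to WHO groups, and it reports combining results from several clustering algorithms. The following sketch shows one simple way to quantify both ideas. It assumes that every coding sequence has a cluster label from each algorithm and a WHO group annotation. It reports per-cluster purity (the share of the dominant WHO group), overall purity, and a majority vote across algorithms. The function names, the 0.9 threshold for calling a cluster heterogeneous, and the voting rule are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def cluster_report(cluster_ids, who_groups, threshold=0.9):
    """Per-cluster size share, dominant WHO group and purity, plus overall purity."""
    members = defaultdict(Counter)
    for cid, group in zip(cluster_ids, who_groups):
        members[cid][group] += 1
    total = len(cluster_ids)
    report = {}
    for cid, counts in members.items():
        size = sum(counts.values())
        group, top = counts.most_common(1)[0]
        report[cid] = {
            "size_share": size / total,  # fraction of the input material
            "dominant_group": group,
            "purity": top / size,  # share of the dominant WHO group
            "heterogeneous": top / size < threshold,  # hypothetical cut-off
        }
    overall = sum(c.most_common(1)[0][1] for c in members.values()) / total
    return report, overall


def consensus_groups(labelings, who_groups):
    """Majority vote: each algorithm assigns a sequence the dominant WHO group of
    its cluster; the most frequent assignment wins (one possible combination rule)."""
    predictions = []
    for cluster_ids in labelings:
        report, _ = cluster_report(cluster_ids, who_groups)
        predictions.append([report[c]["dominant_group"] for c in cluster_ids])
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy data: 9 sequences, cluster labels from one algorithm.
clusters = [0, 0, 0, 0, 1, 1, 1, 2, 2]
groups = ["Alpha"] * 4 + ["Delta"] * 3 + ["Alpha", "Delta"]
report, overall = cluster_report(clusters, groups)
```

In this toy example, cluster 2 mixes two WHO groups, so it is flagged as heterogeneous, while clusters 0 and 1 are pure. With real data, the `size_share` field corresponds to the cluster sizes quoted in the abstract as a percentage of the input material.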