New Intraclass Helitrons Classification Using DNA-Image Sequences and Machine Learning Approaches - 22/05/21
Pages: 11
Figures: 15
Abstract
Helitrons are eukaryotic transposable elements (TEs) that transpose by a rolling-circle mechanism. They have been found in various species with highly variable copy numbers, and they sometimes make up a large portion of the host genome. A notable effect of helitrons on the genome is that they frequently capture host genes during transposition. Since their discovery 18 years ago through computational analysis of the whole-genome sequences of the plant Arabidopsis thaliana and the nematode Caenorhabditis elegans (C. elegans), identifying and classifying these mobile genetic elements has remained a challenge because the vast majority of their families are non-autonomous. In the C. elegans genome, helitron sequences vary greatly in length, from 11 to 8965 base pairs (bp). In this work, we develop a new method to predict helitron DNA sequences that is based mainly on Frequency Chaos Game Representation (FCGR) DNA-images. We introduce an automatic system that classifies helitron families in the C. elegans genome by combining machine learning approaches with features extracted from the DNA sequences. Two sets of helitron features are extracted: FCGR images and K-mers. These features consist of the occurrence frequencies of groups of K nucleotides (tandem repeats) in the DNA sequences. Three different classifiers are used to classify all existing helitron families. With FCGR images as features and a pre-trained neural network as the classifier, the global score reaches 72.7%. With K-mer features computed from the genomic sequences, the Support Vector Machine (SVM) reaches 68.7% and the Random Forest (RF) reaches 91.45%.
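To make the FCGR idea concrete, the following is a minimal sketch of how a DNA sequence can be turned into a 2^K x 2^K matrix of K-mer counts, which can then be rendered as an image. It is an illustration under stated assumptions, not the authors' implementation: the corner assignment (A, C, G, T on the corners of the unit square) is one common chaos game convention, and the names `CORNERS`, `kmer_cell`, and `fcgr` are hypothetical. The cell index is computed directly from each K-mer rather than by iterating floating-point coordinates, which gives the same cell as the chaos game midpoint rule.

```python
# Assumed corner convention for the chaos game square (x, y); not taken from the paper.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_cell(kmer):
    """Return the (row, col) FCGR cell of a K-mer.

    In the chaos game, each nucleotide moves the current point halfway
    toward its corner, so the most recent nucleotide contributes the
    most significant bit of each coordinate.
    """
    row = col = 0
    for base in reversed(kmer):  # last nucleotide first = most significant bit
        cx, cy = CORNERS[base]
        col = (col << 1) | cx
        row = (row << 1) | cy
    return row, col


def fcgr(sequence, k):
    """Count every K-mer of `sequence` into a 2^k x 2^k grid (list of rows)."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Skip windows containing ambiguous symbols such as N.
        if all(base in CORNERS for base in word):
            row, col = kmer_cell(word)
            grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # Toy sequence for illustration only (not a real helitron).
    for line in fcgr("ACGTACGGTTAC", 2):
        print(line)
```

Normalizing the grid by its total count turns the counts into frequencies, so sequences of very different lengths produce comparable images.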
Highlights
• We convert DNA sequences into numerical representations using Frequency Chaos Game Representation (FCGR).
• We characterize the helitron families based on their FCGR DNA-images and the K-mers method.
• We develop three classification systems for helitron families: RF, SVM, and PTDNN.
• Evaluation is performed on ten helitron classes in C. elegans.
• Results show that these systems are effective at recognizing helitrons.
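As an illustration of the K-mers method mentioned in the highlights, here is a minimal sketch that turns a sequence into a fixed-length vector of normalized K-mer frequencies, the kind of input a classifier such as RF or SVM can consume. The function name `kmer_frequencies`, the lexicographic ordering of K-mers, and the normalization by the total count are assumptions made for this example, not details from the article.

```python
from itertools import product


def kmer_frequencies(sequence, k):
    """Return normalized K-mer frequencies in lexicographic order (length 4**k)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # ignores windows with symbols outside A, C, G, T
            counts[word] += 1
    total = sum(counts.values())
    # An empty or too-short sequence yields an all-zero vector.
    return [counts[w] / total if total else 0.0 for w in kmers]


if __name__ == "__main__":
    # Toy sequence for illustration only (not a real helitron).
    vector = kmer_frequencies("ACGTACGGTTAC", 2)
    print(len(vector), vector[:4])
```

Because the vector length depends only on K, sequences as short as 11 bp and as long as 8965 bp map to features of the same dimension.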
Keywords: Helitrons, Tandem repeat, Image recognition, SVM, Random Forest, Inception V3, CNN
Vol 42 - No. 3
P. 154-164 - June 2021