Weighted Fuzzy System for Identifying DNA N4-Methylcytosine Sites With Kernel Entropy Component Analysis
2024 (English) In: IEEE Transactions on Artificial Intelligence, E-ISSN 2691-4581, Vol. 5, no 2, p. 895-903
Article in journal (Refereed) Published
Abstract [en]
N4-methylcytosine (4mC) is a common DNA methylation that has been implicated in epigenetic regulation and host defense. Accurate prediction of 4mC sites in DNA sequences will help to better explore the underlying biological processes and mechanisms. For this problem, computational methods based on machine learning (ML) and deep learning (DL) are faster, less complex, and less expensive than experimental detection methods. However, the prediction accuracy of existing computational methods is still unsatisfactory. In this work, we propose a weighted fuzzy system for identifying DNA 4mC sites with kernel entropy component analysis (KECA), which we name W-TSK-FS-KECA. The model is an improvement on the Takagi-Sugeno-Kang fuzzy system (TSK-FS). We use position-specific trinucleotide propensity (PSTNP) to construct feature vectors on representative benchmark datasets. We then use KECA to obtain the reconstruction error. Finally, we add the calculated reconstruction error to the regularization term of TSK-FS as weights to enhance the model performance. Comparative experiments with other methods show that it has good classification performance. © 2023 IEEE.
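The KECA step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the bandwidth `sigma`, the function names, and the definition of the per-sample reconstruction error (squared feature-space norm minus the squared norm of the projection onto the selected components) are all assumptions made for illustration. The sketch follows the usual KECA recipe of ranking eigenpairs of the uncentered kernel matrix by their contribution to the Rényi entropy estimate rather than by eigenvalue alone.

```python
import numpy as np


def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def keca_reconstruction_error(X, n_components, sigma=1.0):
    """Per-sample feature-space reconstruction error after a KECA projection.

    Hypothetical helper: the paper's exact error definition is not given
    in the abstract, so this uses K_ii minus the projected squared norm.
    """
    K = rbf_kernel(X, sigma)
    # Eigendecomposition of the symmetric, uncentered kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative values
    # Entropy contribution of each eigenpair: lambda_j * (1^T e_j)^2.
    contrib = eigvals * eigvecs.sum(axis=0) ** 2
    keep = np.argsort(contrib)[::-1][:n_components]
    # Squared norm of each sample's projection onto the kept components.
    proj_sq = (eigvecs[:, keep] ** 2) @ eigvals[keep]
    return np.clip(np.diag(K) - proj_sq, 0.0, None)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))  # stand-in for PSTNP feature vectors
    errors = keca_reconstruction_error(X, n_components=5)
    print(errors.shape)
```

Under these assumptions, samples that are poorly represented by the retained entropy components receive larger errors, which is the kind of per-sample quantity that could then serve as a weight in a regularized learner such as TSK-FS.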
Place, publisher, year, edition, pages
Piscataway, N.J.: IEEE, 2024. Vol. 5, no 2, p. 895-903
Keywords [en]
Bioinformatics, DNA, DNA N4-methylcytosine, Entropy, Fuzzy sets, fuzzy system, Fuzzy systems, Genomics, Kernel, kernel entropy component analysis, sequence classification
National Category
Cell and Molecular Biology
Identifiers
URN: urn:nbn:se:hh:diva-51968
DOI: 10.1109/TAI.2023.3266191
Scopus ID: 2-s2.0-85153368171
OAI: oai:DiVA.org:hh-51968
DiVA, id: diva2:1811503
2023-11-13 2023-11-13 2024-03-19 Bibliographically approved