Chuang Ma et al., Plant Physiology, 2024
Title: PEA-m6A: an ensemble learning framework for accurately predicting N6-methyladenosine modifications in plants
Authors: Minggui Song#, Jiawen Zhao#, Chujun Zhang, Chengchao Jia, Jing Yang, Haonan Zhao, Jingjing Zhai, Beilei Lei, Shiheng Tao, Siqi Chen, Ran Su, Chuang Ma*
Abstract: N6-methyladenosine (m6A), the most prevalent modification in eukaryotic mRNAs, is involved in gene expression regulation and many RNA metabolism processes. Accurate prediction of m6A modification is important for understanding its molecular mechanisms in different biological contexts. However, most existing models have a limited range of application and are species-centric. Here we present PEA-m6A, a unified, modularized and parameterized framework that can streamline m6A-Seq data analysis for predicting m6A-modified regions in plant genomes. The PEA-m6A framework builds ensemble learning-based m6A prediction models with statistic-based and deep learning-driven features, achieving superior performance with an improvement of 6.7% to 23.3% in the area under the precision-recall curve compared with the state-of-the-art regional-scale m6A predictor WeakRM in 12 plant species. In particular, PEA-m6A is capable of leveraging knowledge from pretrained models via transfer learning, representing an innovation in that it can improve prediction accuracy of m6A modifications under small-sample training tasks. PEA-m6A also has a strong capability for generalization, making it suitable for application in within- and cross-species m6A prediction. Overall, this study presents a promising m6A prediction tool, PEA-m6A, with outstanding performance in terms of its accuracy, flexibility, transferability, and generalization ability. PEA-m6A has been packaged using Galaxy and Docker technologies for ease of use and is publicly available at https://github.com/cma2015/PEA-m6A.
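Summary (translated from the Chinese): N6-methyladenosine (m6A) is the most common modification in eukaryotic mRNA and takes part in gene expression regulation and many RNA metabolism processes. Accurately predicting m6A modification is essential for understanding its molecular mechanisms in different biological contexts. However, most existing models have a limited range of application and are restricted to particular species. We developed PEA-m6A, a unified, modularized and parameterized framework that simplifies m6A-Seq data analysis for predicting m6A-modified regions in plant genomes. The framework builds ensemble learning m6A prediction models from statistical features and deep learning features. Compared with WeakRM, the latest method for predicting m6A-modified regions, PEA-m6A improves the PRC metric (area under the precision-recall curve) by 6.7% to 23.3% across 12 plant species. In addition, PEA-m6A can use knowledge from pretrained models through transfer learning, which improves the accuracy of m6A modification prediction in small-sample training tasks. PEA-m6A also generalizes well, making it suitable for both cross-species and within-species m6A prediction. In summary, this study provides PEA-m6A, an m6A prediction tool with high accuracy, flexibility, transferability, and generalization ability. The PEA-m6A source code, along with Galaxy and Docker image versions, is available at https://github.com/cma2015/PEA-m6A.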
Article link: https://doi.org/10.1093/plphys/kiae120