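Summary of pipeline predictions. a Overview of the RNA-MuTect-WMN pipeline: in the training set (n = 100, green arrows), RNA-MuTect is applied to tumor RNA and a matched-normal DNA sample to obtain a list of variants labeled as somatic or germline. A random forest classifier is then trained on the set of features collected for each variant, using 5-fold cross-validation. In the test set (orange arrows), three steps are performed: (1) MuTect is applied to the tumor RNA without a matched-normal sample, producing a mixed list of somatic and germline variants. (2) The five trained models are applied to this set of variants, and each variant is classified as somatic or germline by majority vote. (3) Finally, the predicted variants are further filtered using the RNA-MuTect filtering steps. b Distribution of precision and recall values on validation (left) and test (right) sets computed for each sample. Box plots show median, 25th, and 75th percentiles. The whiskers extend to the most extreme data points not considered outliers, and the outliers are represented as dots. c Precision as a function of the number of true somatic mutations per sample. d Correlation between the number of predicted somatic mutations and the number of somatic mutations as determined by DNA with a matched-normal DNA sample. e Correlation between the number of predicted somatic mutations and the number of somatic mutations as determined by RNA with a matched-normal DNA sample. f Distribution of precision and recall values on validation (left) and test (right) sets computed for each sample in the lung dataset. Box plots show median, 25th, and 75th percentiles. The whiskers extend to the most extreme data points not considered outliers, and the outliers are represented as dots. g Distribution of precision and recall values on validation (left) and test (right) sets computed for each sample in the colon dataset. Box plots show median, 25th, and 75th percentiles. The whiskers extend to the most extreme data points not considered outliers, and the outliers are represented as dots. Source data are provided as a Source Data file. Credit: Nature Communications (2022). DOI: 10.1038/s41467-022-30753-2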
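Step (2) of the test procedure and the per-sample precision and recall shown in panel b can be illustrated with a small sketch. This is not the authors' code: the variant key, the label strings, the toy data, and the helper names `majority_vote` and `precision_recall` are hypothetical. The random forest training, the MuTect call in step (1), and the RNA-MuTect filtering in step (3) are not reproduced. Ground truth is assumed to be the set of somatic mutations called from DNA with a matched normal.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def majority_vote(fold_predictions: List[Dict[Variant, str]]) -> Dict[Variant, str]:
    """Combine per-variant labels ("somatic" or "germline") from several fold models.

    With five models and two possible labels a tie cannot occur.
    """
    combined: Dict[Variant, str] = {}
    for variant in fold_predictions[0]:
        votes = Counter(pred[variant] for pred in fold_predictions)
        combined[variant] = votes.most_common(1)[0][0]
    return combined


def precision_recall(predicted: Set[Variant], truth: Set[Variant]) -> Tuple[float, float]:
    """Per-sample precision and recall of predicted somatic variants."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy example for one sample (hypothetical data).
v1 = ("chr1", 100, "C", "T")
v2 = ("chr2", 200, "G", "A")
v3 = ("chr3", 300, "A", "G")
folds = [
    {v1: "somatic", v2: "germline", v3: "somatic"},
    {v1: "somatic", v2: "germline", v3: "germline"},
    {v1: "germline", v2: "germline", v3: "somatic"},
    {v1: "somatic", v2: "somatic", v3: "germline"},
    {v1: "somatic", v2: "germline", v3: "somatic"},
]
labels = majority_vote(folds)
predicted_somatic = {v for v, label in labels.items() if label == "somatic"}
print(precision_recall(predicted_somatic, truth={v1, v2}))  # (0.5, 0.5)
```

Using an odd number of fold models with a binary label is what makes a simple majority vote well defined here; computing precision and recall per sample, rather than pooling all variants, matches the per-sample distributions plotted in the box plots.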