Integration of multiple data sets enables robust extraction of informative gene sets. A, B: The ikarus workflow. Ikarus is a two-step procedure for classifying cells. In the first step, integrating multiple expert-labeled datasets allows robust gene markers to be extracted. These gene markers are then used in a composite classifier consisting of logistic regression and network propagation. C: Comparison of cross-validation accuracy for signature derivation and model selection. The minimum balanced accuracy on the validation sets was chosen as the selection metric (i.e., the performance on the worst-performing test set). Models trained on only one dataset achieved lower balanced accuracy than models trained on two datasets (p value 0.063, two-sided Wilcoxon test). The combination of the colorectal cancer data from Lee et al. and the lung cancer data from Laughney et al. achieved the highest minimum balanced accuracy, 0.97. D: Comparison of gene signature scores in laser-microdissected gastric cancer data. The normal gene list shows lower signature scores in cancer samples than in the cancer-associated normal tissue (p value 0.052, N = 8, Mood's median test). The tumor gene signature is significantly higher in cancer samples than in normal tissue (p value 0.003, N = 8, Mood's median test). E: Primary cells and cancer cell lines have significantly different gene signature distributions. The normal-cell gene signature score distribution decreases gradually from primary cells to cell lines to tumor cell lines. The tumor gene signature shows the opposite effect: cancer cell lines have the highest score distribution, followed by cell lines and then primary cells. Distributions were compared using pairwise Wilcoxon tests with BH-FDR correction. All adjusted p values were lower than 0.01. F: Patient-derived xenografts (PDX) show a significantly higher tumor gene signature score than normal gene signature score. The same pattern is observed in multiple cancer types. Normal and tumor signature distributions were compared for each cancer type using Wilcoxon tests, followed by BH-FDR correction. All adjusted p values were lower than 0.01. Credit: Genome Biology (2022). DOI: 10.1186/s13059-022-02683-1
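The selection metric described in panel C, the minimum balanced accuracy across validation sets, can be illustrated with a short sketch. This is not the ikarus code: the function names, the label strings, and the toy predictions below are all assumptions made up for illustration. Balanced accuracy is taken here as the mean of the per-class recalls, and a candidate model is scored by its worst validation set.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; robust to unequal class sizes."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all cells whose true label is this class
        idx = [i for i, label in enumerate(y_true) if label == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def min_balanced_accuracy(results):
    """results maps a validation-set name to a (y_true, y_pred) pair.

    Returns the score on the worst-performing validation set.
    """
    return min(balanced_accuracy(t, p) for t, p in results.values())


def select_model(candidates):
    """candidates maps a (hypothetical) model name to its results dict.

    Picks the model whose worst validation score is highest.
    """
    return max(candidates, key=lambda name: min_balanced_accuracy(candidates[name]))


# Toy, made-up predictions for two hypothetical candidate models
candidates = {
    "model_a": {
        "val_1": (["tumor", "normal"], ["tumor", "normal"]),
        "val_2": (["tumor", "tumor", "normal", "normal"],
                  ["tumor", "normal", "normal", "normal"]),
    },
    "model_b": {
        "val_1": (["tumor", "normal"], ["tumor", "normal"]),
        "val_2": (["tumor", "tumor", "tumor", "tumor", "normal", "normal"],
                  ["tumor", "tumor", "tumor", "normal", "normal", "normal"]),
    },
}
best = select_model(candidates)
```

Scoring by the minimum rather than the mean favors models that generalize to every held-out dataset, which matches the caption's description of judging a model by its worst-performing test set.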
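When it comes to identifying patterns in mountains of data, humans are no match for artificial intelligence (AI). Machine learning, a branch of AI, is often used to find regularities in data sets, whether for stock market analysis, image and speech recognition, or the classification of cells. To reliably distinguish cancer cells from healthy cells, a team led by Dr. Altuna Akalin, head of the Bioinformatics and Omics Data Science Platform at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), has now developed a machine learning program called "ikarus."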