Null-free False Discovery Rate Control Using Decoy Permutations

Kun HE, Meng-jie LI, Yan FU, Fu-zhou GONG, Xiao-ming SUN

Acta Mathematicae Applicatae Sinica (English Series), 2022, Vol. 38, Issue 2: 235-253. DOI: 10.1007/s10255-022-1077-5



Abstract

Traditional approaches to false discovery rate (FDR) control in multiple hypothesis testing are usually based on the null distribution of a test statistic. However, all types of null distributions, including theoretical, permutation-based and empirical ones, have inherent drawbacks. For example, the theoretical null may fail because of improper assumptions about the sample distribution. Here, we propose a null distribution-free approach to FDR control for multiple hypothesis testing in case-control studies. This approach, named the target-decoy procedure, builds only on the ordering of tests by some statistic or score, whose null distribution need not be known. Competitive decoy tests are constructed from permutations of the original samples and are used to estimate the number of false target discoveries. We prove that this approach controls the FDR when the score function is symmetric and the scores are independent between different tests. Simulations demonstrate that it is more stable and powerful than two popular traditional approaches, even in the presence of dependency. The approach is also evaluated on two real datasets: an Arabidopsis genomics dataset and a COVID-19 proteomics dataset.
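The idea of competing each test against a permutation-based decoy can be illustrated with a minimal Python sketch. This is a generic target-decoy competition under stated assumptions, not necessarily the exact procedure of the paper: the score `abs_mean_diff`, the function name `target_decoy_select`, the tie rule (ties count as decoys), and the "+1" correction in the FDR estimate are all illustrative choices that the abstract does not specify.

```python
import numpy as np

def abs_mean_diff(values, labels):
    """Hypothetical score: absolute difference between case and control means."""
    return abs(values[labels == 1].mean() - values[labels == 0].mean())

def target_decoy_select(data, labels, alpha, score=abs_mean_diff, seed=0):
    """Return indices of tests reported as discoveries at estimated FDR <= alpha.

    data   : (m, n) array, one row per test, one column per sample
    labels : (n,) array of 1 (case) / 0 (control)
    """
    rng = np.random.default_rng(seed)

    # Target score: computed with the original case/control labels.
    target = np.array([score(row, labels) for row in data])
    # Decoy score: same data, labels permuted independently for each test.
    decoy = np.array([score(row, rng.permutation(labels)) for row in data])

    # Competition: each test keeps its larger score and is labelled
    # "target" or "decoy" accordingly (assumption: ties count as decoys).
    is_target = target > decoy
    best = np.maximum(target, decoy)

    # Order tests by the winning score, best first.
    order = np.argsort(-best, kind="stable")
    sorted_is_target = is_target[order]

    # Decoy wins among the top k estimate the false target wins among the top k.
    n_targets = np.cumsum(sorted_is_target)
    n_decoys = np.cumsum(~sorted_is_target)
    est_fdr = (n_decoys + 1) / np.maximum(n_targets, 1)

    # Use the longest prefix whose estimated FDR is within alpha.
    ok = np.nonzero(est_fdr <= alpha)[0]
    if ok.size == 0:
        return np.array([], dtype=int)
    top = order[: ok[-1] + 1]
    return np.sort(top[is_target[top]])
```

Only the ordering of the winning scores and the target/decoy labels enter the selection step; no null distribution or p-value is computed. The score used here is symmetric in the sense that swapping the case and control labels leaves it unchanged, which is one plausible reading of the symmetry condition mentioned in the abstract.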

Key words

multiple testing / false discovery rate / null distribution-free / p-value-free / decoy permutations / knockoff filter

Cite this article

Kun HE, Meng-jie LI, Yan FU, Fu-zhou GONG, Xiao-ming SUN. Null-free False Discovery Rate Control Using Decoy Permutations. Acta Mathematicae Applicatae Sinica (English Series), 2022, 38(2): 235-253. https://doi.org/10.1007/s10255-022-1077-5


Funding

This paper is supported by the National Key R&D Program of China (No. 2018YFB0704304), the National Natural Science Foundation of China (Nos. 32070668, 62002231, 61832003, 61433014) and the K.C. Wong Education Foundation.