Innovation Center for Biomedical Informatics (ICBI), Department of Oncology and Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center 

Cancer Prevention and Control Program, Lombardi Comprehensive Cancer Center

Brief BIO

I analyze "omics" data, including metabolomics and genomics, and consider their application in translational bioinformatics and precision medicine. In particular, I developed several novel computational and statistical methods for high-dimensional data analysis, led the first comprehensive metabolomic study for Duchenne muscular dystrophy, and contributed to several of the early exome sequencing projects of human tumors. Additional areas of interest include cancer epidemiology and population genetics.

I am an Assistant Professor at ICBI and the Departments of Oncology and Biostatistics, Bioinformatics & Biomathematics at Georgetown University Medical Center, as well as a member of the Cancer Prevention and Control Program at the Lombardi Comprehensive Cancer Center. Previously, I was a postdoctoral fellow in the Biostatistics Branch within the Division of Cancer Epidemiology and Genetics at the National Cancer Institute. I hold a Ph.D. in Biostatistics and an M.H.S. in Bioinformatics from the Johns Hopkins Bloomberg School of Public Health, and a B.S. in Mathematics from the University of Illinois at Urbana-Champaign. You can find out more about me in my CV.



I am interested in assessing the evidence for specific scientific hypotheses across the existing literature, particularly in the area of precision medicine. The term "meta-analysis" refers to combining results across studies to obtain a more precise quantitative summary of an association of interest, which is possible when multiple appropriate estimates are available from a set of similar studies.

  • Boca SM†, Pfeiffer RM, Sampson JN†. "Multivariate meta-analysis with an increasing number of parameters." Biometrical Journal, 2017, In Press [link at Biometrical Journal] † Corresponding authors
  • Rao S, Beckman R, Riazi S, Yabar C, Boca SM, Marshall JL, Brody J, Pishvaian MJ, Madhavan S. "Quantification and expert evaluation of evidence for chemopredictive biomarkers to personalize cancer treatment." Oncotarget, 2016, DOI: 10.18632/oncotarget.13544. [link at Oncotarget]
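
As a minimal illustration of the general meta-analysis idea described above, the sketch below pools effect estimates from several studies using standard inverse-variance (fixed-effect) weighting. It is a textbook example only, not the multivariate method or the evidence-scoring approach of the papers listed above; the function name fixed_effect_meta and the input numbers are hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Pool study estimates with inverse-variance (fixed-effect) weights.

    Returns the pooled estimate and its standard error.
    Assumes the studies estimate the same underlying effect.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    # The pooled variance is the inverse of the summed weights.
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log odds ratios and standard errors from three studies.
pooled, pooled_se = fixed_effect_meta([0.20, 0.35, 0.10], [0.10, 0.15, 0.20])
print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which the combined summary is "more precise."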


I develop methods and tools for high-dimensional data analysis, informed by my applied scientific projects in genomics and metabolomics. My emphasis has been on set-level inference and mediation analysis, and my work has included the development of scientifically relevant summary statistics and novel approaches for multiple testing.

  • Boca SM, Leek JT. "A regression framework for the proportion of true null hypotheses." [preprint at bioRxiv]
  • Boca SM, Sinha R, Cross AJ, Moore SC, Sampson JN. "Testing multiple biological mediators simultaneously." Bioinformatics, 2014, 30(2):214-220. [link at Bioinformatics]
  • Parmigiani G, Boca SM, Ding J, Trippa L. "Statistical tools and R software for cancer driver probabilities." In: Ochs, M, ed., Gene Function Analysis, Methods in Molecular Biology, 2014, 1101:113-34. [link at Springer]
  • Boca SM, Corrada Bravo H, Caffo B, Leek JT, Parmigiani G. "A decision-theory approach to interpretable set analysis for high-dimensional data." Biometrics, 2013, 69(3):614-623. [link to publicly available author manuscript at PMC] [link to final edited manuscript at Biometrics]
  • Boca SM, Kinzler K, Velculescu VE, Vogelstein B, Parmigiani G. "Patient-oriented gene set analysis for cancer mutation data." Genome Biology, 2010, 11:R112. [link at Genome Biology]


I am an active researcher in the field of metabolomics, which refers to the high-throughput analysis of metabolites (small molecules that play a role in metabolism). My work focuses on applications in epidemiological and clinical research. Disease areas of interest include gastrointestinal cancers, breast cancer, and Duchenne muscular dystrophy.

  • Boca SM†, Nishida M, Harris M, Rao S, Cheema AK, Gill K, Wang D, An L, Gauba R, Seol H, Morgenroth L, Henricson E, McDonald C, Mah JK, Clemens P, Hoffman EP, Hathout Y, Madhavan S. "Discovery of metabolic biomarkers for Duchenne Muscular Dystrophy within a natural history study." PLOS ONE, 2016, 11(4): e0153461. [link at PLOS ONE] † Corresponding author. Unprocessed data and code for processing available from Dryad Digital Repository. Code for data analysis available at
  • Guertin KA, Loftfield E, Boca SM, Sampson JN, Moore SC, Xiao Q, Huang WY, Xiong X, Freedman ND, Cross AJ, Sinha R. "Serum biomarkers of habitual coffee consumption may provide insight into the mechanism underlying the association between coffee consumption and colorectal cancer." American Journal of Clinical Nutrition, 2015, 101(5):1000-1011. [link at AJCN]
  • Cross AJ, Moore SC, Boca S, Huang WY, Xiong X, Stolzenberg-Solomon R, Sinha R, Sampson JN. "A prospective study of serum metabolites and colorectal cancer risk." Cancer, 2014, 120(19):3049-3057. [link at Cancer]
  • Cross AJ, Boca S, Freedman ND, Caporaso NE, Huang WY, Sinha R, Sampson JN, Moore SC. "Metabolites of tobacco smoking and colorectal cancer risk." Carcinogenesis, 2014, 35(7):1516-1522. [link to publicly available author manuscript at PMC] [link to final edited manuscript at Carcinogenesis]
  • Moore SC, Matthews CE, Sampson JN, Stolzenberg-Solomon RZ, Zheng W, Cai Q, Tan YT, Chow WH, Ji BT, Liu DK, Xiao Q, Boca SM, Leitzmann MF, Yang G, Xiang YB, Sinha R, Shu XO, Cross AJ. "Human metabolic correlates of body mass index." Metabolomics, 2014, 10:259-269. [link to publicly available author manuscript at PMC] [link to final edited manuscript at Metabolomics]
  • Sampson JN, Boca SM, Shu XO, Stolzenberg-Solomon RZ, Matthews CE, Hsing AW, Tan YT, Ji BT, Chow WH, Cai Q, Liu DK, Yang G, Xiang YB, Zheng W, Sinha R, Cross AJ, Moore SC. "Metabolomics in epidemiology: Sources of variability in metabolite measurements and implications." Cancer Epidemiology, Biomarkers & Prevention, 2013, 22(4):631-640. [link to publicly available author manuscript at PMC] [link to final edited manuscript at CEBP]


I contributed to several of the early exome sequencing projects of human tumors, thus improving scientific understanding of the genomic landscape of tumors and of driver and passenger genes and pathways in cancer. 

  • Parsons DW, Li M, Zhang X, Jones S, Leary RJ, Lin J, Boca SM, Carter H, Samayoa J, Bettegowda C, Gallia GL, Jallo GI, Binder ZA, Nikolsky Y, Hartigan J, Smith DR, Gerhard DS, Fults DW, VandenBerg S, Berger MS, Marie SKN, Shinjo SMO, Clara C, Phillips PC, Minturn JE, Biegel JA, Judkins AR, Resnick AC, Storm PB, Curran T, He Y, Rasheed BA, Friedman HS, Keir ST, McLendon R, Northcott PA, Taylor MD, Burger PC, Riggins GJ, Karchin R, Parmigiani G, Bigner DD, Yan H, Papadopoulos N, Vogelstein B, Kinzler KW, Velculescu VE. "The genetic landscape of the childhood cancer medulloblastoma." Science, 2011, 331(6016):435-439. [link to publicly available author manuscript at PMC] [link to final edited manuscript at Science]
  • Parmigiani G, Boca S, Lin J, Kinzler KW, Velculescu V, Vogelstein B. "Design and analysis issues in genome-wide somatic mutation studies of cancer." Genomics, 2009, 93(1):17-21. [link to publicly available author manuscript at PMC] [link to final edited manuscript at Genomics]
  • Leary RJ, Lin JC, Cummins J, Boca S, Wood LD, Parsons DW, Jones S, Sjoeblom T, Park BH, Parsons R, Willis J, Dawson D, Wilson JK, Nikolskaya T, Nikolsky Y, Kopelovich L, Papadopoulos N, Pennacchio LA, Wang TL, Markowitz SD, Parmigiani G, Kinzler KW, Vogelstein B, Velculescu VE. "Integrated analysis of homozygous deletions, focal amplifications, and sequence alterations in breast and colorectal cancers." Proceedings of the National Academy of Sciences, 2008, 105(42):16224-16229. [link at PNAS]
  • Wood LD, Parsons W, Jones S, Lin J, Sjoblom T, Leary RJ, Shen D, Boca SM, Barber T, Ptak J, Silliman N, Szabo S, Dezso Z, Ustyanksky V, Nikolskaya T, Nikolsky Y, Karchin R, Wilson PA, Kaminker JS, Zhang Z, Croshaw R, Willis J, Dawson D, Shipitsin M, Willson JKW, Sukumar S, Polyak C, Park BH, Pethiyagoda CL, Pant PVK, Ballinger DG, Sparks AB, Hartigan J, Smith DR, Suh E, Papadopoulos N, Buckhaults P, Markowitz SD, Parmigiani G, Kinzler KW, Velculescu VE, Vogelstein B. "The genomic landscapes of human breast and colorectal cancers." Science, 2007, 318(5853):1108-1113. [link at Science]


I am very interested in using biomathematical and biostatistical methods to better understand admixed populations, which arise from the amalgamation of multiple source populations. Recent admixture events include those that led to the creation of the African American and Hispanic and Mestizo populations. I also contributed to work that estimates cancer heritability based on genome-wide association studies.

  • Sampson J, Wheeler WA, Yeager M, Panagiotou O, Wang Z, Berndt SI, Lan Q, Abnet CC, Amundadottir LT, Figueroa JD, Landi MT, Mirabello L, Savage SA, Taylor PR, De Vivo I, McGlynn KA, Purdue MP, Rajaraman P, . . . , Boca SM, Cerhan JR, Ferri GM, Hartge P, Hsiung CA, Magnani C, Miligi L, Morton LM, Smedby KE, Teras LR, Vijai J, Wang SS, Brennan P, Caporaso NE, Hunter DJ, Kraft P, Rothman N, Silverman DT, Slager SL, Chanock SJ, Chatterjee N. "Analysis of heritability and shared heritability based on genome-wide association studies for thirteen cancer types." Journal of the National Cancer Institute, 2015, 107(12): djv279. [link at JNCI]
  • Boca SM†, Rosenberg NA. "Mathematical properties of Fst between admixed populations and their parental source populations." Theoretical Population Biology, 2011, 80(3):208-216. [link to publicly available author manuscript at PMC] [link to final edited manuscript at TPB] † Corresponding author
  • Schroeder KB, Jakobsson M, Crawford MH, Schurr TG, Boca SM, Conrad DF, Tito RY, Osipova LP, Tarskaia LA, Zhadanov SI, Wall JD, Pritchard JK, Malhi RS, Smith DG, Rosenberg NA. "Haplotypic background of a private allele at high frequency in the Americas." Molecular Biology and Evolution, 2009, 26(5):995-1016. [link at MBE]
