High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)
Implements surrogate-assisted feature extraction (SAFE) and common machine learning approaches to train and validate phenotyping models. Background and details about the methods can be found in Zhang et al. (2019), Yu et al. (2017), and Liao et al. (2015).
Install development version from GitHub:
Install from source code:
Follow the main steps, and try the R code in the simulated data and real EHR data examples.
Yichi Zhang*, Tianrun Cai*, Sheng Yu*, Kelly Cho, Chuan Hong, Jiehuan Sun, Jie Huang, Yuk-Lam Ho, Ashwin Ananthakrishnan, Zongqi Xia, Stanley Shaw, Vivian Gainer, Victor Castro, Nicholas Link, Jacqueline Honerlaw, Selena Huang, David Gagnon, Elizabeth Karlson, Robert Plenge, Peter Szolovits, Guergana Savova, Susanne Churchill, Christopher O’Donnell, Shawn Murphy, J Michael Gaziano, Isaac Kohane, Tianxi Cai*, and Katherine Liao*. Methods for High-throughput Phenotyping with Electronic Medical Record Data Using a Common Semi-supervised Approach (PheCAP). Nature Protocols (2019). (*Contributed equally.)
Yu, S., Chakrabortty, A., Liao, K. P., Cai, T., Ananthakrishnan, A. N., Gainer, V. S., … Cai, T. Surrogate-assisted feature extraction for high-throughput phenotyping. Journal of the American Medical Informatics Association (2017), e143-e149.
Liao, K. P., Cai, T., Savova, G. K., Murphy, S. N., Karlson, E. W., Ananthakrishnan, A. N., … Kohane, I. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ (2015), 350, h1885.