CV
Research Interests
Machine Learning, Adversarial Machine Learning, Transfer Learning, Clinical Natural Language Processing
Education
- B.E. in Computer Science, Henan University, 2011
- M.E. in Computer Science, Jilin University, 2014
- M.S. in Data Science, WPI, 2016
Projects
- National Infectious Disease Outbreak Prediction (accepted by AMIA 2019 Informatics Summit), 09/2018
- Fused disease morbidity incidence data, weather history, social network trends, and geospatial information; implemented a VAE-LSTM model to predict infectious disease morbidity rates
- NER and RE on Legal Opening Brief, 11/2017
- Designed a BiLSTM-CRF model to extract individual names, locations, and companies; applied a CRF model to classify prosecution and defense, with customized accuracy of up to 93%
- Spatial Data Analysis: Exploration of MBTA Late Night Service with Crime Dataset, WPI, 05/2016
- Used public APIs to gather 400,000+ crime records, 4,000+ restaurant records, and 240,000+ ridership transaction records; combined the three datasets to analyze the influence of MBTA Late Night Service on the crime rate in the Boston area
- Converted UTM zone coordinates and applied clustering methods to find distribution patterns; applied spatial joins on the crime, MBTA, and restaurant datasets and used several classification methods to model the relationship between ridership frequency and crime rate
- National Data Mining Competition: Bank Customer Satisfaction Degree Prediction, Rang Tech, 06/2016
- Predicted whether customers were satisfied with their banking experience
- Generated new features based on univariate and bivariate analysis, applied dimensionality reduction, and built a 2-layer, 5-fold stacking model
Professional Experience
- Data Mining Engineer, Ping An Technology, China, present
- Developed multimodal frameworks using machine learning and deep learning, focusing on chronic cardiovascular disease risk prediction with low-quality and imbalanced data, infectious disease morbidity forecasting, and electronic health record information extraction
- Data Engineer Intern, Pfizer, CT, 05/2016
- Automated Hadoop ingestion of data from multiple sources; created scripts to parse multi-format data into JSON and Parquet formats simultaneously
- Connected MapR with Spotfire/Tableau; used Apache Drill to create both JSON and Parquet views to generate overlay visualizations
- Data Analyst Intern, Zakipoint Health Inc, MA, 12/2015
- Analyzed potential cost drivers for health plans; designed the database schema, handled data deployment and querying in MongoDB with PyMongo, and created visualizations with D3.js
Research Experience
- Main Contributor, Role of plant MicroRNA in cross-species regulatory networks of humans, Biological Identification and Information Security Laboratory, 01/2012-07/2014
- Aimed to explore the potential effects of ingested plant microRNA on human digestive organs
- Designed the main workflow, preprocessed nearly 400,000 structured and unstructured gene target records, and designed a feature extraction method based on alignment types
- Generated a weighted regulatory network with 782 genes and 2444 interactions, applied PageRank to rank nodes, and conducted module extraction based on hub nodes and functional enrichment analysis
Publications
Wenxiao Jia, Yi Wan, Yanpu Li et al. Integrating Multiple Data Sources and Learning Models to Predict Infectious Diseases in China, AMIA 2019 Summit (Accepted)
Hao Zhang, Yanpu Li et al. Role of plant MicroRNA in cross-species regulatory networks of humans, BMC Systems Biology (2016) 10:60
Hao Zhang et al. A computational method for predicting regulation of human microRNAs on the influenza virus genome. BMC Systems Biology. 7(Suppl 2):S3 doi:10.1186/1752-0509-7-S2-S3, 2013.
Honors and Awards
- National-level Data Mining Competition Top 20%, Rang Technologies and KVRA tech, 06/2016
- Graduate Qualifying Project Team Winner Award, WPI, 05/2016
- Data Science Travel Award 2015 for Grace Hopper Celebration of Women in Computing Conference Attendance, WPI, 06/2015
Volunteer Experience
- Volunteer at the Open Data Science Conference (ODSC) in Boston, held by ODSC, 05/2015
- Environmental protection volunteer around the Worcester area (picking up trash, cleaning, and painting), held by WPI, 10/2015
Hobbies
Photography, Pool