Relative Risk (RR): also called the risk ratio or rate ratio, it "is the ratio of the probability of an outcome in an exposed group to the probability of an outcome in an unexposed group" (quoted from Wikipedia). In my understanding, for the CHD example it is the ratio of the probability that a smoker develops CHD to the probability that a non-smoker develops CHD. Read this way, RR becomes easy to interpret.
RR = 1: the risk factor (smoking) has no association with the probability that a person develops CHD; in other words, Pr(CHD | smoking) == Pr(CHD | non-smoking).
RR > 1: smoking is positively associated with CHD, i.e. it is a risk factor; the larger the value, the more important the factor.
RR < 1: the association between the two is negative; the exposure can be viewed as a protective factor.
The formula is:

RR = Pr(CHD | smoking) / Pr(CHD | non-smoking)
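As a quick sanity check, here is a minimal Python sketch that computes RR from a made-up 2x2 table (the counts below are purely illustrative, not from any real study):

```python
# Hypothetical 2x2 table:        CHD   no CHD
#   smokers                       a      b
#   non-smokers                   c      d
a, b = 30, 70    # smokers: 30 developed CHD, 70 did not (made-up numbers)
c, d = 10, 90    # non-smokers: 10 developed CHD, 90 did not

p_exposed   = a / (a + b)        # Pr(CHD | smoking)
p_unexposed = c / (c + d)        # Pr(CHD | non-smoking)
rr = p_exposed / p_unexposed
print(f"RR = {rr:.2f}")          # 3.00 -> in this toy table, smokers carry 3x the risk
```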
Odds: given a certain exposure condition (+/-), the ratio of the probability of having the disease to the probability of not having it, i.e. p / (1 - p).
Odds Ratio (OR): the ratio of one odds to another. Unlike RR, which compares the risk of disease across exposure groups, OR can equivalently be read as a comparison of exposure odds between the disease and non-disease groups. Put another way, the odds of having CHD with a smoking habit are OR times the odds of having CHD without smoking. The value ranges of OR can be interpreted the same way as for RR, although RR is more easily explainable.
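To make the distinction concrete, here is a small sketch using the same hypothetical counts as above, computing the odds in each group and the resulting OR:

```python
# Same made-up 2x2 table as in the RR example
a, b = 30, 70   # smokers: 30 with CHD, 70 without
c, d = 10, 90   # non-smokers: 10 with CHD, 90 without

odds_smoking     = a / b                        # Pr(CHD|smoking) / Pr(no CHD|smoking)
odds_non_smoking = c / d                        # Pr(CHD|non-smoking) / Pr(no CHD|non-smoking)
odds_ratio = odds_smoking / odds_non_smoking    # equivalently (a * d) / (b * c)
print(f"odds(smoking) = {odds_smoking:.3f}, odds(non-smoking) = {odds_non_smoking:.3f}")
print(f"OR = {odds_ratio:.2f}")                 # 3.86, versus RR = 3.00 on the same table
```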
OR is more widely used when analysing case-control studies, while RR is more suitable for prospective cohort studies. When the baseline morbidity rate is very small (e.g. a rare disease), RR approaches OR. Although odds and OR are not as easy to interpret as risks, modeling the log-odds as a linear function lets us map the linear output into the [0, 1] range for morbidity-rate prediction (e.g. logistic regression: a 1-unit change in x_j changes the odds by a factor of exp(beta_j), where beta_j is the coefficient of x_j; the detailed formula is in my previous LR post).
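As a hedged illustration of that logistic-regression point, the sketch below fits a model on synthetic data and exponentiates the coefficients to read them as odds ratios; the feature names and data are placeholders, not from my CHD project:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic toy data: three standardized predictors and a binary outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X @ np.array([0.8, 0.3, -0.2]) + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)
odds_ratios = np.exp(model.coef_[0])   # a 1-unit increase in x_j multiplies the odds by exp(beta_j)
print(dict(zip(["age", "blood_pressure", "hdl"], odds_ratios.round(2))))
```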
Interpret Predictor Importance Using RR/OR with Confidence Intervals

The table [3] below shows the RR and CI of different predictors for CHD morbidity prediction. To evaluate the importance of a predictor, both the RR/OR and its CI should be considered. For example, 'Age' is a significant positive factor for CHD, since its RR is above 1 and the lower bound of its 95% CI is also above 1. 'Blood pressure, high normal' is not significant, because the lower bound of its CI is below 1, meaning its true OR may be below 1.
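For readers who want to reproduce this kind of check, here is a minimal sketch of a 95% CI for an OR using the standard log-scale (Woolf) approximation, again on the made-up counts from the earlier examples:

```python
import math

# Hypothetical 2x2 counts (same toy table as above)
a, b, c, d = 30, 70, 10, 90

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)        # standard error of log(OR)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# If the lower bound stays above 1, the predictor is a significant positive (risk) factor.
```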
For the past 4 months, I have been working on cardiovascular disease risk prediction. Through this work, I came up with an idea for using a GAN to learn in a progressive way and decided to write a paper on the topic (sorry, I can't discuss the idea in detail). I then began doing background research and found three related topics. In this post, I will summarize them.
NLP algorithms are designed to learn from language, which is usually unstructured and of arbitrary length. Even worse, different language families follow different rules, and applying different sentence segmentation methods may introduce ambiguity. So it is necessary to transform this information into an appropriate, computer-readable representation. To enable such a transformation, multiple tokenization and embedding strategies have been invented. This post mainly gives a brief summary of these terms. (I assume you already know some basic concepts, like tokenization and n-grams; I will mainly talk about word embedding methods in this post.)
Recently, I have been working on NER projects. As a newcomer, I have spent a lot of time researching current NER methods and writing up a summary. In this post, I will list my notes on NER with deep learning; I hope this is helpful for readers who are interested in and also new to this area. Before reading, I assume you already know some basic concepts (e.g. sequential neural networks, POS and IOB tagging, word embeddings, conditional random fields).
Finally, I'm writing something about neural networks. I will start with the branch I'm most familiar with: sequential neural networks. In this post, I won't cover forward/backward propagation, as there are plenty of excellent blogs and online courses for that. My motivation is to give a clear comparison between RNN, LSTM and GRU, because I find it very important to bear in mind their structural differences and the cause-and-effect relationship between model structure and function. This knowledge helps me understand why a certain type of model works well on certain kinds of problems. Then, when working on real-world problems (e.g. time series, named entity recognition), I am more confident in choosing the most appropriate model.