Blog posts

2019

Quick Notes: Variation of GAN (Stacked GAN, StackGAN++, PA-GAN)

For the past four months, I have been working on cardiovascular disease risk prediction. Through this work, I came up with an idea for utilizing GANs to learn in a progressive way and decided to write a paper on this topic (sorry, I can't talk about my idea in detail). I then began doing background research and found three related topics. In this post, I will give summaries of these topics.

2018

Quick Notes: Useful Terms & Concepts in NLP: BOW, POS, Chunking, Word Embedding

NLP algorithms are designed to learn from language, which is usually unstructured and of arbitrary length. Even worse, different language families follow different rules, and applying different sentence segmentation methods may cause ambiguity. So it is necessary to transform this information into an appropriate, computer-readable representation. To enable such a transformation, multiple tokenization and embedding strategies have been invented. This post gives a brief summary of these terms. (I assume readers already know some basic concepts, like tokenization, n-grams, etc.; I will mainly talk about word embedding methods in this post.)
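
As a quick illustration of the bag-of-words idea, here is a minimal sketch using scikit-learn's CountVectorizer (assuming scikit-learn >= 1.0; the sentences are made up):

```python
# Bag-of-words: turn raw sentences into fixed-length count vectors.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]
vec = CountVectorizer()
X = vec.fit_transform(docs)         # sparse (n_docs, vocab_size) count matrix

print(vec.get_feature_names_out())  # the learned vocabulary
print(X.toarray())                  # per-document word counts
```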

Quick Notes: Summarization about Current NER Methods in Deep Learning

Recently, I have been working on NER projects. As a newcomer, I have spent a lot of time researching current NER methods and made a summary. In this post, I will list my summaries (NER in DL); I hope this will be helpful for readers who are interested and also new to this area. Before reading, I assume you already know some basic concepts (e.g., sequential neural networks, POS and IOB tagging, word embeddings, conditional random fields).

Useful Concept for Medical/Healthcare Data Risk Prediction

This post explains some basic statistical concepts used in disease morbidity risk prediction. As a CS student, I had a hard time figuring out these statistical concepts; I hope my summary will be helpful for you.

DL Series 1: Sequence Neural Network and Its Variants (RNN, LSTM, GRU)

Introduction

Finally, I’m writing something about neural networks. I will start with the branch I’m most familiar with: sequential neural networks. In this post, I won’t cover forward/backward propagation, as there are plenty of excellent blogs and online courses on it. My motivation is to give a clear comparison between RNN, LSTM and GRU, because I find it very important to bear in mind the structural differences between these models, and the cause and effect between a model’s structure and its function. This knowledge helps me understand why a certain type of model works well on certain kinds of problems. Then, when working on real-world problems (e.g. time series, named entity recognition), I am more confident in choosing the most appropriate model.

Tree Series 2: GBDT, LightGBM, XGBoost, CatBoost

Introduction

Both bagging and boosting are designed to ensemble weak estimators into a stronger one. The difference: bagging trains its estimators in parallel to decrease variance, while boosting learns from the mistakes made in previous rounds and tries to correct them in new rounds, which implies a sequential order. GBDT belongs to the boosting family, with various siblings, e.g. AdaBoost, LightGBM, XGBoost, CatBoost. In this post, I will mainly explain the principles of GBDT, LightGBM, XGBoost and CatBoost, make comparisons, and elaborate on how to fine-tune these models.
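
To make the contrast concrete, here is a minimal sketch comparing a bagging ensemble (random forest) with a boosting ensemble (GBDT) in scikit-learn; the dataset and settings are illustrative only:

```python
# Bagging (parallel trees, variance reduction) vs. boosting (sequential
# trees, each round correcting the previous rounds' errors) on toy data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

bagging = RandomForestClassifier(n_estimators=100, random_state=0)
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

print("bagging :", cross_val_score(bagging, X, y).mean())
print("boosting:", cross_val_score(boosting, X, y).mean())
```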

Tree Series 1: Decision Tree, Random Forest

The tree is one of the most widely used models, with a large family (regression trees, classification trees; bagging: RF; boosting: GBDT) and many implementations (classification: ID3, C4.5, CART; regression: CART). This series of posts will start with a brief introduction to the basic principles of DT, RF and GBDT, then go into the details of GBDT and other boosting techniques (e.g. LightGBM, XGBoost, CatBoost), and dive deeper by making comparisons.

Quick notes of privileged information

Why PI?

In human learning, we have teachers who have mastered the knowledge. Taught by a teacher, students can quickly get to the point of a question, filter out false leads, learn very fast, and then achieve good marks on the test by themselves. That’s why PI is introduced: to help ML models learn faster and more accurately on the training set, yet work well independently on the test set. So, in a normal classification paradigm, you get a set of pairs (x_i, y_i); after introducing PI, each pair becomes a triplet (x_i, x_i*, y_i), where x_i* is the privileged information available only during training.
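
Formally, following Vapnik's LUPI setting (a reference sketch of the data paradigm, not of any particular algorithm):

```latex
% Training uses triplets with privileged features x_i^*;
% at test time only the ordinary features x are available.
\text{train: } \{(x_i,\, x_i^{*},\, y_i)\}_{i=1}^{\ell},
\quad x_i \in X,\; x_i^{*} \in X^{*},\; y_i \in \{-1, 1\};
\qquad \text{test: } x \in X \text{ only.}
```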

Ensemble learning: Stacking

Stacked generalization, or stacking, is composed of two types of layers (see the sketch below):

  1. base classifiers (generally different learning algorithms)
  2. a meta learner (which combines the outputs of the first layer to obtain the final result)
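
Here is a minimal sketch of this two-layer structure using scikit-learn's StackingClassifier (available since scikit-learn 0.22; models and data are illustrative):

```python
# Layer 1: heterogeneous base classifiers; layer 2: a meta learner that
# combines their outputs into the final prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = [("rf", RandomForestClassifier(random_state=0)),
        ("svc", SVC(random_state=0))]
clf = StackingClassifier(estimators=base,
                         final_estimator=LogisticRegression())
clf.fit(X_train, y_train)
print("stacking accuracy:", clf.score(X_test, y_test))
```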

2017

Introduction of Reinforcement Learning, Part 1

Introduction

Recently, I have read some papers about Reinforcement Learning (RL). To me, it’s quite an interesting topic, but it’s also very complex, as it involves so many terms, definitions and methods. So I wrote this post as an introductory overview of RL.

Clustering-DBSCAN

Disadvantages of Partitioning and Hierarchical Methods

Before introducing a density-based clustering algorithm, I will first talk about the shortcomings of other clustering methods. A partitioning algorithm such as k-means requires k to be declared in the first step of clustering. Moreover, it restricts the shape of clusters to convex regions, implicitly assuming roughly Gaussian (spherical) clusters.
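
A minimal sketch of this limitation, contrasting k-means and DBSCAN on two interleaving half-moons (the parameter values are illustrative):

```python
# k-means must be told k and prefers convex clusters; DBSCAN is density-based
# and recovers the two non-convex moons without declaring k.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=400, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, random_state=0).fit(X)
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

print("k-means ARI:", adjusted_rand_score(y, km.labels_))  # low: convex bias
print("DBSCAN  ARI:", adjusted_rand_score(y, db.labels_))  # high: shape-free
```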

Clustering-Kmeans

Introduction of Clustering

Clustering is the process of grouping similar objects together. It belongs to unsupervised learning, as the data is unlabeled. There is a set of different clustering methods, including partitioning methods (flat clustering), hierarchical clustering and density-based methods.

2016

Very Confusing Pairs to Me🤔😥😴

This post elaborates on the details of, and makes comparisons between, some pairs that are very similar and confusing to me. I will also teach you how to draw a ROC curve and compute its AUC by hand (the most exciting part to me).
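
As a preview of the by-hand procedure, here is a minimal sketch: sort samples by predicted score, sweep the threshold, and trace the (FPR, TPR) points (the labels and scores are made up):

```python
# Draw a ROC curve "by hand": each step down the sorted scores classifies one
# more sample as positive, producing one (FPR, TPR) point on the staircase.
import numpy as np

y_true  = np.array([1, 1, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1])

y_sorted = y_true[np.argsort(-y_score)]          # descending by score
P, N = y_true.sum(), len(y_true) - y_true.sum()

tpr = np.r_[0, np.cumsum(y_sorted) / P]          # true positive rates
fpr = np.r_[0, np.cumsum(1 - y_sorted) / N]      # false positive rates

auc = np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2)  # trapezoid rule
print(list(zip(fpr, tpr)), "AUC =", auc)         # AUC = 0.75 here
```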

SVM: Support Vector Machine

The key to SVM is to find a hyperplane, built on some important instances (the support vectors), that separates the data instances correctly. The process of constructing this plane looks contradictory at first: the margin of the hyperplane is defined as the smallest distance between the decision boundary and the support vectors, yet at the same time the decision boundary must be the one for which this margin is maximized. This is because there can be many hyperplanes (Fig. 1) that separate the data correctly; choosing the one that leads to the largest gap between the two classes makes the classifier more resistant to perturbations of the training data.
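
For reference, the standard hard-margin formulation of this max-margin idea (a sketch of the usual textbook form):

```latex
% Minimizing \|w\| maximizes the margin 2/\|w\|, subject to every
% training point lying on the correct side with functional margin >= 1.
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\lVert w \rVert^{2} \\
\text{s.t.}\quad  & y_i \left( w^{\top} x_i + b \right) \ge 1,
\qquad i = 1, \dots, n
\end{aligned}
```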

Logistic Regression

What’s LR?

Logistic regression uses the sigmoid function to estimate the probability that a sample belongs to a certain class, and obtains the unknown parameters by maximum likelihood estimation. Like linear regression, it assumes a linear model of the inputs, so its decision boundary is linear. For example, a 2D dataset can be separated by a linear decision boundary wX + b = 0. If a point gives wx + b > 0, then it more likely belongs to class 1; otherwise, class 0.
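
A minimal sketch of this decision rule (the weights here are made-up illustrative values, not fitted ones):

```python
# Logistic regression's decision rule: p = sigmoid(w.x + b), predict class 1
# when p > 0.5, which is equivalent to w.x + b > 0.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -1.0])   # hypothetical fitted weights
b = 0.5                     # hypothetical fitted intercept

x = np.array([1.0, 0.3])    # a 2D sample
p = sigmoid(w @ x + b)      # estimated P(y = 1 | x)
print(p, "-> class", int(p > 0.5))
```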

2015

Notes of Recommendation System

When building a recommendation system, we mainly deal with three problems: how to collect data (the known ratings) in the utility matrix; how to estimate the unknown ratings from the known ones; and how to evaluate the estimation approaches.
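
As a tiny illustration of these problems, here is a sketch of a utility matrix with unknown ratings, filled by a naive item-mean baseline (the ratings are made up; real systems use collaborative filtering and the like):

```python
# Utility matrix: rows are users, columns are items, NaN marks unknown
# ratings; a naive baseline estimates each unknown by the item's mean.
import numpy as np

R = np.array([[5.0,    np.nan, 3.0],
              [4.0,    2.0,    np.nan],
              [np.nan, 1.0,    4.0]])

item_means = np.nanmean(R, axis=0)               # mean of known ratings per item
R_filled = np.where(np.isnan(R), item_means, R)  # estimate the unknowns
print(R_filled)
```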