Posts by Tags

Bagging

Tree Series 1: Decision Tree, Random Forest

Published:

Trees are among the most widely used models, with a large family (regression trees, classification trees; bagging: RF; boosting: GBDT) and several implementations (classification: ID3, C4.5, CART; regression: CART). This series of posts starts with a brief introduction to the basic principles of DT, RF and GBDT, then goes into the details of GBDT and other boosting techniques (e.g. LightGBM, XGBoost, CatBoost), and dives deeper by making comparisons.

Boosting

Tree Series 2: GBDT, LightGBM, XGBoost, CatBoost

Published:

Introduction

Both bagging and boosting are designed to ensemble weak estimators into a stronger one. The difference: bagging trains its estimators in parallel to decrease variance, while boosting learns from the mistakes made in previous rounds and tries to correct them in new rounds, which implies a sequential order. GBDT belongs to the boosting family, with various siblings, e.g. AdaBoost, LightGBM, XGBoost, CatBoost. In this post, I will mainly explain the principles of GBDT, LightGBM, XGBoost and CatBoost, make comparisons, and elaborate on how to fine-tune these models.
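
To make the contrast concrete, here is a minimal sketch (assuming scikit-learn; the dataset and settings are made up for illustration) that trains one ensemble of each kind on the same toy data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: trees are trained independently and averaged to reduce variance.
bagging = RandomForestClassifier(n_estimators=100, random_state=0)
# Boosting: each new tree is fit sequentially to correct the previous rounds' errors.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

for model in (bagging, boosting):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, model.score(X_te, y_te))
```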

CRF

Quick Notes: Summarization about Current NER Methods in Deep Learning

Published:

Recently, I have been working on NER projects. As a newcomer, I have spent a lot of time researching current NER methods and summarizing them. In this post, I will list my summaries (NER in DL); I hope they are helpful for readers who are interested and also new to this area. Before reading, I assume you already know some basic concepts (e.g. sequential neural networks, POS and IOB tagging, word embeddings, conditional random fields).

Classification

DBScan

Clustering-DBSCAN

Published:

Disadvantages of Partitioning and Hierarchical Methods

Before introducing density-based clustering algorithms, I will first talk about the shortcomings of other clustering methods. Partitioning algorithms, such as k-means, require declaring k in the first step of clustering. Moreover, k-means restricts the shape of clusters: it assumes convex, roughly Gaussian-shaped clusters.
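
A quick illustration of this limitation (a sketch assuming scikit-learn and its make_moons toy data; the eps value is just a hand-picked guess for this data):

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved, non-convex half-moon clusters.
X, y = make_moons(n_samples=500, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)  # eps hand-picked for this data

print("k-means ARI:", adjusted_rand_score(y, kmeans_labels))  # well below 1: convex splits fail
print("DBSCAN  ARI:", adjusted_rand_score(y, dbscan_labels))  # near 1: density recovers the moons
```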

DL

Quick Notes: Variation of GAN (Stacked GAN, StackGAN++, PA-GAN)

Published:

For the past four months, I have been working on cardiovascular disease risk prediction. Through this, I came up with an idea of using GANs to learn in a progressive way and decided to write a paper on the topic (sorry, I can't talk about my idea in detail). I then began doing background research and found three related topics. In this post, I will give summaries of these topics.

Quick Notes: Useful Terms & Concepts in NLP: BOW, POS, Chunking, Word Embedding

Published:

NLP algorithms are designed to learn from language, which is usually unstructured and of arbitrary length. Even worse, different language families follow different rules, and applying different sentence segmentation methods may cause ambiguity. So it is necessary to transform this information into an appropriate, computer-readable representation, and multiple tokenization and embedding strategies have been invented to enable that transformation. This post gives a brief summary of these terms. (I assume you already know some basic concepts, like tokenization, n-grams, etc.; I will mainly talk about word embedding methods in this post.)
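
As a taste of the simplest such representation, here is a minimal bag-of-words sketch (assuming scikit-learn; the two sentences are made up):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat on the mat", "the dog sat on the log"]  # made-up corpus
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(corpus)  # tokenize, then count tokens per sentence

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(bow.toarray())                       # one fixed-length count vector per sentence
```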

Quick Notes: Summarization about Current NER Methods in Deep Learning

Published:

Recently, I have been working on NER projects. As a greener, I have spent a lot of time in doing research of current NER methods, and made a summarization. In this post, I will list my summaries(NER in DL), hope this could be helpful for the readers who are interested and also new in this area. Before reading, I assume you already know some basic concepts(e.g. sequential neural network, POS,IOB tagging, word embedding, conditional random field).

DL Series 1: Sequence Neural Network and Its Variants (RNN, LSTM, GRU)

Published:

Introduction

Finally, I’m writing something about neural networks. I will start with the branch I’m most familiar with: sequential neural networks. In this post, I won’t talk about forward/backward propagation, as there are plenty of excellent blogs and online courses covering it. My motivation is to give a clear comparison between RNN, LSTM and GRU, because I find it very important to bear in mind their structural differences, and the cause and effect between a model’s structure and its function. This knowledge helps me understand why a certain type of model works well on certain kinds of problems, so when working on real-world problems (e.g. time series, named entity recognition), I am more confident in choosing the most appropriate model.
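
As a teaser for that structural comparison, here is a minimal sketch (assuming PyTorch; the sizes are arbitrary) showing how the extra gates of GRU and LSTM show up directly in the parameter counts:

```python
import torch.nn as nn

# Same input/hidden sizes for all three cells; only the internal gating differs.
for cell in (nn.RNN, nn.GRU, nn.LSTM):
    model = cell(input_size=16, hidden_size=32, batch_first=True)
    n_params = sum(p.numel() for p in model.parameters())
    print(cell.__name__, n_params)  # roughly 1x (RNN), 3x (GRU), 4x (LSTM)
```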

Draw ROC by Hand!!!

Very Confusing Pairs to Me🤔😥😴

Published:

This post elaborates on the details of, and compares, some pairs that are very similar and confusing to me. I will also teach you how to draw an ROC curve (and compute its AUC) by hand, the most exciting part for me.
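
As a preview of the hand-drawing procedure, here is a minimal sketch in plain Python (the scores and labels are made up): sort the samples by score, sweep the threshold past one sample at a time, and record a (FPR, TPR) point after each step.

```python
# Made-up classifier scores and true labels for six samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]  # 1 = positive, 0 = negative

P = sum(labels)              # number of positives
N = len(labels) - P          # number of negatives
tp = fp = 0
points = [(0.0, 0.0)]        # the ROC curve always starts at the origin
for score, label in sorted(zip(scores, labels), reverse=True):
    if label == 1:
        tp += 1              # threshold passes a positive: the curve moves up
    else:
        fp += 1              # threshold passes a negative: the curve moves right
    points.append((fp / N, tp / P))

print(points)  # connect these (FPR, TPR) points on paper to draw the ROC curve
```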

GAN

Quick Notes: Variation of GAN (Stacked GAN, StackGAN++, PA-GAN)

Published:

For the past four months, I have been working on cardiovascular disease risk prediction. Through this, I came up with an idea of using GANs to learn in a progressive way and decided to write a paper on the topic (sorry, I can't talk about my idea in detail). I then began doing background research and found three related topics. In this post, I will give summaries of these topics.

GRU

DL Series 1: Sequence Neural Network and Its Variants (RNN, LSTM, GRU)

Published:

Introduction

Finally, I’m writing something about neural networks. I will start with the branch I’m most familiar with: sequential neural networks. In this post, I won’t talk about forward/backward propagation, as there are plenty of excellent blogs and online courses covering it. My motivation is to give a clear comparison between RNN, LSTM and GRU, because I find it very important to bear in mind their structural differences, and the cause and effect between a model’s structure and its function. This knowledge helps me understand why a certain type of model works well on certain kinds of problems, so when working on real-world problems (e.g. time series, named entity recognition), I am more confident in choosing the most appropriate model.

Healthcare

Useful Concepts for Medical/Healthcare Data Risk Prediction

Published:

This post explains some basic statistical concepts used in disease morbidity risk prediction. As a CS student, I had a hard time figuring out these statistical concepts; I hope my summary is helpful for you.

LSTM

Quick Notes: Summarization about Current NER Methods in Deep Learning

Published:

Recently, I have been working on NER projects. As a newcomer, I have spent a lot of time researching current NER methods and summarizing them. In this post, I will list my summaries (NER in DL); I hope they are helpful for readers who are interested and also new to this area. Before reading, I assume you already know some basic concepts (e.g. sequential neural networks, POS and IOB tagging, word embeddings, conditional random fields).

DL Series 1: Sequence Neural Network and Its Variants (RNN, LSTM, GRU)

Published:

Introduction

Finally, I’m writing something about neural networks. I will start with the branch I’m most familiar with: sequential neural networks. In this post, I won’t talk about forward/backward propagation, as there are plenty of excellent blogs and online courses covering it. My motivation is to give a clear comparison between RNN, LSTM and GRU, because I find it very important to bear in mind their structural differences, and the cause and effect between a model’s structure and its function. This knowledge helps me understand why a certain type of model works well on certain kinds of problems, so when working on real-world problems (e.g. time series, named entity recognition), I am more confident in choosing the most appropriate model.

Logistic regression

ML

Tree Series 2: GBDT, LightGBM, XGBoost, CatBoost

Published:

Introduction

Both bagging and boosting are designed to ensemble weak estimators into a stronger one. The difference: bagging trains its estimators in parallel to decrease variance, while boosting learns from the mistakes made in previous rounds and tries to correct them in new rounds, which implies a sequential order. GBDT belongs to the boosting family, with various siblings, e.g. AdaBoost, LightGBM, XGBoost, CatBoost. In this post, I will mainly explain the principles of GBDT, LightGBM, XGBoost and CatBoost, make comparisons, and elaborate on how to fine-tune these models.

Tree Series 1: Decision Tree, Random Forest

Published:

Trees are among the most widely used models, with a large family (regression trees, classification trees; bagging: RF; boosting: GBDT) and several implementations (classification: ID3, C4.5, CART; regression: CART). This series of posts starts with a brief introduction to the basic principles of DT, RF and GBDT, then goes into the details of GBDT and other boosting techniques (e.g. LightGBM, XGBoost, CatBoost), and dives deeper by making comparisons.

Quick notes of privileged information

Published:

Why PI ?

In human learning, we have teachers who have mastered the knowledge. Taught by teachers, we students can quickly get to the point of a question, filter out false leads, learn very fast, and achieve good marks in the test on our own. That is why PI was introduced: to help ML models learn faster and more accurately on the training set, and then work well independently on the test set. So, while in the normal classification paradigm you get a set of pairs (x_i, y_i), after introducing PI each pair becomes a triplet (x_i, x_i*, y_i), where x_i* is the privileged information available only during training.

Ensemble learning: Stacking

Published:

Stacked generalization (stacking) is composed of two types of layers (a minimal sketch follows the list):

  1. base classifiers (generally different learning algorithms)
  2. a meta-learner, which combines the outputs of the first layer to obtain the final result
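
Here is that sketch (assuming scikit-learn, whose StackingClassifier provides this two-layer composition; the choice of base classifiers and dataset is arbitrary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

stack = StackingClassifier(
    # Layer 1: base classifiers from different learning algorithms.
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svc", SVC(probability=True, random_state=0))],
    # Layer 2: a meta-learner that combines the base outputs.
    final_estimator=LogisticRegression(max_iter=1000),
)
print(cross_val_score(stack, X, y, cv=5).mean())
```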

Clustering-DBSCAN

Published:

Disadvantages of Partitioning and Hierarchical Methods

Before introducing density-based clustering algorithms, I will first talk about the shortcomings of other clustering methods. Partitioning algorithms, such as k-means, require declaring k in the first step of clustering. Moreover, k-means restricts the shape of clusters: it assumes convex, roughly Gaussian-shaped clusters.

Clustering-Kmeans

Published:

Introduction of Clustering

Clustering is the process of grouping similar objects together. It belongs to unsupervised learning, as the data is unlabeled. There are several different clustering methods, including partitioning methods (flat clustering), hierarchical clustering, and density-based methods.
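
As a minimal example of a partitioning (flat clustering) method, here is a k-means sketch (assuming scikit-learn; the blob data is synthetic):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled synthetic data: the true grouping is never shown to the model.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.cluster_centers_)  # one learned centroid per cluster
print(kmeans.labels_[:10])      # cluster assignments for the first ten points
```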

Very Confusing Pairs to Me🤔😥😴

Published:

This post elaborates on the details of, and compares, some pairs that are very similar and confusing to me. I will also teach you how to draw an ROC curve (and compute its AUC) by hand, the most exciting part for me.

SVM: Support Vector Machine

Published:

The key to SVM is finding a hyperplane, built on a few important instances (the support vectors), that separates the data instances correctly. Constructing the plane involves a seemingly contradictory process: the margin of a hyperplane is defined as the smallest distance between the decision boundary and the support vectors, while the decision boundary we choose is the one whose margin is maximized. This is because there can be many hyperplanes (Fig. 1) that separate the data correctly; choosing the one that leaves the largest gap between the two classes makes the classifier more resistant to perturbations of the training data.
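
A minimal sketch (assuming scikit-learn; the toy data is synthetic) that fits a linear SVM and inspects the support vectors the plane is built on:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
svm = SVC(kernel="linear", C=1.0).fit(X, y)

print(svm.coef_, svm.intercept_)  # the w and b of the separating hyperplane
print(len(svm.support_vectors_))  # only these few points define the margin
```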

Logistic Regression

Published:

What’s LR?

Logistic regression uses the sigmoid function to estimate the probability of a sample belonging to a certain class, and obtains the unknown parameters by maximum likelihood estimation. Like linear regression, it assumes the data is linearly separable. For example, a 2D dataset can be separated by a linear decision boundary wX + b = 0: if a point gives wx + b > 0, it more likely belongs to class 1; otherwise, class 0.
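
To make the scoring rule concrete, here is a small NumPy sketch; note that w and b here are made-up illustrative values, not parameters fitted by maximum likelihood:

```python
import numpy as np

def sigmoid(z):
    # Maps the signed distance wX + b to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -1.0])  # hypothetical weights (illustrative, not fitted)
b = 0.5                    # hypothetical bias

x = np.array([1.0, 0.5])   # a 2D sample
z = w @ x + b              # positive side of the boundary wX + b = 0
print(sigmoid(z))          # P(class 1 | x) > 0.5 exactly when wX + b > 0
```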

NER

Quick Notes: Summarization about Current NER Methods in Deep Learning

Published:

Recently, I have been working on NER projects. As a newcomer, I have spent a lot of time researching current NER methods and summarizing them. In this post, I will list my summaries (NER in DL); I hope they are helpful for readers who are interested and also new to this area. Before reading, I assume you already know some basic concepts (e.g. sequential neural networks, POS and IOB tagging, word embeddings, conditional random fields).

NLP

Quick Notes: Useful Terms & Concepts in NLP: BOW, POS, Chunking, Word Embedding

Published:

NLP algorithms are designed to learn from language, which is usually unstructured and of arbitrary length. Even worse, different language families follow different rules, and applying different sentence segmentation methods may cause ambiguity. So it is necessary to transform this information into an appropriate, computer-readable representation, and multiple tokenization and embedding strategies have been invented to enable that transformation. This post gives a brief summary of these terms. (I assume you already know some basic concepts, like tokenization, n-grams, etc.; I will mainly talk about word embedding methods in this post.)

Quick Notes: Summarization about Current NER Methods in Deep Learning

Published:

Recently, I have been working on NER projects. As a newcomer, I have spent a lot of time researching current NER methods and summarizing them. In this post, I will list my summaries (NER in DL); I hope they are helpful for readers who are interested and also new to this area. Before reading, I assume you already know some basic concepts (e.g. sequential neural networks, POS and IOB tagging, word embeddings, conditional random fields).

RNN

DL Series 1: Sequence Neural Network and Its Variants (RNN, LSTM, GRU)

Published:

Introduction

Finally, I’m writing something about neural networks. I will start with the branch I’m most familiar with: sequential neural networks. In this post, I won’t talk about forward/backward propagation, as there are plenty of excellent blogs and online courses covering it. My motivation is to give a clear comparison between RNN, LSTM and GRU, because I find it very important to bear in mind their structural differences, and the cause and effect between a model’s structure and its function. This knowledge helps me understand why a certain type of model works well on certain kinds of problems, so when working on real-world problems (e.g. time series, named entity recognition), I am more confident in choosing the most appropriate model.

Recommendation System

Notes of Recommendation System

Published:

When building a recommendation system, you mainly deal with problems such as: how to collect the data (known ratings) in the utility matrix; how to estimate the unknown ratings from the known ones; and how to evaluate the approach.
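
A minimal utility-matrix sketch (plain NumPy; the ratings are made up, and the mean-based estimate is just one crude baseline, not the post's method):

```python
import numpy as np

# Rows = users, columns = items; NaN marks an unknown rating to be estimated.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 2.0]])

user, item = 0, 2  # estimate the unknown rating R[0, 2]
user_mean = np.nanmean(R[user, :])  # mean of this user's known ratings
item_mean = np.nanmean(R[:, item])  # mean of this item's known ratings
print((user_mean + item_mean) / 2)  # a crude estimate from the known ratings
```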

Risk Prediction

Useful Concepts for Medical/Healthcare Data Risk Prediction

Published:

This post explains some basic statistical concepts used in disease morbidity risk prediction. As a CS student, I had a hard time figuring out these statistical concepts; I hope my summary is helpful for you.

SVM

Statistics

Useful Concepts for Medical/Healthcare Data Risk Prediction

Published:

This post explains some basic statistical concepts used in disease morbidity risk prediction. As a CS student, I had a hard time figuring out these statistical concepts; I hope my summary is helpful for you.

Tree

Tree Series 1: Decision Tree, Random Forest

Published:

Trees are among the most widely used models, with a large family (regression trees, classification trees; bagging: RF; boosting: GBDT) and several implementations (classification: ID3, C4.5, CART; regression: CART). This series of posts starts with a brief introduction to the basic principles of DT, RF and GBDT, then goes into the details of GBDT and other boosting techniques (e.g. LightGBM, XGBoost, CatBoost), and dives deeper by making comparisons.

clustering

Clustering-DBSCAN

Published:

Disadvantages of Partitioning and Hierarchical Methods

Before introducing density-based clustering algorithms, I will first talk about the shortcomings of other clustering methods. Partitioning algorithms, such as k-means, require declaring k in the first step of clustering. Moreover, k-means restricts the shape of clusters: it assumes convex, roughly Gaussian-shaped clusters.

Clustering-Kmeans

Published:

Introduction of Clustering

Clustering is the process of grouping similar objects together. It belongs to unsupervised learning, as the data is unlabeled. There are several different clustering methods, including partitioning methods (flat clustering), hierarchical clustering, and density-based methods.

ensemble learning

Ensemble learning: Stacking

Published:

Stacked generalization (stacking) is composed of two types of layers:

  1. base classifiers (generally different learning algorithms)
  2. a meta-learner, which combines the outputs of the first layer to obtain the final result

kmeans

Clustering-Kmeans

Published:

Introduction of Clustering

Clustering is the process of grouping similar objects together. It belongs to unsupervised learning, as the data is unlabeled. There are several different clustering methods, including partitioning methods (flat clustering), hierarchical clustering, and density-based methods.

privileged information

Quick notes of privileged information

Published:

Why PI ?

In human learning, we have teachers who have mastered the knowledge. Taught by teachers, we students can quickly get to the point of a question, filter out false leads, learn very fast, and achieve good marks in the test on our own. That is why PI was introduced: to help ML models learn faster and more accurately on the training set, and then work well independently on the test set. So, while in the normal classification paradigm you get a set of pairs (x_i, y_i), after introducing PI each pair becomes a triplet (x_i, x_i*, y_i), where x_i* is the privileged information available only during training.

python

reinforcement learning

Introduction of Reinforcement Learning, Part 1

Published:

Introduction

Recently, I have read some papers about Reinforcement Learning (RL). To me, it's quite an interesting topic, but it's also very complex, as it involves so many terms, definitions and methods. So I wrote this post as an introductory overview of RL.

stacking

Ensemble learning: Stacking

Published:

Stacked generalization, stacking, is composed of two types of layers:

  1. base classifiers (generally different learning algorithms)
  2. meta learner (tries to combine the output of the first layer to obtain the final results)

tree

Tree Series 2: GBDT, LightGBM, XGBoost, CatBoost

Published:

Introduction

Both bagging and boosting are designed to ensemble weak estimators into a stronger one. The difference: bagging trains its estimators in parallel to decrease variance, while boosting learns from the mistakes made in previous rounds and tries to correct them in new rounds, which implies a sequential order. GBDT belongs to the boosting family, with various siblings, e.g. AdaBoost, LightGBM, XGBoost, CatBoost. In this post, I will mainly explain the principles of GBDT, LightGBM, XGBoost and CatBoost, make comparisons, and elaborate on how to fine-tune these models.

word embedding

Quick Notes: Useful Terms & Concepts in NLP: BOW, POS, Chunking, Word Embedding

Published:

NLP algorithms are designed to learn from language, which is usually unstructured and of arbitrary length. Even worse, different language families follow different rules, and applying different sentence segmentation methods may cause ambiguity. So it is necessary to transform this information into an appropriate, computer-readable representation, and multiple tokenization and embedding strategies have been invented to enable that transformation. This post gives a brief summary of these terms. (I assume you already know some basic concepts, like tokenization, n-grams, etc.; I will mainly talk about word embedding methods in this post.)