SGAN aims to generate high-quality images. To me, the idea of SGAN is quite brilliant: instead of learning the whole data distribution directly, it hierarchically extracts information from the data through encoders, generates representations level by level, and then lets G learn these representations in a top-down way. From my perspective, high-level representations encode the more abstract, semantic content of an image, so G can generate more realistic and specific low-level samples conditioned on them. Also, by feeding in the data level by level, I think this can control the pace of D's learning, which is helpful for stabilizing the training process. These are all personal opinions, plz correct me if something is wrong. Thx :)
Figure 2. SGAN Framework
Next, I will dive into the details of SGAN. The training stage can be divided into 3 steps, as shown in Table 2. The first step is to use encoders (CNNs in this paper) to extract level-wise information from the images in a bottom-up direction. The 2nd step is to train each Gi independently to predict the level-wise representation ĥi, taking a noise vector zi and the higher-level representation hi+1 as input, and to let Di judge the quality of ĥi.
Attention: the higher-level feature hi+1 is the output of encoder Ei. To ensure ĥi indeed masters the knowledge in hi+1, a conditional loss has been introduced (shown in Table 2).
Finally, here comes the 3rd training step: joint training. All Gs are connected in a top-to-bottom direction, forming an end-to-end architecture. Each G produces an intermediate representation and then passes it to the next G as input, so here ĥi = Gi(ĥi+1, zi) instead of Gi(hi+1, zi).
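To make the wiring concrete, below is a minimal PyTorch-style sketch of how a generator stack could be conditioned and how the stacks are chained top-down. The MLP body, the layer sizes, and the use of flat feature vectors are illustrative assumptions of mine, not the architecture from the paper.

```python
# A minimal PyTorch-style sketch (not the paper's architecture): each stack is
# shown as a small MLP over flat feature vectors purely for illustration.
import torch
import torch.nn as nn

class StackGenerator(nn.Module):
    """Gi: maps a higher-level representation plus a noise vector to ĥi."""
    def __init__(self, higher_dim, noise_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(higher_dim + noise_dim, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, h_higher, z):
        # Independent training: h_higher is the encoder feature h_{i+1}.
        # Joint training / testing: h_higher is the previous generator's output ĥ_{i+1}.
        return self.net(torch.cat([h_higher, z], dim=1))

def stacked_sample(generators, y, noise_dims):
    """Top-down pass: start from the label y (= hN) and repeatedly apply
    ĥi = Gi(ĥi+1, zi) down to the image level."""
    h_hat = y
    for G_i, z_dim in zip(reversed(generators), reversed(noise_dims)):
        z_i = torch.randn(y.size(0), z_dim)
        h_hat = G_i(h_hat, z_i)
    return h_hat  # ĥ0: the generated image (flattened in this sketch)
```

The only difference between the two training phases is what each Gi is conditioned on: the encoder's hi+1 during independent training versus the previous generator's ĥi+1 during joint training.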
Besides, the paper states that simply adding the conditional loss indeed ensures Gi learns from hi+1, but then Gi tends to ignore the random noise zi. So how can we avoid this? Let's first recall the reason for adding z.
It is used to increase the diversity of the results. So we want ĥi to be as diverse as possible when conditioned on hi+1, which is equivalent to maximizing the conditional entropy H(ĥi | hi+1). That's why an entropy loss has been added. (More details are articulated in the original paper, section 3.4.)
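As a rough illustration of this idea (the paper's section 3.4 approaches the conditional entropy through a variational lower bound), one can attach an auxiliary network Qi that tries to recover zi from ĥi: if Gi ignored the noise, the recovery would be impossible, so penalizing the recovery error pushes Gi to actually use zi. The fixed-variance Gaussian / MSE parameterization below is my own simplification, not necessarily the paper's exact formulation.

```python
# Sketch of an entropy-style loss via noise recovery; the MSE form assumes a
# fixed-variance Gaussian Q_i and is only meant to convey the idea.
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Qi: tries to recover the noise zi from the generated representation ĥi."""
    def __init__(self, feat_dim, noise_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, noise_dim),
        )

    def forward(self, h_hat):
        return self.net(h_hat)

def entropy_loss(q_net, h_hat_i, z_i):
    # If Gi ignored zi, zi could not be recovered from ĥi; penalizing the
    # recovery error therefore keeps ĥi diverse given hi+1.
    return ((q_net(h_hat_i) - z_i) ** 2).mean()
```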
Table 2. SGAN training and testing steps

Training step 1: Encoders learn level-wise intermediate representations from bottom to top (Fig 3, left blue section, arrow direction).

Training step 2: Independent training of each Gi (Fig 3, central section).
- Conditional loss, Lcond = f(Ei(ĥi), hi+1): forces Gi to learn from the high-level conditional representation hi+1.
- Adversarial loss, Ladv: the discriminator Di identifies real data hi versus generated ĥi.

Training step 3: Joint, end-to-end training of all Gs (Fig 3, central section, top to bottom, arrow direction).
- Entropy loss, Lent: maintains diverse generations by maximizing the conditional entropy H(ĥi | hi+1).

Testing step: Generate simulated images from top to bottom; no information is needed from the encoder (Fig 3, right section). Results diversity is maintained by adding a random noise zi at each level.
Figure 3. SGAN Final Loss Function (Source: [1])
N: number of stacks; hi: the i-th level-wise representation, with h0 = x (the input image) and hN = y (the classification label); Ei, Gi, Di: the i-th encoder, generator, and discriminator.
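Putting Table 2 and the notation above together, here is a simplified sketch of a single training step for one stack i during the independent-training phase. The BCE adversarial loss, the MSE form of the conditional loss, and the loss weights are illustrative assumptions; the paper may use different distance measures and hyper-parameters (e.g. a cross-entropy conditional loss at the top stack, where hN = y).

```python
# Simplified per-stack training step (independent phase); the concrete distance
# functions and loss weights are illustrative assumptions, not the paper's.
import torch
import torch.nn.functional as F

def train_stack_step(G_i, D_i, E_i, Q_i, h_i, h_i_plus_1, z_i,
                     opt_g, opt_d, lambda_cond=1.0, lambda_ent=1.0):
    # opt_g is assumed to optimize the parameters of both G_i and Q_i.

    # --- Train D_i: distinguish the real representation h_i from ĥ_i ---
    h_hat_i = G_i(h_i_plus_1, z_i).detach()
    real_logits, fake_logits = D_i(h_i), D_i(h_hat_i)
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- Train G_i: adversarial + conditional + entropy losses ---
    h_hat_i = G_i(h_i_plus_1, z_i)
    fake_logits = D_i(h_hat_i)
    adv = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    cond = F.mse_loss(E_i(h_hat_i), h_i_plus_1)   # feed ĥ_i back through E_i
    ent = F.mse_loss(Q_i(h_hat_i), z_i)           # noise recovery, as sketched above
    g_loss = adv + lambda_cond * cond + lambda_ent * ent
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```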
Figure 4. SGAN Framework Details (Source: [1])
StackGAN++
TBA…
PA-GAN
TBA…
Reference
[1] Xun Huang, Yixuan Li, et al. Stacked Generative Adversarial Networks. CVPR 2017.
NLP algorithms are designed to learn from language, which is usually unstructured and of arbitrary length. Even worse, different language families follow different rules, and applying different sentence segmentation methods may cause ambiguity. So it is necessary to transform this information into an appropriate, computer-readable representation. To enable such a transformation, multiple tokenization and embedding strategies have been invented. This post mainly gives a brief summary of these terms. (For readers, I assume you already know some basic concepts, like tokenization, n-grams, etc. I will mainly talk about word embedding methods in this blog.)
Recently, I have been working on NER projects. As a newcomer, I have spent a lot of time researching current NER methods and summarizing them. In this post, I will list my summaries (NER in deep learning); I hope this is helpful for readers who are interested in and new to this area. Before reading, I assume you already know some basic concepts (e.g. sequential neural networks, POS/IOB tagging, word embeddings, conditional random fields).
This post explains some basic statistical concepts used in disease morbidity risk prediction. As a CS student, I had a hard time figuring out these statistical concepts, and I hope my summary will be helpful for you.
Finally, I'm writing something about neural networks. I will start with the branch I'm most familiar with: sequential neural networks. In this post, I won't talk about forward/backward propagation, as there are plenty of excellent blogs and online courses on that. My motivation is to give a clear comparison between RNN, LSTM and GRU, because I find it very important to bear in mind their structural differences and the cause-and-effect relationship between a model's structure and its function. This knowledge helps me understand why certain types of models work well on certain kinds of problems, so that when working on real-world problems (e.g. time series, named entity recognition), I am more confident in choosing the most appropriate model.