Theoretical investigation of batch normalization
In the last post, models with batch normalization showed a large advantage. In this post, we will explore the following questions:
- How does batch normalization work?
- How is batch normalization related to linear transformations?