Recently, I became a fan of Weili Zhang, the new UFC Women’s Strawweight Champion. She is a truly dedicated, confident, and humble Chinese mixed martial artist. While looking into MMA, I am well awa...
Wasserstein Barycenter
Wasserstein Barycenter problem: The Wasserstein barycenter problem seeks a weighted mean of a collection of probability distributions such that the weighted Wasserstein distance is minimi...
Sinkhorn Algorithm
The Wasserstein distance: The Wasserstein distance measures the discrepancy between two distributions. For simplicity, we consider discrete distributions on \([\delta_1, \delta_2, \ldots, \delta_n]...
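As a rough illustration of the algorithm named in the title, here is a minimal pure-Python sketch of Sinkhorn iterations for entropy-regularized optimal transport between two discrete distributions. The function name `sinkhorn`, the regularization value `reg`, the iteration count, and the toy cost matrix are all assumptions made for this example; they do not come from the post itself.

```python
import math

def sinkhorn(a, b, cost, reg=1.0, n_iters=200):
    """Return an approximate transport plan between histograms a and b.

    a, b : lists of non-negative weights, each summing to 1
    cost : cost[i][j] is the cost of moving mass from bin i to bin j
    reg  : entropic regularization strength (hypothetical default)
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
cost = [[abs(i - j) for j in range(3)] for i in range(3)]  # toy ground cost

plan = sinkhorn(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(3)]
reg_cost = sum(plan[i][j] * cost[i][j] for i in range(3) for j in range(3))
```

The row and column sums of `plan` should match `a` and `b` up to numerical tolerance, and `reg_cost` is the transport cost of the regularized plan, which approaches the Wasserstein cost as `reg` shrinks (at the price of slower convergence).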
Entropy and mutual information
In the statistics community, one of the primary estimation methods is maximum likelihood estimation (MLE). However, in machine learning/engineering, the log-likelihood function has been renamed as cro...
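A standard fact that fits this discussion: for a categorical model, the average negative log-likelihood of a sample equals the cross-entropy between the empirical distribution and the model. The small check below is only a sketch; the sample and the model probabilities are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical observations and a hypothetical categorical model
samples = ["a", "b", "a", "c", "a", "b"]
model = {"a": 0.5, "b": 0.3, "c": 0.2}

n = len(samples)

# Average negative log-likelihood of the sample under the model
avg_nll = -sum(math.log(model[x]) for x in samples) / n

# Cross-entropy H(p_hat, q) between empirical frequencies p_hat and model q
empirical = {k: count / n for k, count in Counter(samples).items()}
cross_entropy = -sum(p * math.log(model[k]) for k, p in empirical.items())

print(avg_nll, cross_entropy)
```

The two printed values agree up to floating-point error, so maximizing the likelihood and minimizing this cross-entropy pick out the same model.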
Backpropagation of a vanilla RNN
This post investigates how to code up a vanilla RNN. Most of the code and examples are copied from Andrej Karpathy’s blog: The Unreasonable Effectiveness of Recurrent Neural Networks. 100-line ...
Theoretical investigation of batch normalization
In the last post, models with batch normalization showed advantages by a large margin. In this post, we will explore the following questions: How does batch normalization work? How is batch no...
Approximating an Elliptic Paraboloid by ReLU nets
If we regard neural net models as nonparametric approximators of continuous functions, several works support the validity of this approach. In particular, in reference 1, it has been proved...
Multiple comparisons
In statistics, multiple comparisons/multiple hypothesis testing occurs when one considers a set of statistical inference questions simultaneously. To control the chance of making mistakes when the ...
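To see why simultaneous testing needs care, the sketch below computes the chance of at least one false positive across `m` independent tests and applies the Bonferroni correction, one common remedy. The excerpt does not say which procedures the post covers, so the choice of Bonferroni, the function names, and the p-values are assumptions for illustration.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (Bonferroni correction)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Hypothetical p-values from four tests
p_values = [0.001, 0.012, 0.03, 0.2]

fwer_20 = familywise_error_rate(0.05, 20)  # roughly 0.64 for 20 tests
decisions = bonferroni_reject(p_values)
print(fwer_20, decisions)
```

With four tests the Bonferroni threshold is 0.05 / 4 = 0.0125, so only the first two hypothetical p-values lead to rejection.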
Sort multiple variables
Usually, we are proficient at sorting a data frame/table by one variable. But there are cases where we need a second variable to break the ties. In this post, I will summarize how to do this in Py...
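A minimal sketch of tie-breaking with a second variable, using only the Python standard library and a table stored as a list of records. The column names and values are hypothetical, and the post itself may use different tools.

```python
# Hypothetical table: each dict is one row
rows = [
    {"name": "a", "score": 90, "age": 30},
    {"name": "b", "score": 85, "age": 25},
    {"name": "c", "score": 90, "age": 22},
    {"name": "d", "score": 85, "age": 40},
]

# Sort by score (descending); break ties by age (ascending).
# Tuples compare element by element, so the second entry only
# matters when the first entries are equal.
sorted_rows = sorted(rows, key=lambda r: (-r["score"], r["age"]))

names = [r["name"] for r in sorted_rows]
print(names)  # ['c', 'a', 'b', 'd']
```

With pandas, the analogous call on a data frame `df` would be `df.sort_values(by=["score", "age"], ascending=[False, True])`.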
Embedding algorithms 2 -- Locally linear embedding
Introduction: Although the nonclassical/metric MDS from the last post is a nonlinear embedding algorithm, it is a global method (like PCA), since each point in the graph is related to all other \(n-1\)...