UFC fight predictor 1 -- scraping and exploratory analysis

Recently, I became a fan of Weili Zhang, the new UFC Women’s Strawweight Champion. She is a truly dedicated, confident, and humble Chinese mixed martial artist. While looking into MMA, I am well awa...

Jun 14, 2020

Wasserstein Barycenter

The Wasserstein barycenter problem seeks a weighted mean of a collection of probability distributions such that the weighted Wasserstein distance is minimi...

Apr 30, 2020

Sinkhorn Algorithm

The Wasserstein distance measures the discrepancy between two distributions. For simplicity, we consider discrete distributions on \([\delta_1, \delta_2, \ldots, \delta_n]...

Apr 27, 2020

Entropy and mutual information

In the statistics community, one of the primary estimation methods is maximum likelihood estimation (MLE). However, in machine learning/engineering, the log-likelihood function has been renamed as cro...

Jan 31, 2020

Backpropagation of a vanilla RNN

This post investigates how to code up a vanilla RNN. Most of the code and examples are copied from Andrej Karpathy’s blog post, The Unreasonable Effectiveness of Recurrent Neural Networks. 100-line ...

Jan 24, 2020

Theoretical investigation of batch normalization

In the last post, models with batch normalization showed advantages by a large margin. In this post, we will explore the following questions: How does batch normalization work? How is batch no...

Jan 17, 2020

Approximating Elliptic Paraboloid by ReLU nets

If we regard neural net models as nonparametric approximators of continuous functions, several works support the validity of this approach. In particular, in reference 1, it has been proved...

Jan 14, 2020

Multiple comparisons

In statistics, multiple comparisons/multiple hypothesis testing occurs when one considers a set of statistical inference questions simultaneously. To control the chance of making mistakes when the ...

Jan 6, 2020

Sort multiple variables

Usually, we are proficient at sorting a data frame/table by one variable. But there are cases where we need a second variable to break ties. In this post, I will summarize how to do this in Py...

Dec 31, 2019
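
To make the tie-breaking idea in the teaser above concrete, here is a minimal sketch in plain Python. The records and the function name `sort_with_tiebreak` are hypothetical and not taken from the post, whose own approach may differ. It sorts by one variable in descending order and uses a second variable, ascending, to break ties.

```python
# Hypothetical records (name, score, age); illustrative data only.
records = [
    ("Ana", 90, 31),
    ("Ben", 85, 27),
    ("Cai", 90, 25),
    ("Dee", 85, 40),
]


def sort_with_tiebreak(rows):
    """Sort by score (descending), then by age (ascending) to break ties.

    A tuple key makes Python compare the first element first and fall back
    to the second only when the first elements are equal. Negating the
    numeric score flips its order without affecting the tie-breaker.
    """
    return sorted(rows, key=lambda r: (-r[1], r[2]))


ranked = sort_with_tiebreak(records)
for name, score, age in ranked:
    print(name, score, age)
```

Negating the key only works for numeric columns. For non-numeric ones, a common alternative is to sort twice, since Python's sort is stable: first by the tie-breaker, then by the primary variable.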

Embedding algorithms 2 -- Locally linear embedding

Introduction: As discussed in the last post, although nonclassical/metric MDS is a nonlinear embedding algorithm, it is a global method (like PCA), since each point in the graph is related to all other \(n-1\)...

Dec 27, 2019