2021-02-04
Evaluation Metrics
Unsupervised metrics:
-
Mutual Information (MI): measured between the input image and its low-dimensional embedding, estimated with MINE. However, the MI metric used previously is not normalized; it can be normalized with the formula below.
Since mutual information is the intersection of the information in \(X\) and \(Y\), it is computed as \(I(X;Y) = H(X) - H(X|Y)\). The normalized mutual information rescales it to \([0, 1]\):
$$
NMI(X,Y) = \frac{2 \times I(X;Y)}{H(X) + H(Y)}
$$
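As a quick sanity check of the normalization, here is a minimal sketch for paired discrete samples. It uses a plug-in (counting) estimator rather than MINE, and the identity \(H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y)\); the function names are illustrative assumptions, not part of any library.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in nats) of a sequence of discrete symbols."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mutual_information(xs, ys):
    """NMI(X,Y) = 2 I(X;Y) / (H(X) + H(Y)).

    Assumption: returns 0.0 when both entropies are zero (constant inputs).
    """
    denom = entropy(xs) + entropy(ys)
    if denom == 0:
        return 0.0
    return 2.0 * mutual_information(xs, ys) / denom
```

Two variables that determine each other give a value of 1, while independent variables give a value near 0.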
-
Continuity
-
Trustworthiness
-
Local Continuity Meta-Criterion (LCMC)
Supervised metrics (require ground-truth factors):
-
Modularity
Assumption: ideally, each embedding dimension has high mutual information with a single factor and zero mutual information with all other factors. Modularity measures whether each dimension of the latent space depends on only one attribute.
First, measure the MI \(I_{ig}\) between each embedding dimension \(i\) and each ground-truth factor \(g\). Then find \(\theta_i = \max_g I_{ig}\) and define a template vector
$$
t_{if} =
\begin{cases}
\theta_i & \text{if } f = \arg\max_g I_{ig} \\
0 & \text{otherwise}
\end{cases}
$$
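The template construction can be sketched as follows. The final scoring step is an assumption: it uses the normalized squared deviation from the template, \(\delta_i = \sum_f (I_{if} - t_{if})^2 / (\theta_i^2 (N - 1))\) with \(N\) factors, and reports the mean of \(1 - \delta_i\), which is how this metric is usually defined. The function name and the handling of all-zero rows are hypothetical choices.

```python
def modularity_score(mi):
    """Mean modularity over latent dimensions.

    mi[i][f] is the (non-negative) MI between latent dimension i and factor f.
    Requires at least two factors.
    """
    n_factors = len(mi[0])
    scores = []
    for row in mi:
        theta = max(row)  # theta_i = max_g I_ig
        if theta == 0:
            # Assumption: a dimension carrying no information scores 0.
            scores.append(0.0)
            continue
        best = row.index(theta)  # argmax_g I_ig
        template = [theta if f == best else 0.0 for f in range(n_factors)]
        # Assumed deviation: squared distance to the template, normalized
        # so that equal MI with every factor gives delta = 1.
        delta = sum((m - t) ** 2 for m, t in zip(row, template))
        delta /= theta ** 2 * (n_factors - 1)
        scores.append(1.0 - delta)
    return sum(scores) / len(scores)
```

A perfectly modular MI matrix such as `[[1.0, 0.0], [0.0, 0.5]]` scores 1, and a dimension with equal MI to every factor scores 0.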
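Modularity is high when each dimension's MI values stay close to the template \(t_{if}\) and low when they spread across several factors.
-
Mutual Information Gap (MIG), from the Beta-TCVAE paper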
Estimate the MI between a latent variable \(z_j\) and a ground-truth factor \(v_k\) using their joint distribution.
A single factor can have high mutual information with multiple latent variables. We enforce axis-alignment by measuring, for each factor, the difference between the two latent variables with the highest mutual information.
Note that MIG is normalized by the entropy of each factor, so it lies in \([0, 1]\).
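A minimal sketch of the gap computation, assuming the latent codes have already been discretized (for example by binning) and using a plug-in MI estimate instead of the estimator from the paper. The function names and the input layout are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in nats) of a sequence of discrete symbols."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mig(latents, factors):
    """Mutual Information Gap.

    latents: list of latent dimensions, each a list of discretized codes.
    factors: list of ground-truth factors, each a list of discrete values.
    Requires at least two latent dimensions and non-constant factors.
    """
    gaps = []
    for v in factors:
        # MI of every latent with this factor, highest first.
        mis = sorted((mutual_information(z, v) for z in latents), reverse=True)
        # Gap between the top two latents, normalized by H(v_k).
        gaps.append((mis[0] - mis[1]) / entropy(v))
    return sum(gaps) / len(gaps)
```

When each factor is captured by exactly one latent the gap is 1; when two latents encode the same factor equally well the gap drops to 0.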
-
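Separated Attribute Predictability (SAP), from the DIP-VAE paper
First, construct a score matrix \(S\) of size \(d \times k\), where entry \(S_{ij}\) is the result of a linear model predicting the \(j^{th}\) factor from the latent \(z_i\) alone, scored with \(R^2\), which measures how well the line fits. The SAP score is then the mean, over factors, of the difference between the two highest scores in the corresponding column of \(S\).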
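The score-matrix construction can be sketched as below for continuous factors, where the \(R^2\) of a one-variable least-squares line equals the squared Pearson correlation. The final step, averaging the gap between the two highest scores per factor, follows the SAP definition in the DIP-VAE paper. Function names and the zero score for constant inputs are assumptions.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y ~ a*x + b (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # Assumption: a constant input or target scores 0.
    return sxy ** 2 / (sxx * syy)


def sap(latents, factors):
    """SAP score; needs at least two latent dimensions.

    latents: list of latent dimensions, each a list of floats.
    factors: list of ground-truth factors, each a list of floats.
    """
    # S[i][j]: score of predicting factor j from latent i alone.
    S = [[r_squared(z, v) for v in factors] for z in latents]
    gaps = []
    for j in range(len(factors)):
        column = sorted((S[i][j] for i in range(len(latents))), reverse=True)
        gaps.append(column[0] - column[1])
    return sum(gaps) / len(gaps)
```

For categorical factors a classification accuracy would replace `r_squared`; that variant is not shown here.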
-
Spearman Correlation Coefficient (SCC)
Computes the maximum value of the Spearman correlation coefficient between an attribute and each dimension of the latent space.
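A small sketch of this score, computing Spearman's coefficient as the Pearson correlation of average ranks. Taking the absolute value before the maximum is an assumption made here so that strongly decreasing relationships also count; the function names are illustrative.

```python
import math


def ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean rank of the tied group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    return pearson(ranks(x), ranks(y))


def scc_score(latents, attribute):
    """Maximum |Spearman correlation| between the attribute and any latent."""
    return max(abs(spearman(z, attribute)) for z in latents)
```

Because only ranks matter, any monotone transformation of a latent dimension leaves its score unchanged.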
-
Interpretability:
Measures the ability to predict a given dimension of the side information from a single latent dimension.
From the \(k\) latent dimensions of \(z\), select the \(z_i\), where \(i \leq k\), with maximal information about a truth factor \(v_j\): \(i = \arg\max_i I(v_j; z_i \mid x_t)\). Evaluate the interpretability of \(z_i\) with respect to \(v_j\) by measuring \(p(s_j \mid z_i)\), summing the logarithms of the resulting probabilities over every test sample for dimension \(j\) of the side information.
Aggregating the scores over all dimensions of the side information \(s\) gives the interpretability score.
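One way to write the aggregated score, assuming the aggregation is a plain sum and using \(i_j\) for the latent dimension selected for factor \(j\) and a superscript \((t)\) for the \(t\)-th test sample (both notations are introduced here for illustration only):

$$
\text{Interpretability} = \sum_{j} \sum_{t} \log p\left(s_j^{(t)} \mid z_{i_j}^{(t)}\right)
$$

Under this reading, a higher (less negative) value means the selected latent dimensions predict the side information better.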