1. Statistics with linear algebra
(a) Load the IRIS dataset from sklearn and standardize the flower feature vectors (subtract the mean and divide by the standard deviation); recall they are the rows of the data matrix in this case. Use numpy to compute the mean and std, and use broadcasting to subtract or divide as needed. Demonstrate that the mean and std of the rows of the standardized data are zero and one, respectively.
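One possible starting sketch for this part (the `keepdims=True` trick is one of several ways to make the broadcasting line up):

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data  # shape (150, 4): each row is one flower's feature vector

# Standardize each row: subtract its mean and divide by its std.
# keepdims=True keeps the results as (150, 1) so they broadcast against (150, 4).
row_mean = X.mean(axis=1, keepdims=True)
row_std = X.std(axis=1, keepdims=True)
Z = (X - row_mean) / row_std

# Each row should now have mean ~0 and std ~1.
print(Z.mean(axis=1))
print(Z.std(axis=1))
```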
(b) Compute the covariance matrix of the dataset using matrix multiplication, after subtracting the mean and applying the appropriate scaling (it should agree with numpy.cov()). Display the covariance matrix using imshow. Can you see the different classes?
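A sketch of the intended computation, assuming the covariance between flowers (rows), which is what numpy.cov computes by default for this matrix shape:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

X = load_iris().data                      # (150, 4)
Xc = X - X.mean(axis=1, keepdims=True)    # subtract each row's mean
n = X.shape[1]                            # 4 observations per flower

# Covariance between flowers via matrix multiplication, with 1/(n-1) scaling
C = Xc @ Xc.T / (n - 1)
print(np.allclose(C, np.cov(X)))          # should agree with numpy's built-in

plt.imshow(C)
plt.colorbar()
plt.show()
```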
(c) Using a similar matrix-matrix multiplication with your standardized data matrix, compute a matrix of correlations and display it.
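A sketch of how this might look: with rows standardized to mean 0 and std 1, the same product divided by the number of observations gives correlations (it should match numpy.corrcoef):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

X = load_iris().data
Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

n = X.shape[1]
R = Z @ Z.T / n                 # correlations between flowers

plt.imshow(R)
plt.colorbar()
plt.show()
```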
(d) Compute the precision matrix and display it as well. Explain the result. (Hint: you may need to use colorbar() to see what's going on.)
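A minimal sketch, assuming the precision matrix is the inverse of the flower covariance from (b); the try/except fallback to a pseudo-inverse is an assumption for when inv() raises on a singular matrix:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

X = load_iris().data
C = np.cov(X)                      # 150x150 covariance between flowers

# The precision matrix is the inverse of the covariance matrix. This C is
# rank-deficient, so plain inv() either fails or returns enormous entries;
# fall back to the pseudo-inverse if it raises.
try:
    P = np.linalg.inv(C)
except np.linalg.LinAlgError:
    P = np.linalg.pinv(C)

plt.imshow(P)
plt.colorbar()                     # reveals the scale of the entries
plt.show()
```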
2. Networkx
In class we used networkx to visualize a network built from a simple similarity metric (the inverse of distance) applied to the IRIS dataset.
Perform this visualization using a better similarity metric such as $\exp\left(\frac{-1}{2\sigma}\Vert\mathbf a - \mathbf b\Vert_2^2\right)$. Choose a value of $\sigma$ such that a few major clusters are identifiable. Explain what the result tells you, given what we know about the dataset.
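One way this might be set up; the value $\sigma = 10$ and the 0.9 edge threshold are assumptions you would tune, and the thresholding step is just one way to keep the graph from being fully connected:

```python
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

X = load_iris().data
sigma = 10.0   # assumed value; adjust until a few clusters emerge

# Gaussian similarity between every pair of flowers
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # squared distances
S = np.exp(-d2 / (2 * sigma))

# Keep only strong similarities as edges (0.9 is an assumed threshold)
A = np.where(S > 0.9, S, 0.0)
np.fill_diagonal(A, 0.0)

G = nx.from_numpy_array(A)
nx.draw(G, node_size=20)
plt.show()
```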
3. Eigendecomposition practice
Use numpy for this exercise.
(a) Create a 4x4 diagonal matrix whose diagonal consists of random integers between -10 and 10. Compute the eigenvalue decomposition of this matrix, and in a comment identify the eigenvalues and eigenvectors (there should be four scalars and four vectors).
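A possible starting sketch (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.integers(-10, 11, size=4)   # random integers in [-10, 10]
A = np.diag(d)

w, V = np.linalg.eig(A)
# For a diagonal matrix the eigenvalues are just the diagonal entries
# (w matches d, possibly reordered) and the eigenvectors are the standard
# basis vectors, so V's columns come from the identity matrix.
print(w)   # four scalars
print(V)   # four vectors (the columns)
```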
(b) Create a matrix $C$ which is the covariance matrix formed from the IRIS dataset (covariances between flowers). Compute the eigenvalue decomposition of $C$ and plot its eigenvalues. Can you see how this helps explain the precision matrix result above?
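A sketch of the computation; np.real() is applied per the hint at the end of this exercise:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

X = load_iris().data
C = np.cov(X)                       # 150x150 covariance between flowers

w, U = np.linalg.eig(C)
w = np.real(w)                      # strip any tiny imaginary parts

plt.plot(np.sort(w)[::-1], 'o-')
plt.ylabel('eigenvalue')
plt.show()
# Only a handful of eigenvalues are meaningfully nonzero; the rest are ~0,
# which is why inverting C is ill-conditioned.
```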
(c) Demonstrate how you can use the outputs of np.linalg.eig to reconstruct $C$ from its eigenvalue decomposition, i.e., $C = U@D@U^T$ for appropriately defined matrices. (Hint: $D$ is a diagonal matrix with the eigenvalues on the diagonal, and $U$ is the eigenvector matrix returned by eig.)
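A sketch of the reconstruction; using $U^T$ in place of $U^{-1}$ relies on $C$ being symmetric, as noted in the comment:

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data
C = np.cov(X)

w, U = np.linalg.eig(C)
D = np.diag(w)                      # eigenvalues on the diagonal

# Because C is symmetric, U.T serves as the inverse of U here; for a
# general matrix you would use np.linalg.inv(U) instead.
C_rebuilt = np.real(U @ D @ U.T)
print(np.allclose(C, C_rebuilt))    # reconstruction should match C
```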
Hint: due to numerical issues, your eigenvalues and/or eigenvectors may come out slightly complex, i.e., with a very small imaginary part. This can cause trouble for plotting or imshow. You can remove the imaginary part with np.real().