1. Implementing norms
(a) Manually implement the 0, 1, 2, and $\infty$ norms as functions that take a basic Python list of arbitrary length as input and return the vector norm. Do not use numpy or any other existing implementation of norm functions (you may use Python's math library). The definitions are restated below for reference.
(b) Compare your functions against numpy's norm functions for a random vector of length 1000.
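For reference, the four norms to implement are defined, for a vector $v$ with entries $v_i$, as

$$\|v\|_0 = \#\{i : v_i \neq 0\}, \qquad \|v\|_1 = \sum_i |v_i|, \qquad \|v\|_2 = \sqrt{\sum_i v_i^2}, \qquad \|v\|_\infty = \max_i |v_i|$$

(the "0-norm" simply counts nonzero entries and is not a true norm, but it is implemented in the same style).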
import math

v = [0, 1, 2, -1]

def norm_0(v):
    # "0-norm": count of nonzero entries
    norm = 0
    for v_i in v:
        if v_i != 0:
            norm = norm + 1
    return norm

def norm_1(v):
    # 1-norm: sum of absolute values
    norm = 0
    for v_i in v:
        norm = norm + abs(v_i)
    return norm

def norm_2(v):
    # 2-norm: square root of the sum of squares
    norm = 0
    for v_i in v:
        norm = norm + v_i**2
    return math.sqrt(norm)

def norm_infty(v):
    # infinity-norm: largest absolute value
    norm = 0
    for v_i in v:
        if abs(v_i) > norm:
            norm = abs(v_i)
    return norm

print('zero ', norm_0(v))
print('one ', norm_1(v))
print('two ', norm_2(v))
print('infty', norm_infty(v))
zero  3
one  4
two  2.449489742783178
infty 2
import numpy as np
v = np.random.randn(1000)
print('zero ', norm_0(v), np.linalg.norm(v,0))
print('one ', norm_1(v), np.linalg.norm(v,1))
print('two ', norm_2(v), np.linalg.norm(v,2))
print('infty', norm_infty(v), np.linalg.norm(v,np.inf))
zero  1000 1000.0
one  796.1214568992399 796.1214568992405
two  31.74376753851586 31.743767538515844
infty 3.765257353035628 3.765257353035628
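The agreement can also be checked programmatically rather than by eye. A minimal sketch, assuming the manual norm functions and the random vector v above are still in scope (small floating-point differences from summation order are expected and tolerated by the default tolerance):

```python
# verify each manual norm matches numpy's result up to floating-point tolerance
for p, f in [(0, norm_0), (1, norm_1), (2, norm_2), (np.inf, norm_infty)]:
    assert np.isclose(f(v), np.linalg.norm(v, p)), f"mismatch for p={p}"
```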
2. Distances with different norms
(a) Manually implement a function that takes an m-by-n dataset, a 1-by-n vector, and a norm function (in Python you can pass functions as arguments just like numbers). Have your new function use the input norm to compute the distance between the vector and every row of the data, then return that array of distances as the function output.
(b) Demonstrate the function using all four norms on an example dataset consisting of a 100-by-10 random matrix, treating rows as samples. Plot the distances between a random vector with 10 elements and every row of the data for each norm. You can use numpy to generate this data and to compute the differences between vectors, but not to compute norms. For the random data, use random integers between -10 and 10.
(c) Using the graph, point out which indices appear to be nearest to the random vector under each norm.
Hint: make sure your vectors are the same shape when taking differences, to avoid broadcasting instead of computing a distance.
Hint 2: you may need to make use of .flatten() and/or .tolist() to handle numpy arrays as input to a function that expects a list.
from matplotlib.pyplot import *

D = np.random.randint(low=-5, high=5, size=(100, 10))  # dataset: rows are samples
v = np.random.randint(low=-5, high=5, size=(1, 10))    # query vector

def compute_distances(A, w, normfunc):
    # distance from w to every row of A, using the supplied norm function
    out = []
    m, n = A.shape
    for k_samp in range(0, m):
        out.append(normfunc((A[k_samp].flatten() - w.flatten()).tolist()))
    return out
plot(compute_distances(D,v,norm_0));
plot(compute_distances(D,v,norm_1));
plot(compute_distances(D,v,norm_2));
plot(compute_distances(D,v,norm_infty));
legend(('0','1','2','infty'));
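For part (c), the nearest index can also be read off programmatically rather than from the graph. A short sketch, assuming compute_distances and the norm functions defined above:

```python
# index of the row of D closest to v under each norm
for name, f in [('0', norm_0), ('1', norm_1), ('2', norm_2), ('infty', norm_infty)]:
    dists = compute_distances(D, v, f)
    print(name, 'nearest index:', int(np.argmin(dists)), 'distance:', min(dists))
```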
3. Class comparisons with different norms
Compute vectors of the mean and standard deviation of each feature for each of the (three) flower classes in the Iris dataset from sklearn (you can use numpy's mean and std functions for this).
Using your distance function from #2, plot the distances between each flower's feature vector and each of the class means, for every flower in the dataset.
From your plots, point out the samples for which the target label does not correspond to the nearest class mean.
Repeat for the 1 and 2 norms.
Hint: plot iris.target to see which ranges of indices correspond to which classes.
from sklearn import datasets
iris = datasets.load_iris()
dir(iris)
['DESCR', 'data', 'data_module', 'feature_names', 'filename', 'frame', 'target', 'target_names']
print(set(iris.target))
{0, 1, 2}
# compute mean and std for each class
k_class = 0
indices_k = iris.target==k_class
dat_k = iris.data[indices_k]
mean_0 = np.mean(dat_k,0)
std_0 = np.std(dat_k,0)
k_class = 1
indices_k = iris.target==k_class
dat_k = iris.data[indices_k]
mean_1 = np.mean(dat_k,0)
std_1 = np.std(dat_k,0)
k_class = 2
indices_k = iris.target==k_class
dat_k = iris.data[indices_k]
mean_2 = np.mean(dat_k,0)
std_2 = np.std(dat_k,0)
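The three blocks above could equivalently be collapsed into a single loop. A compact sketch (purely optional; class_means[k] and class_stds[k] here correspond to mean_k and std_k above):

```python
# equivalent loop over the three classes
class_means, class_stds = [], []
for k_class in range(3):
    dat_k = iris.data[iris.target == k_class]
    class_means.append(np.mean(dat_k, 0))
    class_stds.append(np.std(dat_k, 0))
```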
# now plot distances from every sample to each class mean, using the 1-norm
plot(compute_distances(iris.data,mean_0, norm_1));
plot(compute_distances(iris.data,mean_1, norm_1));
plot(compute_distances(iris.data,mean_2, norm_1));
legend(('class 0','class 1','class 2'));
# repeat using the 2-norm
plot(compute_distances(iris.data,mean_0, norm_2));
plot(compute_distances(iris.data,mean_1, norm_2));
plot(compute_distances(iris.data,mean_2, norm_2));
legend(('class 0','class 1','class 2'));
# plot the target labels to see which index ranges belong to which class
plot(iris.target);
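The mismatched samples can also be identified programmatically instead of being read off the plots. A sketch, assuming the class means and distance function above (shown here for the 2-norm):

```python
# stack distances to the three class means and pick the nearest mean for each sample
dists = np.array([compute_distances(iris.data, m, norm_2) for m in (mean_0, mean_1, mean_2)])
nearest = np.argmin(dists, axis=0)                # index of the closest class mean
mismatches = np.where(nearest != iris.target)[0]  # samples whose label is not the nearest mean
print('samples where target != nearest mean:', mismatches)
```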