Unstructured Data & Natural Language Processing
Topic 10: Regression
Projects¶
Requirements:
- Must be Python, delivered in a Jupyter notebook.
- Apply one or more of the methods we mentioned in class: networks, embeddings, clustering...
- Must be your own work: original analysis, on real data (not a toy dataset).
Preliminary Rubric:
- Data selection interesting/challenging?
- Method interesting/challenging/thorough?
- Validation thorough, done properly?
- Report - well-written/understandable, explains pros/cons of the method, what happened & why (can be done entirely in Jupyter if you use markdown & LaTeX).
This topic:¶
- Least squares
- Linear Regression
- Nonlinear Regression
- Logistic Regression
Recall: Joint Distributions¶
Each "sample" as a list of features, which we model as a vector of random variables, $$ \mathbf x = \begin{pmatrix} Diameter \\ Thickness \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \\ \end{pmatrix} \rightarrow p(\mathbf x) = p(x_1, x_2) = p(\text{thickness, diameter}) $$
Each point in the distribution is simultanous probability of all the features taking the particular vector of numbers at the point.
Using Structure¶
Joint distribution: $p(\text{temperature anomaly}, \text{$CO_2$ concentration})$
Correlations are useful, whether due to causality or a common cause
If I know the $CO_2$ concentration is over 400, what do I expect the temperature anomaly to be?
$$p(\text{temperature anomaly | concentration})$$
Fitting a model¶
Generally: given a $CO_2$ concentration $x_i$, what is the corresponding temperature anomaly "$y(x_i)$" (approximately)?
Choose a model of the form $y(x) = \beta_0 + \beta_1 x$, where *fitting* refers to the choice of the best $\beta_0$ and $\beta_1$ -> choose a slope and intercept
Two separate "functions" here: the probability distribution $p(x,y)$, and the regression model $\hat{y}(x_i) \approx y_i$
Devore, "Probability and Statistics for Engineering and the Sciences", Brooks. (2000).
The error terms $\varepsilon_i$ are assumed to be just uninteresting random values ~ noise
Example¶
print ("dosage = "+str(np.round(dosage,2)))
print ("blood concentration = "+str(np.round(conc_noisy,2)))
dosage = [ 0. 166.67 333.33 500. 666.67 833.33 1000. ]
blood concentration = [33.52 42.77 -9.23 12.1 36.8 69.32 59.61]
We have data for drug dosage and resulting blood concentration of drug.
We want a model to predict how much dosage to give to achieve a desired blood concentration.
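The helper `show_polyfit_example_results` and the arrays `conc_true`/`conc_noisy` come from cells not shown here. A minimal sketch of how such data could be generated, assuming a true intercept of 10 and true slope of 0.05 (matching the "true" values printed below) and an arbitrary Gaussian noise level:
import numpy as np
rng = np.random.default_rng(0)
dosage = np.linspace(0, 1000, 7)                                   # dosage levels
conc_true = 10.0 + 0.05 * dosage                                   # "true" linear response
conc_noisy = conc_true + rng.normal(scale=20, size=dosage.shape)   # simulated measurement noise (noise level is a guess)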
1st order Polynomial Fit¶
poly_ex_1 = np.polyfit(dosage,conc_noisy,1)
show_polyfit_example_results(dosage, conc_true,conc_noisy,poly_ex_1,1)
true intercept: 10.0, est: 15.973; true slope: 0.05, est: 0.038
"true"means the numerical model used to generate the numbers here. Before adding random numbers to simulate noise.
Exercise 0: Literally do this in Python¶
dosage = np.array([0., 166.67, 333.33, 500., 666.67, 833.33, 1000.])
blood_concentration = np.array([35.07, 3.18, 26.43, 23.36, 47.93, 72.21, 62.78])
Estimate the relationship between dosage and blood_concentration by guessing slope/intercept
plt.plot(dosage, blood_concentration,'o-')
slope = 5 # a.k.a. beta_1, try and find better values than this...
intercept = 20 # a.k.a. beta_0, try and find better values than this...
plt.plot(dosage, slope*dosage+intercept)
plt.xlabel('dosage (a.k.a. x)')
plt.ylabel('blood concentration (a.k.a. y)');
How well fit?¶
- How might we decide how accurate the fit is?
show_polyfit_example_results(dosage, conc_true,conc_noisy,poly_ex_1,1)
true intercept: 10.0, est: 15.973; true slope: 0.05, est: 0.038
Residual¶
import numpy as np
from matplotlib import pyplot as plt
x = np.linspace(0, 10, 100)
y = 2*x+3
s = 0.4
plt.plot(x, y, 'k-');
pts_x = [1, 1, 2, 3, 3, 4, 4, 5, 6, 7]
pts_y = [7, 3, 12, 5, 11, 15, 10, 15, 12, 19]
plt.plot(pts_x, pts_y, 'yo', markersize=7);
# dashed segments from each observation down/up to the fitted line y = 2x + 3 (the residuals)
for xi, yi in zip(pts_x, pts_y):
    line_y = 2*xi + 3
    offset = -s if yi > line_y else s   # keep the segment clear of the marker
    plt.plot([xi, xi], [yi + offset, line_y], 'b--');
How would you set this up mathematically?
Simple Linear Regression¶
Model assumption: $\mathbf y$ and $\mathbf x$ are linearly related
$$\mathbf y = \beta_0 + \beta_1 \mathbf x + \boldsymbol\varepsilon$$
$$\text{where } \mathbf y = \begin{bmatrix} y_1 \\y_2 \\ \vdots \\ y_m \end{bmatrix}, \;\; \mathbf x = \begin{bmatrix} x_1 \\x_2 \\ \vdots \\ x_m \end{bmatrix}, \;\; \boldsymbol\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix}$$
How does this model apply to our example? What is $x_i$?
The Noise $\boldsymbol\varepsilon$¶
Basically the model is saying $\mathbf y \approx \beta_0 + \beta_1 \mathbf x$.
How approximate? Well, the residual $\boldsymbol\varepsilon = \mathbf y - (\beta_0 + \beta_1 \mathbf x)$ is random -> so give it a statistical model
E.g., zero-mean, Gaussian. Educated guesses? Are they correct?
Answers:¶
The central limit theorem makes sums of many small effects look Gaussian
mean and variance can be measured from data -> Normality tests
zero-mean can be enforced.
Residual on Data¶
print ("blood conc (measured) = "+str(np.round(conc_noisy,2)))
print ("blood conc (model) = "+str(np.round(np.polyval(poly_ex_1,dosage),2)))
print ("difference = "+str(np.round(conc_noisy - np.polyval(poly_ex_1,dosage),2)))
plt.scatter(dosage, conc_noisy , color='red')
plt.scatter(dosage, np.polyval(poly_ex_1,dosage) , color='black');
plt.plot(dosage, np.polyval(poly_ex_1,dosage) , color='black', linewidth="1");
blood conc (measured) = [33.52 42.77 -9.23 12.1 36.8 69.32 59.61]
blood conc (model) = [15.97 22.31 28.65 34.98 41.32 47.65 53.99]
difference = [ 17.54 20.46 -37.88 -22.88 -4.52 21.66 5.62]
Ordinary Least Squares - minimizes the residual¶
Minimizes $\Vert \mathbf e \Vert_2^2 = \sum_{i=1}^n e_i^2$ where $\bf e = \beta_0 + \beta_1 \mathbf x - \mathbf y$, wrt $\beta_0$ and $\beta_1$
In the simple linear regression case,
$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$, or
$\hat{\beta}_1 = r_{xy} \frac{s_y}{s_x}$, where $r_{xy}$ is the correlation between $\mathbf x$ and $\mathbf y$, $s_x$ and $s_y$ are the standard deviations of $\mathbf x$ and $\mathbf y$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
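As a quick sketch, these closed-form estimates can be computed directly with NumPy, here using the Exercise 0 arrays (compare against `np.polyfit(dosage, blood_concentration, 1)`):
import numpy as np

dosage = np.array([0., 166.67, 333.33, 500., 666.67, 833.33, 1000.])
blood_concentration = np.array([35.07, 3.18, 26.43, 23.36, 47.93, 72.21, 62.78])

x_bar, y_bar = dosage.mean(), blood_concentration.mean()
beta_1 = np.sum((dosage - x_bar) * (blood_concentration - y_bar)) / np.sum((dosage - x_bar) ** 2)
beta_0 = y_bar - beta_1 * x_bar
print("beta_0 =", beta_0, ", beta_1 =", beta_1)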
Exercise:¶
Perform this optimization for cases with just $\beta_0$ and just $\beta_1$.
Minimize $\sum_{i=1}^n e_i^2$ where $\bf e = \beta_0 - \mathbf y$ wrt $\beta_0$
Minimize $\sum_{i=1}^n e_i^2$ where $\bf e = \beta_1 \mathbf x - \mathbf y$ wrt $\beta_1$
Coefficient of Determination $R^2$¶
$$ R^2 = \frac{TSS-RSS}{TSS} $$
- TSS = Total sum of squares: $\sum_{i=1}^n (y_i - \bar{y})^2$ ~ variance (scaled)
- RSS = Residual sum of squares: $\sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2$ ~ "noise" variance
$R^2$ is the fraction of variance explained by the model; $1 - R^2$ is the unexplained fraction.
See Ch. 3, especially Eqs. 3.16-3.17 in "An Introduction to Statistical Learning". James, Witten, Hastie and Tibshirani, https://www-bcf.usc.edu/~gareth/ISL/
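A small sketch of the computation, reusing `dosage`, `conc_noisy`, and the first-order fit `poly_ex_1` from the cells above:
y_hat = np.polyval(poly_ex_1, dosage)                 # model predictions
rss = np.sum((conc_noisy - y_hat) ** 2)               # residual sum of squares
tss = np.sum((conc_noisy - conc_noisy.mean()) ** 2)   # total sum of squares
print("R^2 =", (tss - rss) / tss)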
Exercise¶
Compute the residual for your estimate from Exercise 0
Apply it¶
Recall single-variable regression: find parameters ($\beta^*_0, \beta^*_1$) that minimize $residual = \sum_i e_i^2 = \Vert \mathbf e \Vert^2 = \Vert \mathbf y - f(\mathbf x;\beta_0, \beta_1) \Vert^2$
What is the (regular) algebra equation to predict global temperature $T$ using time $t$? $(t_1,T_1),(t_2,T_2),...,(t_m,T_m)$ $\rightarrow$?
$(t_1,s_1,T_1),(t_2,s_2,T_2),...,(t_m,s_m,T_m)$ $\rightarrow$?
What is the difference between minimizing norm of residual and variance of residual?
Better fit means better model... right?¶
show_polyfit_example_results(dosage, conc_true,conc_noisy,poly_ex_1,1)
true intercept: 10.0, est: 15.973; true slope: 0.05, est: 0.038
Brain teaser: which fits the data better, the truth or our estimate? Why?
GENERALIZATION¶
Machine learning's goal is to find the model which best fits new data.
I.e., we want a predictor.
Training set - the data we use for fit
Test set - additional data we test on -> is the model still best?
Ironically, the best model generally does not fit our training set best.
Caution¶
If you use the data at all to determine the model, you are cheating if you then use it for final testing: information leaks in, and some degree of overfitting occurs. Examples include using all the data to:
- Standardize or subtract the mean
- Pick out the best-looking features
- Choose the model and/or parameters
- Even choose parameters via cross-validation
Ideal plan is separate training, validation, and test sets.
However, you may have no choice if data is too limited.
Cross-Validation¶
http://scikit-learn.org/stable/modules/classes.html
How does this apply to our typical data matrix?
PRO TIP: if standardizing, use mean & std of training set.
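One way to follow this tip without hand-coding the bookkeeping is scikit-learn's `Pipeline`, which re-fits the scaler on each training fold so no test-fold information leaks in. A minimal sketch on synthetic data (the coefficients and noise level are made up):
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# StandardScaler is fit on each training fold only, inside the CV loop
model = make_pipeline(StandardScaler(), LinearRegression())
print(cross_val_score(model, X, y, cv=5, scoring="r2"))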
Multi-variable Regression¶
Recall Simple Linear Regression¶
Have data $\mathbf x$ and $\mathbf y$
1-D model: ${\mathbf y} = \beta_0 + \beta_1 \mathbf x + \boldsymbol\varepsilon$
- where $\mathbf y = \begin{bmatrix} y_1 \\y_2 \\ \vdots \\ y_m \end{bmatrix}$, $\mathbf x = \begin{bmatrix} x_1 \\x_2 \\ \vdots \\ x_m \end{bmatrix}$, $\boldsymbol\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix}$
Remember what these terms are?
Multi-variable Model¶
New model: $\mathbf y = \beta_0 + \beta_1 \mathbf x_{(1)} + \beta_2 \mathbf x_{(2)} +...+ \beta_n \mathbf x_{(n)} + \boldsymbol\varepsilon$
- where $\mathbf y = \begin{bmatrix} y_1 \\y_2 \\ \vdots \\ y_m \end{bmatrix}$, $\mathbf x_{(i)} = \begin{bmatrix} x_{(i),1} \\x_{(i),2} \\ \vdots \\ x_{(i),m} \end{bmatrix}$, $\boldsymbol\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix}$
E.g., predict temperature using time of year, time of day, and latitude, to get a more accurate prediction.
Multi-variable Model, Matrix-style $x_{(j),i} \rightarrow X_{i,j}$¶
\begin{align} \mathbf y &= \beta_0 + \beta_1 \mathbf x_{(1)} + \beta_2 \mathbf x_{(2)} +...+ \beta_n \mathbf x_{(n)} + \boldsymbol\varepsilon \\ &= \beta_0 + \mathbf X \boldsymbol\beta + \boldsymbol\varepsilon \end{align}
- where $\mathbf y = \begin{bmatrix} y_1 \\y_2 \\ \vdots \\ y_m \end{bmatrix}$, $\boldsymbol\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{bmatrix}$, $\boldsymbol\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix}$,
and $X_{i,j}$ is our well-known data matrix, samples $\times$ features.
Pesky Offset Term¶
Notice we've been adding a scalar to vectors $\beta_0 + \mathbf X \boldsymbol\beta + \boldsymbol\varepsilon$
- This can be written more carefully using a vector of 1's.
\begin{align} \mathbf y &= \beta_0 \mathbf 1 + \beta_1 \mathbf x_{(1)} + \beta_2 \mathbf x_{(2)} +...+ \beta_n \mathbf x_{(n)} + \boldsymbol\varepsilon \\ &= \beta_0 \mathbf 1 + \mathbf X \boldsymbol\beta + \boldsymbol\varepsilon \end{align}
- How might we simplify this further (Hint: set $\mathbf x_{(0)} = \bf 1$)?
- What if we standardize the data (or just removed the means)?
...."without loss of generality"
Exercise¶
What is the (regular) algebra equation to predict global temperature $T$ using time $t$ and $CO_2$ concentration $s$?
$(t_1,s_1,T_1),(t_2,s_2,T_2),...,(t_m,s_m,T_m)$ $\rightarrow$?
Statistical model¶
Let $y_i = \beta_0 + \beta_1 X_{i,1} + ... + \beta_N X_{i,N} + \varepsilon_i$ where $\varepsilon_i \sim N(0,\sigma^2)$
$X_{i,j} = x_{(j),i}$ are our observations: the $i$th observation of the $j$th variable.
Therefore each $y_i$ is also normally distributed, with mean $\beta_0 + \beta_1 X_{i,1} + ... + \beta_N X_{i,N}$ and variance $\sigma^2$
In other words: $$p(y_i|\beta_0, \beta_1, ..., \beta_N, X_{i,1},...,X_{i,N}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp{\left(\frac{-1}{2\sigma^2} \left(y_i - (\beta_0 + \beta_1 X_{i,1} + ... + \beta_N X_{i,N})\right)^2\right)}$$
Model for an entire dataset, $i=1...M$¶
Assumption: measurements are IID ~ independent and identically-distributed -> product of individual distributions
\begin{align} p(y_1, y_2, ..., y_M|\beta_0, \beta_1, ..., \beta_N, X_{1,1}, ..., X_{M,N}) &= p(\mathbf y |\boldsymbol\beta, \mathbf X) = \prod_i^M \frac{1}{\sqrt{2\pi\sigma^2}} \exp{\left(\frac{-1}{2\sigma^2} \left(y_i - (\beta_0 + \beta_1 X_{i,1} + ... + \beta_N X_{i,N})\right)^2\right)} \end{align}
\begin{align} p(\mathbf y |\boldsymbol\beta, \mathbf X) &= \prod_i^M \frac{1}{\sqrt{2\pi\sigma^2}} \exp{\left(\frac{-1}{2\sigma^2} \left(y_i - (\beta_0 + \beta_1 X_{i,1} + ... + \beta_N X_{i,N})\right)^2\right)} \\ &= \frac{1}{(2\pi\sigma^2)^{M/2}} \exp{ \left( \sum_i^M\frac{-1}{2\sigma^2} \left(y_i - (\beta_0 + \beta_1 X_{i,1} + ... + \beta_N X_{i,N})\right)^2\right)} \\ &= \frac{1}{(2\pi\sigma^2)^{M/2}} \exp{ \left(\frac{-1}{2\sigma^2} \Vert\mathbf y -\mathbf X \boldsymbol{\beta}\Vert_2^2 \right)} \\ &= \frac{1}{(2\pi\sigma^2)^{M/2}} \exp{ \left(\frac{-1}{2} (\mathbf y -\mathbf X \boldsymbol{\beta})^T\boldsymbol\Sigma^{-1} (\mathbf y -\mathbf X \boldsymbol{\beta}) \right)} \\ \end{align}
$$\text{Where } \boldsymbol{\beta} = \begin{pmatrix}\beta_0\\ \beta_1\\ \vdots \\ \beta_N \end{pmatrix}, \;\; \mathbf X = \begin{pmatrix}1 & X_{1,1} & X_{1,2} & \dots & X_{1,N}\\ 1 & X_{2,1} & X_{2,2} & \dots & X_{2,N} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{M,1} & X_{M,2} & \dots & X_{M,N} \end{pmatrix}, \;\; \boldsymbol\Sigma^{-1} = \frac{1}{\sigma^2} \mathbf I $$
Maximum Likelihood estimate¶
The maximum likelihood approach seeks the model which maximizes the probability of the measured data $\{y_i\}$ given the "inputs" $\{X_{i,j}\}$
This means find the $\boldsymbol\beta^*$ for which $p(\mathbf y| \boldsymbol\beta, \mathbf X)$ is largest for some data $(\mathbf y, \mathbf X)$ that you have
$$p(\mathbf y | \boldsymbol\beta, \mathbf X) = \frac{1}{(2\pi\sigma^2)^{M/2}} \exp{ \left(\frac{-1}{2\sigma^2} \Vert\mathbf y -\mathbf X \boldsymbol{\beta}\Vert_2^2 \right)}$$
The Art of Optimization¶
$$\arg\max\limits_{\ \boldsymbol\beta}p(\mathbf y|\boldsymbol\beta, \mathbf X) = \arg\max\limits_{ \boldsymbol\beta} \left\{ \frac{1}{(2\pi\sigma^2)^{M/2}} \exp{ \left(\frac{-1}{2\sigma^2} \Vert\mathbf y -\mathbf X \boldsymbol{\beta}\Vert_2^2 \right)} \right\} $$
$$= \arg\max\limits_{ \boldsymbol\beta} \ln \left\{ \frac{1}{(2\pi\sigma^2)^{M/2}} \exp{ \left(\frac{-1}{2\sigma^2} \Vert\mathbf y -\mathbf X \boldsymbol{\beta}\Vert_2^2 \right)} \right\} $$
$$= \arg\max\limits_{ \boldsymbol\beta} \left[ \ln \left\{ \frac{1}{(2\pi\sigma^2)^{M/2}} \right\} + \ln\left\{ \exp{ \left(\frac{-1}{2\sigma^2} \Vert\mathbf y -\mathbf X \boldsymbol{\beta}\Vert_2^2 \right)} \right\} \right] $$
$$= \arg\max\limits_{ \boldsymbol\beta} \left(\frac{-1}{2\sigma^2} \Vert\mathbf y -\mathbf X \boldsymbol{\beta}\Vert_2^2 \right) $$
$$ = \arg\min\limits_{ \boldsymbol\beta} \Vert\mathbf y -\mathbf X \boldsymbol{\beta}\Vert_2^2 $$
Optimization aside¶
A framework for describing problems of minimizing/maximizing functions, plus an "art" of finding easier alternative problems that have the same minimizer.
\begin{align} \text{minimizer: } \mathbf z^* &= \arg\min\limits_{\mathbf z} g(\mathbf z) \\ \text{minimum: } g_{min} &= \min\limits_{\mathbf z} g(\mathbf z) = g(\mathbf z^*) \end{align}
\begin{align} \text{maximizer: } \mathbf z^* &= \arg\max\limits_{\mathbf z} g(\mathbf z) \\ \text{maximum: } g_{max} &= \max\limits_{\mathbf z} g(\mathbf z) = g(\mathbf z^*) \end{align}
What is "$g$" and $\mathbf z$ for our regression problem? for KNN? Naive Bayes?
Note 1: nuanced jargon here, minimizer is location of minimum, not minimum value itself.
Note 2: we aren't concerned with local vs. global minima/maxima here. But know they differ.
Linear Algebra view of Regression¶
The maximum-likelihood (ML) solution to the multiple linear regression problem with Gaussian error
$$ \boldsymbol\beta^*= \arg\min\limits_{ \boldsymbol\beta} \Vert\mathbf y -\mathbf X \boldsymbol{\beta}\Vert_2^2 $$
$$\text{where } \mathbf y = \begin{bmatrix} y_1 \\y_2 \\ \vdots \\ y_M \end{bmatrix},\;\; \boldsymbol{\beta} = \begin{pmatrix}\beta_0\\ \beta_1\\ \vdots \\ \beta_N \end{pmatrix}, \;\; \mathbf X = \begin{pmatrix}1 & X_{1,1} & X_{1,2} & \dots & X_{1,N}\\ 1 & X_{2,1} & X_{2,2} & \dots & X_{2,N} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{M,1} & X_{M,2} & \dots & X_{M,N} \end{pmatrix}, \;\; \boldsymbol\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_M \end{bmatrix}$$
This is, in linear algebra terminology, the least-squares solution to the linear system:
$$ \mathbf y =\mathbf X \boldsymbol{\beta} $$
Linear systems¶
$$ \mathbf y =\mathbf X \boldsymbol{\beta} $$
CASE 1: $\mathbf X$ square, invertible (a.k.a. nonsingular). Can solve with inverse (generally never do this).
$$ \boldsymbol{\beta} = \mathbf X^{-1} \mathbf y $$
CASE 2: $\mathbf X$ "short and fat", more unknowns than equations, underdetermined. Solve with pseudoinverse.
$$ \boldsymbol{\beta} = \mathbf X^\dagger \mathbf y = \mathbf X^T \left(\mathbf X\mathbf X^T\right)^{-1} \mathbf y $$
CASE 3: $\mathbf X$ "tall and thin", more equations than unknowns, overdetermined. Solve with pseudoinverse.
$$ \boldsymbol{\beta} = \mathbf X^\dagger \mathbf y = \left(\mathbf X^T\mathbf X\right)^{-1} \mathbf X^T \mathbf y $$
Note that $\mathbf X^\dagger = \mathbf X^{-1}$ when the inverse exists. There are yet more nuances regarding how to handle small singular values and other numerical issues. Linear algebra libraries often have a robust pinv function in addition to other options like solve or a backslash operator "A\b"
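A sketch of the overdetermined case in NumPy, comparing the explicit normal-equations formula with the library routines (random data, purely for illustration):
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                            # tall and thin: 20 equations, 3 unknowns
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)

beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y          # explicit formula (avoid in practice)
beta_pinv = np.linalg.pinv(X) @ y                       # robust pseudoinverse
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]       # dedicated least-squares solver
print(beta_normal, beta_pinv, beta_lstsq, sep="\n")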
Exercise: networks¶
Recall that a Gaussian graphical model can be formed via regressions relating nodes. Give the linear systems to be solved for a network consisting of three nodes, where we have a vector of data describing each node.
Solving Linear System with the SVD¶
Consider how to solve a system with invertible $\mathbf X$ using its SVD and knowledge of the factors' properties
Next consider how you might extend this to a singular matrix. (Hint: will end up equivalent to the pseudoinverse solution).
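A sketch of the SVD route, reusing `X` and `y` from the previous snippet; small singular values could be truncated here to handle (near-)singular matrices:
# X = U S V^T  =>  least-squares solution beta = V S^{-1} U^T y
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ ((U.T @ y) / s)                       # divide componentwise by the singular values
print(beta_svd)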
Regression Error Metric Recap¶
Mean-squared error (MSE) ~ squared $L^2$-norm of the residual $\mathbf e = \mathbf y - \mathbf X \boldsymbol\beta$, divided by the number of samples
Error variance, standard deviation
Coefficient of determination $R^2$ ~ one minus the residual variance as a fraction of total variance
All convey essentially the same information, in linear-algebra versus statistical language.
Machine Learning in extremely-small nutshell¶
Pick a model, e.g., $\mathbf y = \beta_0 + \beta_1 \mathbf x + \boldsymbol\varepsilon = f(\mathbf x;\beta_0, \beta_1) + \boldsymbol\varepsilon$
"Fit" model to your data by using your favorite optimization technique to find parameters ($\beta^*_0, \beta^*_1$) that minimize $residual = \Vert \mathbf y - f(\mathbf x;\beta_0, \beta_1) \Vert$
Lab¶
Load Boston house prices dataset.
Formulate linear system and try using inverse and pseudoinverse to solve.
Compute prediction and residual from result(s) using test data.
Use scikit to predict Boston house prices using linear regression.
Which appear to be the most important features?
from sklearn.datasets import load_boston
boston = load_boston()
print(dir(boston))
print(boston.data.shape)
print(boston.feature_names)
print(boston.DESCR)
['DESCR', 'data', 'feature_names', 'filename', 'target']
(506, 13)
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']
.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**

    :Number of Instances: 506
    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.
    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's
    :Missing Attribute Values: None
    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics ...', Wiley, 1980. N.B. Various transformations are used in the table on pages 244-261 of the latter.
The Boston house-price data has been used in many machine learning papers that address regression problems.

.. topic:: References

   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
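One possible starting point for the lab, a sketch rather than the intended solution, reusing the `boston` object loaded above (note that `load_boston` was removed from recent scikit-learn releases, so this requires an older version or loading the data manually):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(
    boston.data, boston.target, test_size=0.25, random_state=0)

# Pseudoinverse solve on an augmented matrix (column of ones for the intercept)
A_train = np.column_stack([np.ones(len(X_train)), X_train])
beta = np.linalg.pinv(A_train) @ y_train

A_test = np.column_stack([np.ones(len(X_test)), X_test])
residual = y_test - A_test @ beta
print("residual norm on test data:", np.linalg.norm(residual))

# Same model via scikit-learn
lr = LinearRegression().fit(X_train, y_train)
print("sklearn R^2 on test data:", lr.score(X_test, y_test))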
Maximum a Posteriori (MAP) Estimation¶
Again the system: $\mathbf y = \mathbf X \boldsymbol\beta + \boldsymbol\varepsilon$.
Now we seek to maximize the posterior distribution $p(\boldsymbol\beta | \mathbf y; \mathbf X)$ (statisticians sometimes call this penalized maximum likelihood).
The noise is Normally distributed with $\boldsymbol\varepsilon \sim N(\mathbf 0,\sigma_2^2 \mathbf I)$.
Assume the solution has a prior $\boldsymbol\beta \sim N(\mathbf 0,\sigma_1^2 \mathbf I)$
Use Bayes Law to solve for the Posterior distribution $p(\boldsymbol\beta | \mathbf y) = \dfrac{p(\mathbf y |\boldsymbol\beta) p(\boldsymbol\beta)}{p(\mathbf y)}$
Make a simpler optimization problem for finding the maximizer $\boldsymbol\beta^*$ for this. Hint: the denominator does not depend on $\boldsymbol\beta$.
Regression as Optimization¶
\begin{align} \text{Linear Regression: } \boldsymbol\beta_{lr}^* &= \arg\min\limits_{\boldsymbol\beta} \Vert \mathbf y - \mathbf X \boldsymbol\beta \Vert^2\\ \text{Ridge Regression: } \boldsymbol\beta_{rr}^* &= \arg\min\limits_{\boldsymbol\beta} \Vert \mathbf y - \mathbf X \boldsymbol\beta \Vert^2 + \lambda \Vert \boldsymbol\beta \Vert^2 \end{align}
Note these can be solved analytically with pseudoinverse or SVD techniques (which also provide some different kinds of variants).
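For example, the ridge solution has a closed form from the regularized normal equations, $(\mathbf X^T\mathbf X + \lambda \mathbf I)\boldsymbol\beta = \mathbf X^T \mathbf y$; a self-contained sketch on synthetic data with an arbitrary $\lambda$:
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

lam = 0.1                                               # regularization strength (arbitrary choice here)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(beta_ridge)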
Take home Messages¶
A quadratic residual minimization (e.g. norm-squared or variance) implies a Gaussian noise assumption.
A quadratic penalty term implies a Gaussian prior assumption.
The regularization parameter relates the variances of the noise and prior, which we have to guess at (i.e., fit).
Maximum a Posteriori (MAP) Estimation II: "Laplace prior"¶
Again the system: $\mathbf y = \mathbf X \boldsymbol\beta + \boldsymbol\varepsilon$.
The noise is Normally distributed with $\boldsymbol\varepsilon \sim N(\mathbf 0,\sigma_2^2 \mathbf I)$.
Now the solution has a prior $\boldsymbol\beta \sim C \exp\big( -\lambda \sum_i^n |\beta_i|\big)$, where $C$ is a normalizing constant and $\lambda > 0$ a scale parameter. This is sometimes called a Laplace distribution.
Use Bayes Law to solve for the Posterior distribution.
Make a simpler optimization problem for finding the maximizer $\boldsymbol\beta^*$ for this.
FYI: Full-on Bayesian Inference¶
While MAP estimation uses Bayes Law, it is not called Bayesian inference.
Maximum Likelihood and MAP estimates are examples of "point estimates".
A Bayesian Inference technique estimates the entire posterior distribution, meaning $\boldsymbol\mu$ and $\boldsymbol\Sigma$ in the prior example.
From this we can estimate many things, such as,
- the mean value of $\boldsymbol\beta$ $\rightarrow$ a (better?) point estimate versus maximum.
- the variance of $\boldsymbol\beta$ $\rightarrow$ confidence intervals.
Take home Messages (updated)¶
An $\ell_2$ residual minimization (e.g., norm-squared or variance) implies a Gaussian noise assumption.
An $\ell_2$ penalty term implies a Gaussian prior assumption.
An $\ell_1$ penalty term implies a Laplace prior assumption.
The regularization parameter relates the variances of the noise and prior, which we have to guess at (i.e., fit).
Regression as Optimization (updated)¶
\begin{align} \text{Linear Regression: } \boldsymbol\beta_{lr}^* &= \arg\min\limits_{\boldsymbol\beta} \Vert \mathbf y - \mathbf X \boldsymbol\beta \Vert_2^2\\ \text{Ridge Regression: } \boldsymbol\beta_{rr}^* &= \arg\min\limits_{\boldsymbol\beta} \Vert \mathbf y - \mathbf X \boldsymbol\beta \Vert_2^2 + \lambda \Vert \boldsymbol\beta \Vert_2^2 \\ \text{"LASSO": } \boldsymbol\beta_{lasso}^* &= \arg\min\limits_{\boldsymbol\beta} \Vert \mathbf y - \mathbf X \boldsymbol\beta \Vert_2^2 + \lambda \Vert \boldsymbol\beta \Vert_1 \end{align}
There is also a logistic version of everything for classification.
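scikit-learn has solvers for all three; a quick sketch on synthetic data (arbitrary penalty strengths) illustrating that the $\ell_1$ penalty drives some coefficients exactly to zero while the $\ell_2$ penalty only shrinks them:
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 3))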
Nonlinear regression¶
Fitting a line works in some cases, but other kinds of curves might work better elsewhere
Suppose we know a relationship is quadratic
If we know the relationship is quadratic, we could just take the square root and make it linear.
Nonlinear regression (easy way) - Linear regression after nonlinear transformation¶
Old: \begin{align} \mathbf y = \beta_0 + \beta_1 \mathbf x_{(1)} + \beta_2 \mathbf x_{(2)} +...+ \beta_n \mathbf x_{(n)} + \varepsilon , \; \; \; \mathbf x_{(i)} = \begin{bmatrix} x_{(i),1} \\x_{(i),2} \\ \vdots \\ x_{(i),m} \end{bmatrix} \end{align}
New: \begin{align} \mathbf y = \beta_0 + \beta_1 \mathbf x'_{(1)} + \beta_2 \mathbf x'_{(2)} +...+ \beta_n \mathbf x'_{(n)} + \varepsilon, \;\;\; \mathbf x'_{(i)} = \begin{bmatrix} f_{(i)}(x_1) \\f_{(i)}(x_2) \\ \vdots \\ f_{(i)}(x_m) \end{bmatrix} \end{align}
...Feature Engineering, Kernel methods, ...
Polynomial regression - simple powers of data¶
New model: $\mathbf y = \beta_0 + \beta_1 \mathbf x'_{(1)} + \beta_2 \mathbf x'_{(2)} +...+ \beta_n \mathbf x'_{(n)} + \varepsilon$
$$\mathbf x'_{(i)} = \begin{bmatrix} f_{(i)}(x_1) \\f_{(i)}(x_2) \\ \vdots \\ f_{(i)}(x_m) \end{bmatrix} = \begin{bmatrix} x_{1}^i \\x_{2}^i \\ \vdots \\ x_{m}^i \end{bmatrix}$$
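In code this is just linear regression on powers of $x$; a sketch using `np.vander` to build the polynomial design matrix, reusing `dosage` and `conc_noisy` from above and an arbitrary order of 2:
order = 2
X_poly = np.vander(dosage, order + 1, increasing=True)  # columns: 1, x, x^2
beta_poly, *_ = np.linalg.lstsq(X_poly, conc_noisy, rcond=None)
print(beta_poly)                                        # compare with np.polyfit(dosage, conc_noisy, 2)[::-1]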
Exercise: write the model equations out for 3D case, i.e. $\bf x$ is just $x_1$, $x_2$, and $x_3$.
Polynomial regression¶
show_polyfit_ho_example_results(dosage,conc_noisy,(1,2,3,4,5,6));
order: 1, residual: 56.6204627002199
order: 2, residual: 43.28483436298878
order: 3, residual: 38.90189854595423
order: 4, residual: 21.166171682768383
order: 5, residual: 13.416800916508517
order: 6, residual: 1.8203750586044797e-11
Logistic regression¶
Logistic Regression: Motivation¶
NFL Example
- x axis is the number of touchdowns scored by team over a season
- y axis is whether they lost or won the game (0 or 1).
So, how do we predict whether we have a win or a loss if we are given a score? Note that we are going to be predicting values between 0 and 1. Close to 0 means we're sure it's in class 0, close to 1 means we're sure it's in class 1, and closer to 0.5 means we don't know.
Poorly fitting Linear Regression Model¶
Linear regression gets the general trend but it doesn't accurately represent the steplike behavior:
How would we use this model to estimate the class label (zero or one)?
Note mismatch between residual and "binarized" estimate.
So a line is not the best way to model this data. Luckily we know of a better curve.
Logistic Regression - fit sigmoid curve instead of line¶
$$ "\pi(x)" = \frac{exp(\alpha+\beta x)}{1 + exp(\alpha+\beta x)} $$
Instead of choosing slope and intercept, choose parameters $\alpha$ and $\beta$ of sigmoid curve to fit the data.
Even easier to fit this one by eye in single variable case
Sigmoid, a.k.a. Standard Logistic Function¶
(Not the same as the standard deviation, just same symbol)
x = np.linspace(-10,10,100)
plt.figure(figsize=(8,2))
plt.plot(x,1/(1+np.exp(-1*x)), linewidth='1.7');
plt.plot(x,1/(1+np.exp(-2*x)), linewidth='1.7');
plt.plot(x,1/(1+np.exp(-1*(x+5))), linewidth='1.7');
plt.ylim(-0.001,1.01);
plt.legend((r'$\frac{1}{1+e^{-x}} = \sigma(x)$', r'$\frac{1}{1+e^{-2x}} = \sigma(2x)$', r'$\frac{1}{1+e^{-(x+5)}} = \sigma(x+5)$'), fontsize=14);
plt.grid();
Picture it: Classification versus Regression¶
one-dimensional case
two-dimensional case
What do binary labels look like in 1D versus 2D?
Lab¶
Use scikit-learn to perform classification using regression.
Try logistic, Ridge, and other regression methods.
Which are the most important features?
from sklearn.datasets import load_boston
boston = load_boston()
print(dir(boston))
print(boston.data.shape)
print(boston.feature_names)
print(boston.DESCR)
Quick Recap: Regression¶
$$\mathbf y = f(\mathbf x;\beta_0, \boldsymbol\beta) + \boldsymbol\varepsilon $$
Linear Regression: $f(\mathbf x;\beta_0, \boldsymbol\beta) = \beta_0 + \boldsymbol\beta^T \mathbf x$, Logistic regression: $f(\mathbf x;\beta_0, \boldsymbol\beta) = \sigma(\beta_0 + \boldsymbol\beta^T \mathbf x)$
Minimize $e^2 = \Vert \mathbf y - f(\mathbf x;\beta_0, \boldsymbol\beta) \Vert^2$ to find optimal $\boldsymbol\beta$ ~ LMS Loss
- Plan A. Guesstimate $\beta_0$, $\boldsymbol\beta$ by eye
- Plan B: analytically find minimizer using calculus
- Plan C: Use optimization - Newton method, LARS, gradient descent...
Regression for Classification¶
- Simple Linear Regression, $f(\mathbf x;\beta_0, \boldsymbol\beta) = \beta_0 + \boldsymbol\beta^T \mathbf x$
Minimize $e^2 = \Vert \mathbf y - f(\mathbf x;\beta_0, \boldsymbol\beta) \Vert^2$ to find optimal $\boldsymbol\beta$ ~ LMS Loss, a.k.a. L2, Least Squares
- Logistic Regression $f(\mathbf x;\beta_0, \boldsymbol\beta) = \sigma(\beta_0 + \boldsymbol\beta^T \mathbf x)$
$$\sigma(z) = \frac{1}{1+e^{-z}}$$
How might we fit this one?
Statistical View (very quickly)¶
Bernoulli distribution:
\begin{align} \text{Prob}(Y=1) &= p, \;\; \text{Prob}(Y=0) = 1-p \end{align}
\begin{align} \text{Prob}(Y = y) &= p^y (1-p)^{(1-y)} \end{align}
Model the probability $p$ with sigmoid
\begin{align} \text{Prob}(Y = y) &= p^y (1-p)^{(1-y)} = \sigma(z)^y(1-\sigma(z))^{(1-y)} \end{align}
The likelihood of the data for this can be again formed by assuming independent measurements.
The negative log of the likelihood is called the binary cross-entropy
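Concretely, for $M$ independent samples with $z_i = \beta_0 + \boldsymbol\beta^T \mathbf x_i$, the negative log-likelihood is
$$ -\ln p(y_1, ..., y_M) = -\sum_{i=1}^M \left[ y_i \ln \sigma(z_i) + (1-y_i)\ln\left(1-\sigma(z_i)\right) \right] $$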
Finding optimal parameters $\boldsymbol\beta$¶
Setting the derivative of this loss to zero has no closed-form solution, but with enough time and computing resources you can minimize almost anything numerically, especially if you can compute the gradient...
Gradient Descent¶
Gradient descent went from too trivial to cover in optimization classes to the current state of the art, thanks to massive data sizes and parallel computing (scalability)
What do we use for the step size?
Multidimensional case: Matrix Calculus
https://en.wikipedia.org/wiki/Matrix_calculus
https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
"The Matrix Calculus You Need For Deep Learning" https://arxiv.org/abs/1802.01528
Recap: Models¶
\begin{align} \text{General regression model: } y &= f(\boldsymbol\beta; \mathbf X)\\ \text{Linear regression: } y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ...\\ \text{Classification (Logistic regression): } y &= \sigma(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + ...)\\ \text{Nonlinear regression model: } y &= \beta_0^{n} + \beta_1^{n} f_1(\mathbf x, \beta_1^{1,...,n-1}) + \beta_2^{n} f_2(\mathbf x, \mathbf \beta_2^{1,...,n-1}) + ...\\ \text{Nonlinear classification: } y &= \sigma\left(\beta_0^{n} + \beta_1^{n} f_1(\mathbf x, \beta_1^{1,...,n-1}) + \beta_2^{n} f_2(\mathbf x, \beta_2^{1,...,n-1}) + ...\right) \end{align}
Deep neural network¶
A series of "layers". Layer $i$ computes $y = "\sigma^{(i)}"(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + ...)$
"$\sigma^{(i)}$" are called the activation functions. Can be sigmoid, but ReLU very popular.
The parameters $\beta$ are called weights
Training = fitting the $\beta$ given a dataset