BDS 761: Data Science and Machine Learning I
Topic 1: Introduction
I. Class topics
Catalog description
- Data wrangling
- Dynamic data visualization
- Reproducible research
- Applied machine learning
Course content delivered through lectures and hands-on lab instruction.
General Topic List
We will focus on methods and tools in the following broad areas:
- Text processing
- Matrix algebra methods and software
- Introduction to Machine Learning and Deep Learning
- Natural Language Processing
Objectives of this class
- Be able to use "core" Python libraries in your research
- Understand how SOTA A.I. methods are broadly based on these same libraries
- Be able to implement basic processing and a few machine learning algorithms from "scratch"
- Generally understand what is going on in research publications
II. Syllabus Discussion
- Homework and readings will be provided at the end of class or via announcement later that evening. Due at the beginning of the following class. Points deducted if you show up late.
- No particular textbook needed
- A computer is needed to participate in class.
- Attendance not mandatory (?). Will attempt to record classes. Please do not come to class with anything contagious.
- Academic integrity - can discuss verbally. Do not share work or copy fellow students' writing or code. Be very careful about basing your work on code from the internet.
- Office hours TBD.
Course Information
- Labs/Participation/Homework - 20%
- Midterm - 30%
- Final Exam - 30%
- Project - 20%
Point of lab/participation/homework is to encourage you to study and learn. Easy points.
Point of exams is to decide your grade.
Will discuss project later. Basically it will be a more complete version of a lab project, including validation and writeup. And a poster session defending your analysis.
Prerequisites: programming + math
Programming skills necessary. We will be using Python. If the amount of work seems to be overwhelming, it is most likely due to a deficiency here.
Vector geometry
Matrix Algebra
Prob & Stat won't be used much
Prereqs exist for very good reasons.
Books
There is no required text. There is a vast supply of free resources online. Suggestions:
Introduction to Applied Linear Algebra, Boyd & Vandenberghe 2018, http://vmls-book.stanford.edu/
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2e, Géron 2019
Deep Learning with Python, 2e, Chollet 2021
Speech and Language Processing, 3e, Jurafsky & Martin 2024. https://web.stanford.edu/~jurafsky/slp3/
Academic Integrity, etc.
- See student handbook. This is your contract.
- Fairness will not be sacrificed for other noble causes
- Big source of drama: students skipping class or not doing homework then being unhappy with exams they could not handle as a result
III. Software Installation
Jupyter - "notebooks" for inline code + LaTeX math + markup, etc.
A single document containing a series of "cells", each containing code which can be run, or images and other documentation.
- Run a cell via [shift] + [Enter] or the "play" button in the menu. This executes the code and displays the result below, or renders markup, etc.
Can also use R or Julia (easily), Matlab, SQL, etc. (with increasing difficulty).
import datetime
print("This code is run right now (" + str(datetime.datetime.now()) + ")")
'hi'
This code is run right now (2020-01-22 18:31:02.681214)
'hi'
x=1+2+2
print(x)
5
import numpy as np
np.random.randn(2,5)
array([[ 1.24350758,  1.99906955, -0.3226366 , -0.98266019, -0.1309466 ],
       [-0.85026968, -0.35865037,  0.70637075,  1.06492839,  0.35220974]])
np.ones((2,2))
array([[1., 1.],
       [1., 1.]])
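The two NumPy calls above take their dimensions differently, which is an easy thing to trip over. A minimal sketch of the difference, assuming NumPy is installed; the variable names `a` and `b` are only for illustration:

```python
import numpy as np

# randn takes the dimensions as separate positional arguments...
a = np.random.randn(2, 5)

# ...while ones takes a single shape tuple (hence the double parentheses).
b = np.ones((2, 2))

# Every array carries its shape and element type with it.
print(a.shape)
print(b.shape)
print(b.dtype)
```

The values in `a` are random draws, so they will differ from the output shown above on every run.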
Installation
First project: get Jupyter running and be able to import listed tools
Easiest to install via Anaconda. Preferably Python 3.
https://www.anaconda.com/download/
Highly recommended to make a separate environment for class - hot open source tools change fast and deprecate (i.e. break) old features constantly
conda install jupyter matplotlib numpy scipy scikit-learn pandas ...
Many other packages...
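One way to confirm the environment is ready is a small import check run in a notebook cell. This is only a sketch: the helper name `check_imports` is hypothetical, and the module list simply mirrors the packages named in the install command above. Note that the scikit-learn package is imported as `sklearn`.

```python
import importlib

# Import names for the packages in the conda command above.
# Note: scikit-learn is imported under the name "sklearn".
MODULES = ["matplotlib", "numpy", "scipy", "sklearn", "pandas"]


def check_imports(names):
    """Map each module name to its version string, or None if the import fails."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
            continue
        # Not every module defines __version__, so fall back to a placeholder.
        results[name] = getattr(module, "__version__", "unknown")
    return results


for name, version in check_imports(MODULES).items():
    status = "MISSING" if version is None else f"ok ({version})"
    print(f"{name:12s} {status}")
```

If the cell runs at all, Jupyter itself is working; any line marked MISSING points at a package to install into the class environment.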
Python Help Tips
- Get help on a function or object via [shift] + [tab] after the opening parenthesis: function(
- Can also get help by executing function?
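The `?` syntax only works inside IPython/Jupyter, but similar information is available from plain Python. A minimal sketch using only the standard library; `len` and `inspect.getdoc` are just arbitrary examples of objects to look up:

```python
import inspect

# help(len) would print the full documentation interactively;
# inspect.getdoc returns the cleaned-up docstring as a string instead.
doc = inspect.getdoc(len)
first_line = doc.splitlines()[0]
print(first_line)

# The call signature of a function, similar to what the
# [shift] + [tab] tooltip displays in a notebook.
sig = inspect.signature(inspect.getdoc)
print(sig)
```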
IV. Q & A Discussion
- Virtual vs In-person classes?
- Job interests/plans?
- Research topics?