1. String cheat-sheet
(a) List the most important string processing functions we have used in the notes and give a one-sentence description, in your own words, of what each one does.
(b) Give a small example demonstrating each function's use.
2. String comparison
(a) Explain the difference between the Hamming and Levenshtein distance metrics (the latter is also commonly called edit distance).
(b) Demonstrate an example where the Hamming and Levenshtein distances are different for a pair of strings. You can use the nltk.metrics package.
In [11]:
from nltk.metrics import *  # packaging appears incomplete in the latest version; a wildcard import seems to work
In [17]:
string1 = "hello "  # note the trailing space: both strings must be the same length for Hamming
string2 = "heello"
edit_distance(string1, string2)  # Levenshtein distance, built into nltk
Out[17]:
2
In [22]:
# Hamming distance apparently has to be computed from scratch:
# count the positions at which the two strings differ
hdist = 0
for k in range(len(string1)):
    if string1[k] != string2[k]:
        hdist += 1
print(hdist)
3
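As a cross-check on the two results above, here is a minimal pure-Python sketch of both metrics. The function names `hamming` and `levenshtein` are illustrative and not part of nltk; the Levenshtein version assumes unit cost for insertion, deletion, and substitution, and uses the standard row-by-row dynamic programming table.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (unit costs assumed)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free if characters match
            ))
        prev = curr
    return prev[-1]


print(hamming("hello ", "heello"))      # positional mismatches only
print(levenshtein("hello ", "heello"))  # allows the strings to shift
```

The two metrics disagree on this pair because Hamming compares position by position, so the extra "e" misaligns every character after it, whereas Levenshtein can absorb the shift with one insertion and one deletion.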
3. NLP summary
List eight NLP tasks which we have covered in class, including two relatively easy tasks and two very difficult tasks. Give your personal opinion on how they might be ordered from easiest to most difficult to implement from scratch.