Lauren's Blog

stop and smell the roses

Portfolio Assignment 1 January 25, 2009

Filed under: Data Mining — Lauren @ 12:44 am

After creating recommendations.py and running the commands on page 9 of “Collective Intelligence”, I got an error about recommendations not existing. I then re-read the page and moved recommendations.py to the Lib directory in Python. That fixed it right away. I love how easy Python makes it to use data structures like dictionaries and lists!
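Moving the file into Lib is not the only way to get the import working. Here is a minimal sketch of an alternative, assuming recommendations.py lives in some other folder: add that folder to sys.path before importing. The folder name below is hypothetical, so swap in wherever the file actually is.

```python
import sys

# Hypothetical folder that holds recommendations.py; change it to your own.
module_dir = r"C:\datamining"

# Python searches the folders listed in sys.path, in order, when it sees an import.
if module_dir not in sys.path:
    sys.path.append(module_dir)

# After this, "import recommendations" would also look inside module_dir.
```

The change only lasts for the current interpreter session, so it has to be repeated after restarting IDLE.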

Euclidean Distance

Typing the Euclidean distance calculation straight into the Python interpreter (using IDLE) gave me the same answers as the book's example with Toby and LaSalle. However, when I added the function sim_distance to recommendations.py, I got a different answer from the book for Lisa Rose and Gene Seymour. I added up the squares of the differences by hand and got the same answer as my function. I think the general consensus is that the book is wrong!
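To make the hand calculation concrete, here is a small sketch. The ratings are made up for illustration (they are not the book's data), and euclidean_by_hand is a hypothetical helper name. It returns the sum of squared differences along with two similarity scores that could be built from it, one that takes the square root first and one that does not, since the two give noticeably different numbers.

```python
from math import sqrt

# Made-up ratings for two hypothetical critics (not the book's data)
ratings_a = {'Film 1': 2.5, 'Film 2': 3.5, 'Film 3': 3.0}
ratings_b = {'Film 1': 3.0, 'Film 2': 3.5, 'Film 3': 1.5}

def euclidean_by_hand(a, b):
    # Only compare items that both critics have rated
    shared = [item for item in a if item in b]

    # Add up the squares of all the differences
    sum_of_squares = sum((a[item] - b[item]) ** 2 for item in shared)

    # Two ways to turn the distance into a score between 0 and 1
    with_sqrt = 1 / (1 + sqrt(sum_of_squares))
    without_sqrt = 1 / (1 + sum_of_squares)
    return sum_of_squares, with_sqrt, without_sqrt
```

Calling euclidean_by_hand(ratings_a, ratings_b) makes it easy to see which of the two scores a given answer corresponds to.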

Pearson Coefficient

The Pearson coefficient worked correctly and yielded the same results as the book. It took me a while to understand how the function sim_pearson matched the formula we discussed in class, but I worked through it.
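One way to see the connection is to compute the coefficient both ways and compare. This is only a sketch: it assumes the class formula was the usual mean-centered definition, and both function names are hypothetical. The first version works from running sums, and the second subtracts the means first.

```python
from math import sqrt

def pearson_from_sums(xs, ys):
    # Works from the sums, sums of squares, and sum of products
    n = float(len(xs))
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x_sq = sum(x * x for x in xs)
    sum_y_sq = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    num = sum_xy - (sum_x * sum_y / n)
    den = sqrt((sum_x_sq - sum_x ** 2 / n) * (sum_y_sq - sum_y ** 2 / n))
    return num / den if den != 0 else 0

def pearson_from_means(xs, ys):
    # Subtracts each list's mean, then divides the co-variation
    # by the product of the spreads
    n = float(len(xs))
    mean_x, mean_y = sum(xs) / n, sum(ys) / n

    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs) *
               sum((y - mean_y) ** 2 for y in ys))
    return num / den if den != 0 else 0
```

Both functions should agree, up to floating-point rounding, on any pair of equal-length rating lists, because the sums version is just the mean-centered formula multiplied out.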

Manhattan Distance

Implementing the Manhattan distance was pretty simple. I followed the same format as the sim_distance and sim_pearson functions. The formula for the Manhattan distance is |X1-X2|+|Y1-Y2|+…+|Z1-Z2|. I had to look up the syntax for an absolute value function in Python and it was what I thought it would be: abs(x). Below is my sim_manhattan function.

# Returns the Manhattan distance between personA and personB
# (a smaller number means more similar tastes)
def sim_manhattan(prefs, personA, personB):
    # Get the list of shared_items
    si={}
    for item in prefs[personA]:
        if item in prefs[personB]:
            si[item]=1

    # if they have no ratings in common, return 0
    if len(si)==0: return 0

    # Add up the absolute values of all the differences
    sum_of_abs=sum([abs(prefs[personA][item]-prefs[personB][item]) for item in si])
    return sum_of_abs

When tested in the Python interpreter with the critics Lisa Rose and Gene Seymour, I got the following correct result:

>>> reload(recommendations)
<module 'recommendations' from 'C:\Python26\lib\recommendations.py'>
>>> recommendations.sim_manhattan(recommendations.critics, 'Lisa Rose', 'Gene Seymour')
4.5
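The 4.5 can be double-checked by hand. The sketch below assumes the ratings for these two critics are as I have typed them here from the book's critics dictionary; if your copy of the data differs, the total will too.

```python
# Ratings assumed to match the book's critics dictionary for these two critics
lisa = {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
        'Just My Luck': 3.0, 'Superman Returns': 3.5,
        'You, Me and Dupree': 2.5, 'The Night Listener': 3.0}
gene = {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
        'Just My Luck': 1.5, 'Superman Returns': 5.0,
        'The Night Listener': 3.0, 'You, Me and Dupree': 3.5}

# Absolute difference for every film both critics rated
diffs = dict((film, abs(lisa[film] - gene[film]))
             for film in lisa if film in gene)

# 0.5 + 0.0 + 1.5 + 1.5 + 1.0 + 0.0
total = sum(diffs.values())
```

With these ratings, total comes out to 4.5, which matches what sim_manhattan returned.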

 

CPSC 470: Data Mining January 23, 2009

Filed under: Data Mining — Lauren @ 5:51 pm

Spring 2009

A hands-on introductory course on data mining and information retrieval.

http://www.zacharski.org/classes/2009/spring/cs470u/index.php