Lauren's Blog

stop and smell the roses

Portfolio Assignment 1 January 25, 2009

Filed under: Data Mining — Lauren @ 12:44 am

After creating recommendations.py and running the commands on page 9 of “Collective Intelligence”, I got an error saying the recommendations module could not be found. I then re-read the page and moved recommendations.py to the Lib directory of my Python installation, which is one of the places Python looks for modules. That fixed it right away. I love how easy Python makes it to use data structures like dictionaries and lists!
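Moving the file into Lib works because that directory is on Python's module search path. As a rough sketch of an alternative, the folder holding the file can be added to sys.path for the current session instead. The temporary folder and the tiny stand-in recommendations.py written below are assumptions made only so the example runs on its own; they are not the book's file.

```python
import os
import sys
import tempfile

# Hypothetical project folder standing in for wherever recommendations.py lives.
project_dir = tempfile.mkdtemp()
with open(os.path.join(project_dir, "recommendations.py"), "w") as f:
    # Stand-in module body, invented for this sketch.
    f.write("critics = {'Lisa Rose': {}, 'Gene Seymour': {}}\n")

# Python searches the directories listed in sys.path when it sees an import,
# so appending the folder makes the module importable for this session.
if project_dir not in sys.path:
    sys.path.append(project_dir)

import recommendations

print(sorted(recommendations.critics))
```

The change to sys.path only lasts for the running interpreter, so it has to be repeated in each new session, whereas a file in Lib is always found.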

Euclidean Distance

Typing the Euclidean distance formula straight into the Python interpreter (using IDLE) gave me the same answers as the book's example with Toby and LaSalle. However, when I added the function sim_distance to recommendations.py, I got a different answer than the book for Lisa Rose and Gene Seymour. I added up the squares of the differences by hand and got the same answer as my function. I think the general consensus is the book is wrong!
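One way two Euclidean-based scores can disagree is whether the square root is taken before the distance is turned into a similarity. The sketch below is only an illustration: the post does not show sim_distance, so the 1/(1+d) scoring, the function names, and the toy ratings are all assumptions, not the book's code or data.

```python
from math import sqrt

# Hypothetical ratings, made up for this sketch (not the book's critics data).
toy_prefs = {
    'Critic A': {'Film 1': 4.0, 'Film 2': 1.0, 'Film 3': 3.0},
    'Critic B': {'Film 1': 1.0, 'Film 2': 5.0},
}

def sum_of_squares(prefs, person_a, person_b):
    # Only items rated by both people contribute to the distance.
    shared = [item for item in prefs[person_a] if item in prefs[person_b]]
    return sum((prefs[person_a][item] - prefs[person_b][item]) ** 2
               for item in shared)

def sim_with_sqrt(prefs, person_a, person_b):
    # True Euclidean distance: square root of the summed squares.
    return 1.0 / (1.0 + sqrt(sum_of_squares(prefs, person_a, person_b)))

def sim_without_sqrt(prefs, person_a, person_b):
    # Same score, but using the squared distance directly.
    return 1.0 / (1.0 + sum_of_squares(prefs, person_a, person_b))

print(sim_with_sqrt(toy_prefs, 'Critic A', 'Critic B'))     # 1/6
print(sim_without_sqrt(toy_prefs, 'Critic A', 'Critic B'))  # 1/26
```

Here the shared differences are 3 and 4, so the summed squares are 25 and the distance is 5. The two variants then give 1/6 and 1/26, which shows how much the square root step alone changes the score.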

Pearson Coefficient

The Pearson coefficient worked correctly and yielded the same results as the book. It took me a while to see how the function sim_pearson matched the formula we discussed in class, but I worked through it.
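For anyone tracing the same connection, here is a minimal sketch of the Pearson coefficient written from running sums. The class formula is not reproduced in this post, so the sketch assumes the standard sum form, r = (Σxy − ΣxΣy/n) / sqrt((Σx² − (Σx)²/n)(Σy² − (Σy)²/n)). The function name pearson and the sample lists are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of ratings."""
    n = len(xs)
    if n == 0:
        return 0.0

    # The five running sums the formula is built from.
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x_sq = sum(x * x for x in xs)
    sum_y_sq = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Numerator: how much x and y vary together.
    numerator = sum_xy - sum_x * sum_y / float(n)
    # Denominator: product of how much each varies on its own.
    denominator = sqrt((sum_x_sq - sum_x ** 2 / float(n)) *
                       (sum_y_sq - sum_y ** 2 / float(n)))
    if denominator == 0:
        return 0.0
    return numerator / denominator

print(pearson([1, 2, 3], [2, 4, 6]))  # ratings rise together
print(pearson([1, 2, 3], [3, 2, 1]))  # ratings move in opposite directions
```

The first call is a perfect positive correlation and the second a perfect negative one, which makes it easy to check each sum by hand.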

Manhattan Distance

Implementing the Manhattan distance was pretty simple. I followed the same format as the sim_distance and sim_pearson functions. The formula for the Manhattan distance is |X1-X2|+|Y1-Y2|+…+|Z1-Z2|. I had to look up the syntax for an absolute value function in Python and it was what I thought it would be: abs(x). Below is my sim_manhattan function.

# Returns the Manhattan distance between personA and personB
# (a smaller value means the two critics rate more alike)
def sim_manhattan(prefs, personA, personB):
    # Collect the items both people have rated
    si={}
    for item in prefs[personA]:
        if item in prefs[personB]:
            si[item]=1

    # If they have no ratings in common, return 0
    if len(si)==0: return 0

    # Add up the absolute values of all the differences
    sum_of_abs=sum([abs(prefs[personA][item]-prefs[personB][item]) for item in si])
    return sum_of_abs

When I tested it in the Python interpreter with the critics Lisa Rose and Gene Seymour, I got the following, correct result:

>>> reload(recommendations)
<module 'recommendations' from 'C:\Python26\lib\recommendations.py'>
>>> recommendations.sim_manhattan(recommendations.critics, 'Lisa Rose', 'Gene Seymour')
4.5
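Note that 4.5 is a raw distance, so a larger number means the two critics agree less. If a score where larger means more similar is wanted, one common option is to map the distance through 1/(1+d). The helper below is a hypothetical addition sketched under that assumption, not part of the assignment's function.

```python
def manhattan_to_similarity(distance):
    """Map a distance in [0, inf) to a score in (0, 1]; 1 means identical ratings."""
    return 1.0 / (1.0 + distance)

# 4.5 is the Lisa Rose / Gene Seymour distance reported above.
print(manhattan_to_similarity(4.5))
# A distance of 0 (identical ratings) maps to the top score.
print(manhattan_to_similarity(0.0))
```

With this mapping, the case of no shared ratings would need its own handling, since returning a distance of 0 there would read as a perfect match.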

2 Responses to “Portfolio Assignment 1”

  1. Bob Harkness Says:

    The book is wrong? I’d double check that if I were you.

  2. harkshark Says:

    I did double check, DAD! Our professor told us something was wrong with the example, and everyone in my class said the book was wrong too!

