Lauren's Blog

stop and smell the roses

Portfolio Assignment 6 March 24, 2009

Filed under: Data Mining — Lauren @ 5:57 pm

Clustering Movies

For this assignment, my team (Andrew, Kurt, Will) and I tried to cluster a very large file of movie data. Once we got the text file to work with the readfile method, we ran it on my computer and waited, and waited, and waited. We knew ahead of time that clustering would take a while, so we ran it during class (approximately 2.5 hours), and still nothing.

Not knowing what to do next, I browsed my classmates’ blogs to see how they approached this movie data. The idea I tried next was to get rid of all of the columns except two and narrow the data down to 1,000 movies. I let it run for 5 minutes or so and finally got the Python command prompt back! But then I kept getting this error when I tried to print the clusters out:

>>> movienames,categories,data=moviecluster.readfile('moviedata.txt')
>>> clust = moviecluster.hcluster(data)
>>> moviecluster.printclust(clust,labels=movienames)
  Starman
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    moviecluster.printclust(clust,labels=movienames)
  File "C:\Python26\moviecluster.py", line 101, in printclust
    if clust.right!=None: printclust(clust.right,labels=labels,n=n+1)
  File "C:\Python26\moviecluster.py", line 91, in printclust
    if clust.id<0:
AttributeError: 'list' object has no attribute 'id'
>>> 
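
Reading the traceback, printclust recursed into clust.right and was handed a plain list where it expected an object with an id attribute. One way to find where that list sits in the tree is to walk it and report every child that is not a cluster node. The sketch below is only a guess at how that could look: the bicluster class is a stand-in whose shape (id, left, right, vec) is inferred from the attribute names in the traceback, and find_bad_nodes is a hypothetical helper, not something in moviecluster.py.

```python
class bicluster:
    """Stand-in node class; its shape is assumed from the traceback."""
    def __init__(self, vec, left=None, right=None, id=None):
        self.vec = vec
        self.left = left
        self.right = right
        self.id = id


def find_bad_nodes(node, path="root"):
    """Return (path, type name) for every child that is not a cluster node."""
    # Anything without an id attribute would trigger the AttributeError above.
    if not hasattr(node, "id"):
        return [(path, type(node).__name__)]
    bad = []
    for side in ("left", "right"):
        child = getattr(node, side, None)
        if child is not None:
            bad.extend(find_bad_nodes(child, path + "." + side))
    return bad


# A deliberately broken tree: the right branch is a list, not a node.
leaf = bicluster([1.0, 2.0], id=0)
broken = bicluster([1.5, 2.5], left=leaf,
                   right=[bicluster([2.0, 3.0], id=1)], id=-1)

print(find_bad_nodes(broken))  # [('root.right', 'list')]
```

Running the same helper on the real clust would show which branch holds a list, which in turn points at the place in the clustering code where a list was stored instead of a single node.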
