Portfolio Assignment 6 March 24, 2009

Clustering Movies

For this assignment, my team (Andrew, Kurt, Will) and I tried to cluster a very large file of movie data. Once we got the text file to load in the readfile method, we ran it on my computer and waited, and waited, and waited. We knew ahead of time that clustering would take a while, so we let it run during class (approximately 2.5 hours), and it still had not finished.

Not knowing what to do next, I browsed my classmates’ blogs to see how they approached this movie data. The idea I tried next was to get rid of all but two of the columns and narrow the data down to 1,000 movies. I let it run for 5 minutes or so and finally got the Python command prompt back! But then I kept getting this error when I tried to print the clusters out:

>>> movienames,categories,data=moviecluster.readfile('moviedata.txt')
>>> clust = moviecluster.hcluster(data)
>>> moviecluster.printclust(clust,labels=movienames)

Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
  File "C:\Python26\", line 101, in printclust
    if clust.right!=None: printclust(clust.right,labels=labels,n=n+1)
  File "C:\Python26\", line 91, in printclust
AttributeError: 'list' object has no attribute 'id'
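
The traceback says that printclust, while following a right branch, reached something that is a plain list rather than a cluster object with an id. One way to find out where that happens is to walk the tree and report every node that has no id attribute. The following is only a rough sketch of that idea: the Node class and the find_bad_nodes function are hypothetical stand-ins, and the only thing assumed about the real cluster objects is that they carry left, right and id attributes, as the traceback suggests.

```python
class Node(object):
    """Hypothetical stand-in for a cluster node. Only the attributes
    seen in the traceback (left, right, id) are modelled."""
    def __init__(self, id, left=None, right=None):
        self.id = id
        self.left = left
        self.right = right


def find_bad_nodes(node, path="root"):
    """Return (path, type name) for every node that lacks an 'id'."""
    if node is None:
        return []
    if not hasattr(node, "id"):
        # Not a cluster node, so there are no branches to follow.
        return [(path, type(node).__name__)]
    bad = []
    bad += find_bad_nodes(getattr(node, "left", None), path + ".left")
    bad += find_bad_nodes(getattr(node, "right", None), path + ".right")
    return bad


# Toy tree with a plain list sitting where a node should be.
tree = Node(-1, left=Node(0),
            right=Node(-2, left=Node(1), right=[0.5, 0.2]))

for where, kind in find_bad_nodes(tree):
    print("{0} is a {1}, not a cluster node".format(where, kind))
```

If the real clust object behaves like this stand-in, calling find_bad_nodes(clust) before printclust would show which branch holds the list, which in turn points at the step of the clustering code that stored it there.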



