<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Lauren&#039;s Blog</title>
	<atom:link href="http://harkshark.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://harkshark.wordpress.com</link>
	<description>stop and smell the roses</description>
	<lastBuildDate>Wed, 29 Jul 2009 15:53:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='harkshark.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Lauren&#039;s Blog</title>
		<link>http://harkshark.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://harkshark.wordpress.com/osd.xml" title="Lauren&#039;s Blog" />
	<atom:link rel='hub' href='http://harkshark.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Disable Balloon Tips</title>
		<link>http://harkshark.wordpress.com/2009/07/29/disable-balloon-tips/</link>
		<comments>http://harkshark.wordpress.com/2009/07/29/disable-balloon-tips/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 15:51:25 +0000</pubDate>
		<dc:creator>Lauren</dc:creator>
				<category><![CDATA[Windows XP]]></category>

		<guid isPermaLink="false">http://harkshark.wordpress.com/?p=91</guid>
		<description><![CDATA[I hate clutter in my System Tray. I hate when software runs when Windows starts and adds another icon. I especially hate the annoying information balloons that pop up all the time! Why do I need a balloon notifying me that &#8220;a network cable has been disconnected&#8221; when the icon changes to a computer with [...]]]></description>
			<content:encoded><![CDATA[<p>I hate clutter in my System Tray. I hate when software runs when Windows starts and adds another icon. I especially hate the annoying information balloons that pop up all the time! Why do I need a balloon notifying me that &#8220;a network cable has been disconnected&#8221; when the icon changes to a computer with a big, red X on it? Or the &#8220;Found new hardware&#8221; balloon every time you plug in your flash drive? I would be OK with them if they eventually went away like the Outlook Desktop Alert notifications, but you have to click the balloons to make them disappear! Too much clicking!</p>
<p>I&#8217;m all about customization so I wanted to get rid of them asap. I checked all the Windows Display Settings and Taskbar Settings and didn&#8217;t find anything for balloon tips so I Googled. Here are my Google Search Results: <a href="http://www.google.com/search?hl=en&amp;ei=l1xwSt7VMonKtgfR5vH9DQ&amp;sa=X&amp;oi=spell&amp;resnum=0&amp;ct=result&amp;cd=1&amp;q=get+rid+of+windows+balloons&amp;spell=1" target="_blank">&#8220;get rid of windows balloons&#8221;</a>.</p>
<p>The first two hits gave me what I wanted. Since this solution involves editing the Windows Registry, I checked a couple more articles to make sure the solutions were similar. I don&#8217;t like to mess with the operating system but I had to get rid of those balloons! The <a href="http://www.pcmag.com/article2/0,2817,7449,00.asp" target="_blank">PCMAG</a> article is the one I followed initially:</p>
<p>Get Rid of Those Pesky Balloons!</p>
<ol style="padding-left:30px;">
<li>From the Start button select Run (Windows Logo + R)</li>
<li>Type regedit and hit Enter to open the Registry Editor</li>
<li>Go to HKEY_CURRENT_USER → software → microsoft → windows → currentversion → explorer → advanced</li>
<li>Under Edit select New → DWORD Value</li>
<li>Type EnableBalloonTips and hit Enter; leave the new value&#8217;s data at its default of 0</li>
<li>Close the Registry Editor and Log Out/Log In again to enable the change</li>
</ol>
<p>At first I was confused about why I was typing <strong>Enable</strong>BalloonTips when I wanted to <strong>Disable</strong> them, but reading the <a href="http://www.windowsnetworking.com/kbase/WindowsTips/WindowsXP/RegistryTips/Commandshell/GetRidOfThoseBalloons.html" target="_blank">WindowsNetworking</a> article I learned that setting the value to 0 disables the balloon tips. <em>To enable the balloon tips set the value to 1.</em></p>
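<p>A minimal sketch, not taken from either article: the same setting can be written out as a .reg file instead of being typed into regedit. The helper name <code>balloon_tips_reg</code> is made up for illustration, and the script only builds text; it never touches the registry itself.</p>

```python
# Hypothetical helper: builds the text of a .reg file for the
# EnableBalloonTips value described in the steps above.
KEY_PATH = (r"HKEY_CURRENT_USER\Software\Microsoft\Windows"
            r"\CurrentVersion\Explorer\Advanced")


def balloon_tips_reg(enable):
    """Return .reg file text that sets EnableBalloonTips to 1 (on) or 0 (off)."""
    value = 1 if enable else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + KEY_PATH + "]",
        '"EnableBalloonTips"=dword:%08x' % value,
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Print the file contents that would disable the balloons
    print(balloon_tips_reg(enable=False))
```

<p>Saving the printed text as a .reg file and importing it should have the same effect as the manual steps, and logging out and back in is still needed afterwards.</p>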
<p>After I logged back in, I tested it out by unplugging my Ethernet cable. No balloon! Thanks PCMag :)</p>
]]></content:encoded>
			<wfw:commentRss>http://harkshark.wordpress.com/2009/07/29/disable-balloon-tips/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6d96e7ed084c5af323a8b3e0f6d7436?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">harkshark</media:title>
		</media:content>
	</item>
		<item>
		<title>Portfolio Assignment 9</title>
		<link>http://harkshark.wordpress.com/2009/04/23/portfolio-assignment-9/</link>
		<comments>http://harkshark.wordpress.com/2009/04/23/portfolio-assignment-9/#comments</comments>
		<pubDate>Thu, 23 Apr 2009 04:42:26 +0000</pubDate>
		<dc:creator>Lauren</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://harkshark.wordpress.com/?p=86</guid>
		<description><![CDATA[Final Project For our final project, my team (Andrew, Kurt, Will) and I will be expanding on our work from last week with document filtering. The Problem As you well know, spam is a very annoying and persistent presence on the Internet. In chapter 6 of PCI, we learned that rule-based classifiers don&#8217;t cut it [...]]]></description>
			<content:encoded><![CDATA[<p><strong><span style="color:#ff6600;">Final Project</span></strong></p>
<p>For our final project, my team (Andrew, Kurt, Will) and I will be expanding on our work from last week with document filtering.</p>
<p><strong>The Problem</strong></p>
<p>As you well know, spam is a very annoying and persistent presence on the Internet. In chapter 6 of PCI, we learned that rule-based classifiers don&#8217;t cut it because spammers are getting smarter. So, we created a learning classifier that is trained on data and assigns a document a category based on word or feature probabilities. The only guideline for our project was to use a substantial dataset. The algorithm in the book uses strings as &#8220;documents&#8221;. We want to use real email documents to train the classifier and use it for future classifications.</p>
<p><strong>The Data</strong></p>
<p>At first, we searched the Internet for some fun spam datasets to download. Of course, there were a ton! But our plan was to modify the book&#8217;s classifying algorithm to use email text, and we kept finding weird formats for the datasets. So, Will logged into his old Yahoo email account and found 1,400 spam emails. I&#8217;m pretty sure if I logged into my old AOL account I would find a similar number! At first we thought we were going to have to use the <span style="color:#ff6600;">sampletrain </span>method from the book and type the name of every file into a line of code. That would take forever and make the algorithm impractical for real use. Will whipped up a function to rename all of his emails into a format of either <span style="color:#ff6600;">spam#.txt</span> or <span style="color:#ff6600;">nonspam#.txt</span>:</p>
<p style="padding-left:30px;"><span style="color:#ff6600;">def </span><span style="color:#0000ff;">openfiles</span>(cl):</p>
<p style="padding-left:30px;"><span style="color:#ff0000;">    # Each line of the index file is split on spaces: field 1 is the filename, field 2 is the label</span></p>
<p style="padding-left:30px;">    data = <span style="color:#800080;">open</span>(<span style="color:#008000;">'blogsplogreal.txt'</span>,<span style="color:#008000;"> 'r'</span>)</p>
<p style="padding-left:30px;">    lines = data.readlines()</p>
<p style="padding-left:30px;">    data.close()</p>
<p style="padding-left:30px;">    <span style="color:#ff6600;">for </span>i <span style="color:#ff6600;">in </span>lines:</p>
<p style="padding-left:30px;">        thisline = i.split(<span style="color:#008000;">" "</span>)</p>
<p style="padding-left:30px;">        filename = thisline[1]</p>
<p style="padding-left:30px;">        <span style="color:#800080;">print</span>(<span style="color:#008000;">'opening: '</span> + filename)</p>
<p style="padding-left:30px;"><span style="color:#ff0000;">        # A label of "1" at the end of the line marks the file as spam</span></p>
<p style="padding-left:30px;">        <span style="color:#ff6600;">if </span>thisline[2] == <span style="color:#008000;">"1\n"</span>:</p>
<p style="padding-left:30px;">            spamtype =<span style="color:#008000;"> 'spam'</span></p>
<p style="padding-left:30px;">        <span style="color:#ff6600;">else</span>:</p>
<p style="padding-left:30px;">            spamtype =<span style="color:#008000;"> 'not-spam'</span></p>
<p style="padding-left:30px;">        <span style="color:#800080;">print</span>(<span style="color:#008000;">'file type: ' </span>+ spamtype)</p>
<p style="padding-left:30px;">        cl.train(filename, spamtype)</p>
<p>The naming scheme was useful because it let us train the classifier in a loop, building each filename from the base name of the file (spam or nonspam), the number and &#8216;.txt&#8217;:</p>
<p style="padding-left:30px;"><span style="color:#ff6600;">def </span><span style="color:#0000ff;">sampletrain</span>(cl,basefile,numfiles,gory):</p>
<p style="padding-left:30px;">    <span style="color:#ff6600;">for </span>i <span style="color:#ff6600;">in </span><span style="color:#800080;">range</span>(1,numfiles+1):</p>
<p style="padding-left:30px;">        filename = basefile + <span style="color:#800080;">str</span>(i) +<span style="color:#008000;"> '.txt'</span></p>
<p style="padding-left:30px;">        <span style="color:#ff0000;">#print filename</span></p>
<p style="padding-left:30px;">        cl.train(filename,gory)</p>
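<p>As an alternative sketch in Python 3, assuming all the renamed emails sit in one folder, the loop could find the numbered files itself instead of being told how many there are. The name <code>train_from_folder</code> and the stand-in classifier in the demo are hypothetical; only <code>cl.train(filename, category)</code> comes from the code above.</p>

```python
import glob
import os
import tempfile


def train_from_folder(cl, folder, prefix, category):
    """Train cl on every file in folder named prefix + number + '.txt'."""
    pattern = os.path.join(folder, prefix + "[0-9]*.txt")
    trained = []
    for filename in sorted(glob.glob(pattern)):
        cl.train(filename, category)
        trained.append(filename)
    return trained


class RecordingClassifier:
    """Hypothetical stand-in that only records what it was asked to train."""

    def __init__(self):
        self.calls = []

    def train(self, item, category):
        self.calls.append((os.path.basename(item), category))


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    for name in ("spam1.txt", "spam2.txt", "nonspam1.txt"):
        open(os.path.join(folder, name), "w").close()
    cl = RecordingClassifier()
    train_from_folder(cl, folder, "spam", "spam")
    train_from_folder(cl, folder, "nonspam", "not-spam")
    print(cl.calls)
```

<p>Because the pattern is matched against the whole file name, the prefix spam does not pick up the nonspam files.</p>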
<p><strong>The Solution</strong></p>
<p>We used Bayes (when in doubt, use Bayes!) to train and classify documents. Starting from the book&#8217;s code, we had to edit the <span style="color:#ff6600;">getwords </span>method to open a file and add its words to the dictionary. A friendly neighbor in the lab showed us how to do file I/O and this is what we came up with:</p>
<p> </p>
<p style="padding-left:30px;"><span style="color:#ff6600;">import </span>re</p>
<p style="padding-left:30px;"><span style="color:#ff6600;">def </span><span style="color:#0000ff;">getwords</span>(doc):</p>
<p style="padding-left:30px;"><span style="color:#ff0000;">    # Read the whole file into one string</span></p>
<p style="padding-left:30px;">    data = <span style="color:#800080;">open</span>(doc, <span style="color:#008000;">'r'</span>)</p>
<p style="padding-left:30px;">    lines = <span style="color:#008000;">' '</span></p>
<p style="padding-left:30px;">    <span style="color:#ff6600;">for </span>line <span style="color:#ff6600;">in </span>data:</p>
<p style="padding-left:30px;">        lines+=line</p>
<p style="padding-left:30px;">    data.close()</p>
<p style="padding-left:30px;"><span style="color:#ff0000;">    #print lines</span></p>
<p style="padding-left:30px;">    splitter = re.compile(<span style="color:#008000;">'\\W+'</span>)</p>
<p style="padding-left:30px;"><span style="color:#ff0000;">    # Split the text on runs of non-word characters</span></p>
<p style="padding-left:30px;">    words = [s.lower() <span style="color:#ff6600;">for </span>s <span style="color:#ff6600;">in </span>splitter.split(lines) <span style="color:#ff6600;">if </span><span style="color:#800080;">len</span>(s) &gt; 2 <span style="color:#ff6600;">and </span><span style="color:#800080;">len</span>(s) &lt; 20]</p>
<p style="padding-left:30px;">    <span style="color:#ff0000;">#print words</span></p>
<p style="padding-left:30px;"><span style="color:#ff0000;">    # Return the unique set of words only</span></p>
<p style="padding-left:30px;">    <span style="color:#ff6600;">return </span>dict([(w,1) <span style="color:#ff6600;">for </span>w <span style="color:#ff6600;">in </span>words])</p>
<p>This opens a file and concatenates each line into one big string. Then it is split up and converted to lowercase as the book does. Now, we can send the classifier a filename and it will get the features and resume with the same algorithm.</p>
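<p>For readers on Python 3, the same idea can be sketched as below. This is a restatement under assumptions rather than the code we ran: it reads the file in one call, closes it automatically, and splits on runs of non-word characters. The demo text and temporary file are made up.</p>

```python
import os
import re
import tempfile


def getwords(path):
    """Return the unique lowercase words (3 to 19 characters) found in a file."""
    with open(path, "r", errors="ignore") as handle:
        text = handle.read()
    # Split on runs of non-word characters, then keep mid-length words only
    words = [s.lower() for s in re.split(r"\W+", text) if len(s) in range(3, 20)]
    return dict((w, 1) for w in words)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("Buy CHEAP pills now!\nIt's a deal")
    print(sorted(getwords(tmp.name)))
    os.remove(tmp.name)
```

<p>The returned dictionary has the same shape as before, so the rest of the classifier does not need to change.</p>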
<p><strong>The Results</strong></p>
<p>We were very excited to see that our modifications for training on files compiled! It took some serious looking at the code to make sure we were doing it correctly, but now we really understand what is going on. Starting small, we trained two documents, one that was spam and one that was not. It correctly added the categories and features to the dictionary: success number 1! Then we trained a few more documents and gave it an unknown document to classify and it worked! It classified all 4 documents correctly. Now that we know it works, we ran the algorithm with Will&#8217;s mixed spam and nonspam files. Tomorrow we&#8217;re going to run it with a combination of unknown documents and see how it classifies in front of the class.</p>
]]></content:encoded>
			<wfw:commentRss>http://harkshark.wordpress.com/2009/04/23/portfolio-assignment-9/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6d96e7ed084c5af323a8b3e0f6d7436?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">harkshark</media:title>
		</media:content>
	</item>
		<item>
		<title>Portfolio Assignment 8</title>
		<link>http://harkshark.wordpress.com/2009/04/11/portfolio-assignment-8/</link>
		<comments>http://harkshark.wordpress.com/2009/04/11/portfolio-assignment-8/#comments</comments>
		<pubDate>Sat, 11 Apr 2009 20:53:56 +0000</pubDate>
		<dc:creator>Lauren</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://harkshark.wordpress.com/?p=71</guid>
		<description><![CDATA[PCI Chapter 6 &#8211; Document Filtering Next class, my team (Will, Andrew and Kurt) and I will be presenting Chapter 6 in PCI.  We have divided the chapter into major sections, and I will be discussing the first three, starting with Filtering Spam. Filtering Spam Why do we need to classify documents based on their contents? To [...]]]></description>
			<content:encoded><![CDATA[<p><strong><span style="color:#008000;">PCI Chapter 6 &#8211; Document Filtering</span></strong></p>
<p>Next class, my team (Will, Andrew and Kurt) and I will be presenting Chapter 6 in PCI.  We have divided the chapter into major sections, and I will be discussing the first three, starting with Filtering Spam.</p>
<p><strong>Filtering Spam</strong></p>
<p>Why do we need to classify documents based on their contents? To eliminate spam! For my own Gmail account, I use many rule-based, spam-eliminating, organizational methods. But, as the chapter agrees, this isn&#8217;t a perfect approach. Because I use word matching, sometimes an email that is destined for the &#8220;annoying UMW administration email&#8221; folder ends up in my inbox (and I promptly delete it). I also use email address matching to filter my messages, but some addresses could go into more than one folder, depending on the content of the message. So, what to do? How about programs that learn from what you tell them is and isn&#8217;t spam, and keep learning not just initially but as you receive more email?</p>
<p><strong>Documents and Words</strong></p>
<p>Some words appear more frequently in spam, so those words help determine whether a document is spam or not. Also, there are words that commonly show up in spam, but could also be important in an email that is not spam. <span style="color:#008000;">getwords </span>separates the document into words by splitting the stream when it encounters a character that is not a letter. This means that words with apostrophes are split into separate words. For example, <span style="color:#008000;">they&#8217;re </span>would become two words: <span style="color:#008000;">they </span>and <span style="color:#008000;">re</span>.</p>
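<p>The splitting behaviour is easy to check in isolation. This is a small Python 3 sketch; the helper name <code>split_words</code> is made up, and the length filter mirrors the one used in getwords (longer than 2 and shorter than 20 characters).</p>

```python
import re


def split_words(text):
    """Split text on runs of non-word characters and lowercase the pieces."""
    return [s.lower() for s in re.split(r"\W+", text) if s]


tokens = split_words("They're here")
print(tokens)  # ['they', 're', 'here']

# The length filter (keep 3 to 19 characters) then drops the short piece
kept = [t for t in tokens if len(t) in range(3, 20)]
print(kept)    # ['they', 'here']
```

<p>So the apostrophe does produce two pieces, but the short one never becomes a feature once the length filter is applied.</p>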
<p><strong>Training the Classifier</strong></p>
<p>As we know, the more examples of documents with correct classifications a classifier sees, the better it will become at correctly classifying new documents. After adding the <span style="color:#008000;">classifier </span>class and its helper methods, I ran the code described on page 121 to check that it was working. Up until now, I really didn&#8217;t understand what the class was doing with the features and categories. The example input helped me understand how this classifier will work.</p>
<p><strong>Calculating Probabilities</strong></p>
<p>The probability that a word appears in a particular category tells us which words are more likely to show up in spam. For example, if the word <span style="color:#008000;">&#8216;Viagra&#8217;</span> appears a lot more in the <span style="color:#008000;">&#8216;bad&#8217;</span> category than the <span style="color:#008000;">&#8216;good&#8217;</span> category, it has a high probability of being a spam word. A word like &#8216;the&#8217; is probably not a good spam indicator because it is so common:</p>
<p style="padding-left:30px;"><span style="color:#800000;">&gt;&gt;&gt;</span> cl.fprob(<span style="color:#008000;">&#8216;the&#8217;</span>,<span style="color:#008000;">&#8216;good&#8217;</span>)</p>
<p style="padding-left:30px;"><span style="color:#0000ff;">1.0</span></p>
<p style="padding-left:30px;"><span style="color:#800000;">&gt;&gt;&gt;</span> cl.fprob(<span style="color:#008000;">&#8216;the&#8217;</span>,<span style="color:#008000;">&#8216;bad&#8217;</span>)</p>
<p style="padding-left:30px;"><span style="color:#0000ff;">0.5</span></p>
<p>Because the classifier has only been trained on five documents, there are many words that only appear in one document. So, whichever category that document was assigned, the word&#8217;s probability in the other category will be 0 because it hasn&#8217;t appeared there yet. This is not very reasonable, especially with frequent spam-ish words. Weighting the probability starts it at an assumed 50% and then shifts it as more occurrences of the word appear.</p>
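<p>To make the arithmetic concrete, here is a minimal Python 3 sketch of the two calculations. <code>TinyClassifier</code> is a hypothetical stand-in that only keeps counts, the five training sentences are assumed sample data, and the weight of 1 and assumed probability of 0.5 are the starting values described above.</p>

```python
import re


class TinyClassifier:
    """Hypothetical stand-in for the chapter's classifier; it only keeps counts."""

    def __init__(self):
        self.fc = {}  # (word, category) -> number of documents containing the word
        self.cc = {}  # category -> number of documents trained

    def getwords(self, doc):
        # Unique lowercase words of 3 to 19 characters
        return set(s.lower() for s in re.split(r"\W+", doc) if len(s) in range(3, 20))

    def train(self, doc, cat):
        for word in self.getwords(doc):
            self.fc[(word, cat)] = self.fc.get((word, cat), 0) + 1
        self.cc[cat] = self.cc.get(cat, 0) + 1

    def fprob(self, word, cat):
        # Fraction of the documents in this category that contain the word
        if self.cc.get(cat, 0) == 0:
            return 0.0
        return self.fc.get((word, cat), 0) / self.cc[cat]

    def weightedprob(self, word, cat, weight=1.0, assumed=0.5):
        # Start from the assumed probability and move toward fprob
        # as the word is seen in more documents (in any category)
        basic = self.fprob(word, cat)
        total = sum(self.fc.get((word, c), 0) for c in self.cc)
        return (weight * assumed + total * basic) / (weight + total)


cl = TinyClassifier()
cl.train("Nobody owns the water.", "good")
cl.train("the quick rabbit jumps fences", "good")
cl.train("buy pharmaceuticals now", "bad")
cl.train("make quick money at the online casino", "bad")
cl.train("the quick brown fox jumps", "good")

print(cl.fprob("the", "good"))           # 1.0
print(cl.fprob("the", "bad"))            # 0.5
print(cl.weightedprob("money", "good"))  # 0.25
```

<p>The word money has only been seen once, in a bad document, so its plain probability for good is 0, while the weighted version lands halfway between 0 and the assumed 0.5.</p>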
]]></content:encoded>
			<wfw:commentRss>http://harkshark.wordpress.com/2009/04/11/portfolio-assignment-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6d96e7ed084c5af323a8b3e0f6d7436?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">harkshark</media:title>
		</media:content>
	</item>
		<item>
		<title>Portfolio Assignment 7</title>
		<link>http://harkshark.wordpress.com/2009/03/24/portfolio-assignment-7/</link>
		<comments>http://harkshark.wordpress.com/2009/03/24/portfolio-assignment-7/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 18:02:28 +0000</pubDate>
		<dc:creator>Lauren</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://harkshark.wordpress.com/?p=69</guid>
		<description><![CDATA[PCI Chapter 4 &#8211; Searching and Ranking Everyone knows the most popular search engine in the world is Google, thanks in part to the PageRank algorithm. Chapter 4 creates a search engine by crawling to collect documents, indexing the locations of different words and finally ranking pages to return to a user as a list. [...]]]></description>
			<content:encoded><![CDATA[<p><strong><span style="color:#ff00ff;">PCI Chapter 4 &#8211; Searching and Ranking</span></strong></p>
<p>Everyone knows the most popular search engine in the world is <a href="http://www.google.com" target="_blank">Google</a>, thanks in part to the PageRank algorithm. Chapter 4 creates a search engine by crawling to collect documents, indexing the locations of different words and finally ranking pages to return to a user as a list. Google and other search engines are so fast, it&#8217;s hard to think of all the work that goes into a query. Until now, I&#8217;ve never really thought about what happens behind the scenes; I just Google it!</p>
<p><strong>A Simple Crawler</strong></p>
<p>The built-in page downloader library, <span style="color:#ff00ff;">urllib2</span>, was easy to see in action. It downloaded an HTML page and could print out characters from different locations and ranges in the page. <span style="color:#ff00ff;">urllib2 </span>downloads the page and <a href="http://www.crummy.com/software/BeautifulSoup/" target="_blank">BeautifulSoup</a> parses it, even when the HTML or XML is poorly written. I had to download and install BeautifulSoup, but to do that, I first had to download <a href="http://www.WinZip.com" target="_blank">WinZip</a> so I could extract the tar.gz file that BeautifulSoup comes in. I put the BeautifulSoup.py file in my Python directory and got it working.</p>
<p>The idea of &#8220;crawling&#8221; through the Internet completely baffles me. The Internet is such a huge and ever expanding entity that sometimes I think Google is magic. However, after adding the <span style="color:#ff00ff;">crawl </span>method to <span style="color:#ff00ff;">searchengine.py</span>, I have a greater understanding of how it is done. It is a pretty clever algorithm! Instead of testing the <span style="color:#ff00ff;">crawl </span>method on <a href="http://kiwitobes.com/wiki/Perl.html" target="_blank">kiwitobes</a>, I used the real <a href="http://en.wikipedia.org/wiki/Perl" target="_blank">Wikipedia entry for Perl</a>:</p>
<p style="padding-left:30px;"><span style="color:#800000;">&gt;&gt;&gt;</span> pagelist = [<span style="color:#008000;">'http://en.wikipedia.org/wiki/Perl'</span>]</p>
<p style="padding-left:30px;"><span style="color:#800000;">&gt;&gt;&gt;</span> crawler = searchengine.crawler(&#8221;)</p>
<p style="padding-left:30px;"><span style="color:#800000;">&gt;&gt;&gt;</span> crawler.crawl(pagelist)</p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Could not open http://en.wikipedia.org/wiki/Perl</span></p>
<p>Hmm, OK let&#8217;s try something different, like the UMW homepage. Now we&#8217;re talking!</p>
<p style="padding-left:30px;"><span style="color:#800000;">&gt;&gt;&gt;</span> pagelist = [<span style="color:#008000;">'http://www.umw.edu'</span>]</p>
<p style="padding-left:30px;"><span style="color:#800000;">&gt;&gt;&gt;</span> crawler = searchengine.crawler(&#8221;)</p>
<p style="padding-left:30px;"><span style="color:#800000;">&gt;&gt;&gt;</span> crawler.crawl(pagelist)</p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://www.umw.edu</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://www.umw.edu/about/administration</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://www.umw.edu/featuredfaculty/stull</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://www.umw.edu/news</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://www.umw.edu/news/?a=1648</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://strategicplanning.umwblogs.org</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://www.umw.edu/academics</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://www.umw.edu/athletics</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://www.umw.edu/events</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://www.umw.edu/about</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://umwblogs.org</span></p>
<p style="padding-left:30px;"><span style="color:#0000ff;">Indexing http://www.umw.edu/azindex</span></p>
<p>It still ends in an error&#8230; coming back to this later.</p>
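<p>For reference, the breadth-first idea behind a crawl method can be sketched with the Python 3 standard library alone. This is not the book&#8217;s code: <code>LinkCollector</code>, the <code>fetch</code> parameter and the <code>depth</code> default are assumptions made so the loop can be tried without a network connection or BeautifulSoup.</p>

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_pages, fetch, depth=2):
    """Breadth-first crawl; fetch(url) returns HTML text or raises on failure."""
    seen = set(start_pages)
    queue = deque((url, 0) for url in start_pages)
    indexed = []
    while queue:
        url, level = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            print("Could not open " + url)
            continue
        print("Indexing " + url)
        indexed.append(url)
        if level + 1 >= depth:
            continue  # deep enough: index this page but do not follow its links
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any fragment after '#'
            link = urljoin(url, href).split("#")[0]
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return indexed
```

<p>Passing the download step in as <code>fetch</code> means a page that cannot be opened is reported and skipped rather than stopping the whole crawl.</p>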
<p>I installed <a href="http://oss.itsystementwicklung.de/trac/pysqlite/" target="_blank">pysqlite</a> for Python 2.6 and got it working smoothly. I like how you use Python strings to execute SQL queries to manipulate the database! It took me a while to get the database schema set up because I made a lot of typos and it kept giving me errors like &#8220;table urllist already exists&#8221;. So I renamed the database and got it working. The <span style="color:#ff00ff;">crawler </span>class is complete and ready to test, if I can find a website that will work for it!</p>
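<p>One way to avoid the &#8220;already exists&#8221; error without renaming the database is to create each table only when it is missing. The sketch below uses the standard library&#8217;s <code>sqlite3</code> module and an in-memory database; the function name <code>create_index_tables</code> is made up, and only two tables are shown as an illustration rather than the full schema.</p>

```python
import sqlite3


def create_index_tables(con):
    """Create the tables if they are missing; safe to call more than once."""
    con.execute("CREATE TABLE IF NOT EXISTS urllist(url)")
    con.execute("CREATE TABLE IF NOT EXISTS wordlist(word)")
    con.execute("CREATE INDEX IF NOT EXISTS urlidx ON urllist(url)")
    con.commit()


con = sqlite3.connect(":memory:")
create_index_tables(con)
create_index_tables(con)  # a second call does not raise "table urllist already exists"

# SQL goes in as a Python string; values are passed separately as parameters
con.execute("INSERT INTO urllist(url) VALUES (?)", ("http://www.umw.edu",))
print(con.execute("SELECT rowid, url FROM urllist").fetchall())
```

<p>Running the setup twice is then harmless, which makes it easier to fix a typo in one statement and try again.</p>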
]]></content:encoded>
			<wfw:commentRss>http://harkshark.wordpress.com/2009/03/24/portfolio-assignment-7/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6d96e7ed084c5af323a8b3e0f6d7436?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">harkshark</media:title>
		</media:content>
	</item>
		<item>
		<title>Portfolio Assignment 6</title>
		<link>http://harkshark.wordpress.com/2009/03/24/portfolio-assignment-6/</link>
		<comments>http://harkshark.wordpress.com/2009/03/24/portfolio-assignment-6/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 17:57:09 +0000</pubDate>
		<dc:creator>Lauren</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://harkshark.wordpress.com/?p=67</guid>
		<description><![CDATA[Clustering Movies For this assignment, my team (Andrew, Kurt, Will) and I tried to cluster a very large file with movie data. Once we got the text file to work in the readfile method, we ran it on my computer, waited, waited and waited. We knew ahead of time that it would take a while [...]]]></description>
			<content:encoded><![CDATA[<p>Clustering Movies</p>
<p>For this assignment, my team (Andrew, Kurt, Will) and I tried to cluster a very large file with movie data. Once we got the text file to work in the readfile method, we ran it on my computer, waited, waited and waited. We knew ahead of time that it would take a while to cluster so we ran it during class (approximately 2.5 hours) and still nothing. </p>
<p>Not knowing what to do next, I browsed my classmates&#8217; blogs to see how they approached this movie data. The idea I tried next was to get rid of all but two of the columns and narrow the data down to 1,000 movies. I let it run for 5 minutes or so and finally got the Python command prompt back! But then I kept getting this error when I tried to print the clusters out:</p>
<p style="padding-left:30px;">&gt;&gt;&gt; movienames,categories,data=moviecluster.readfile('moviedata.txt')</p>
<p style="padding-left:30px;">&gt;&gt;&gt; clust = moviecluster.hcluster(data)</p>
<p style="padding-left:30px;">&gt;&gt;&gt; moviecluster.printclust(clust,labels=movienames)</p>
<p style="padding-left:30px;">-</p>
<p style="padding-left:30px;">  Starman</p>
<p style="padding-left:30px;">Traceback (most recent call last):</p>
<p style="padding-left:30px;">  File "&lt;pyshell#14&gt;", line 1, in &lt;module&gt;</p>
<p style="padding-left:30px;">    moviecluster.printclust(clust,labels=movienames)</p>
<p style="padding-left:30px;">  File "C:\Python26\moviecluster.py", line 101, in printclust</p>
<p style="padding-left:30px;">    if clust.right!=None: printclust(clust.right,labels=labels,n=n+1)</p>
<p style="padding-left:30px;">  File "C:\Python26\moviecluster.py", line 91, in printclust</p>
<p style="padding-left:30px;">    if clust.id&lt;0:</p>
<p style="padding-left:30px;">AttributeError: 'list' object has no attribute 'id'</p>
<p style="padding-left:30px;">&gt;&gt;&gt; </p>
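<p>The traceback says that somewhere in the tree a <em>right</em> child is a plain list rather than a cluster object. Below is a minimal sketch of one way to hunt for such nodes. It assumes that moviecluster.py follows the book&#8217;s <em>bicluster</em> layout (vec, left, right, distance, id); the helper <em>find_bad_nodes</em> and the tiny broken tree are hypothetical and only illustrate the idea.</p>

```python
class bicluster(object):
    """Node of a hierarchical-clustering tree (layout assumed from the book)."""
    def __init__(self, vec, left=None, right=None, distance=0.0, id=None):
        self.vec = vec
        self.left = left
        self.right = right
        self.distance = distance
        self.id = id


def find_bad_nodes(node, path="root"):
    """Return (path, type name) for every tree node that is not a bicluster."""
    if node is None:
        return []
    if not isinstance(node, bicluster):
        return [(path, type(node).__name__)]
    bad = []
    bad.extend(find_bad_nodes(node.left, path + ".left"))
    bad.extend(find_bad_nodes(node.right, path + ".right"))
    return bad


# A deliberately broken tree: the right child is a raw list, not a bicluster.
leaf = bicluster([1.0, 2.0], id=0)
broken = bicluster([1.5, 2.5], left=leaf, right=[2.0, 3.0], id=-1)
print(find_bad_nodes(broken))
```

<p>Running the same helper on the value returned by <em>hcluster</em> would point at the spot where a raw data row was stored instead of a cluster, if that is indeed what went wrong here.</p>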
]]></content:encoded>
			<wfw:commentRss>http://harkshark.wordpress.com/2009/03/24/portfolio-assignment-6/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6d96e7ed084c5af323a8b3e0f6d7436?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">harkshark</media:title>
		</media:content>
	</item>
		<item>
		<title>Portfolio Assignment 5</title>
		<link>http://harkshark.wordpress.com/2009/03/15/portfolio-assignment-5/</link>
		<comments>http://harkshark.wordpress.com/2009/03/15/portfolio-assignment-5/#comments</comments>
		<pubDate>Sun, 15 Mar 2009 17:49:30 +0000</pubDate>
		<dc:creator>Lauren</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://harkshark.wordpress.com/?p=62</guid>
		<description><![CDATA[Visualizations Hans Rosling&#8217;s talk at the TED conference was so cool.  The Baby Name Wizard that Professor Zacharski showed in class caught my attention. I have shown it to all my friends and typed in all of their names, which is fun. As noted in the assignment description, some visualizations are purely artistic, like this [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Visualizations</strong></p>
<p>Hans Rosling&#8217;s talk at the TED conference was so cool.  <a href="http://www.babynamewizard.com/voyager#prefix=&amp;ms=false&amp;sw=f&amp;exact=false" target="_blank">The Baby Name Wizard</a> that Professor Zacharski showed in class caught my attention. I have shown it to all my friends and typed in all of their names, which is fun. As noted in the assignment description, some visualizations are purely artistic, like this <a href="http://www.antarcticanimation.com/content/animation/Oceanic/oceanic.php" target="_blank">Antarctic Animation</a>, which looks cool, but I have no idea what data it is trying to analyze!</p>
<p>I found an interesting <a href="http://benfry.com/salaryper/" target="_blank">baseball visualization</a> that analyzes spending and performance. Towards the end of the 2008 season, about a month before the World Series, the number-one-ranked team, the Angels, was spending a fair amount for their performance. However, the Rays, who were ranked third behind the Angels, were spending considerably less, which contradicts the idea that the best teams spend the most. The good ole Washington Nats did terribly last season :( but also did not spend nearly as much as the Angels or other expensive teams like Boston and New York.</p>
<p>There are all kinds of visualizations listed on this blog by <a href="http://www.meryl.net/2008/01/175-data-and-information-visualization-examples-and-resources/" target="_blank">Meryl K. Evans</a>.</p>
<p>I created a visualization on <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/" target="_blank">Many Eyes</a> by uploading a dataset for the price of gas. It only goes to 2004 so I&#8217;d like to find a more recent version. This is my practice visualization: <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/price-of-gas-1976-2004" target="_blank">Price of Gas 1976-2004</a>.</p>
<p>As a member of the UMW Women&#8217;s Soccer team, I am particularly proud of this visualization. I took all the team articles from the athletics website and combined them into one large document. I also added every member of the 2008 team. I uploaded it to Many Eyes and got a cool advertisement and summary of the 2008 season! <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/umw-womens-soccer-2008-2" target="_blank">UMW Women&#8217;s Soccer 2008 Review</a>. It&#8217;s a work in progress: I plan on editing the dataset to remove words that refer to other teams or anything else that would take away from the season and team in general. Go Eagles!</p>
]]></content:encoded>
			<wfw:commentRss>http://harkshark.wordpress.com/2009/03/15/portfolio-assignment-5/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6d96e7ed084c5af323a8b3e0f6d7436?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">harkshark</media:title>
		</media:content>
	</item>
		<item>
		<title>Portfolio Assignment 4</title>
		<link>http://harkshark.wordpress.com/2009/02/17/portfolio-assignment-4/</link>
		<comments>http://harkshark.wordpress.com/2009/02/17/portfolio-assignment-4/#comments</comments>
		<pubDate>Tue, 17 Feb 2009 00:13:46 +0000</pubDate>
		<dc:creator>Lauren</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://harkshark.wordpress.com/?p=43</guid>
		<description><![CDATA[PCI Chapter 3 Installation déjà vu. Thank goodness my Dad loves Python! I downloaded feedparser and found myself in the same pickle as pydelicious. I called up my Dad and he said I have to add Python to the Windows path so it can recognize the &#8216;python&#8217; command.  Just in case anyone else has been having [...]]]></description>
			<content:encoded><![CDATA[<p><strong>PCI Chapter 3</strong></p>
<p>Installation déjà vu. Thank goodness my Dad loves Python! I downloaded <a href="http://www.feedparser.org" target="_blank">feedparser</a> and found myself in the same pickle as with pydelicious. I called up my Dad and he said I have to add Python to the Windows path so that Windows can recognize the &#8216;<span style="color:#800080;">python</span>&#8217; command.  Just in case anyone else has been having this problem, <a href="http://www.computerhope.com/issues/ch000549.htm" target="_blank">here is a helpful site</a>. I added <span style="color:#800080;">;C:\Python;C:\Python\Scripts</span> to the end of the path. </p>
<p>While trying to simply set up the dataset to cluster, I got a lot of errors from copying the code word-for-word from the book. I looked up the <a href="http://oreilly.com/catalog/9780596529321/errata/9780596529321.unconfirmed" target="_blank">unofficial errata</a> for the book and there seem to be many problems with page 32! With help from Professor Zacharski, I got the code working and it produced a nice output file, <span style="color:#800080;">blogdata.txt</span>.</p>
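<p>A quick way to confirm that a path edit like this took effect is to split the path string and look for the new entries. The sketch below works on an example string so it runs anywhere; the helper name <em>path_entries</em> and the example value are assumptions, not something taken from the linked site.</p>

```python
def path_entries(path_value, separator=";"):
    """Split a PATH-style string into its non-empty entries."""
    return [entry for entry in path_value.split(separator) if entry]


# Example value only: an existing path with the two Python entries appended.
example = r"C:\WINDOWS\system32;C:\WINDOWS" + r";C:\Python;C:\Python\Scripts"
entries = path_entries(example)
print(r"C:\Python" in entries)
print(r"C:\Python\Scripts" in entries)
```

<p>To inspect the real setting from inside Python, the same function can be given os.environ["PATH"] with os.pathsep as the separator.</p>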
<p>The hierarchical cluster code was a little tough to follow because of all the syntax but I understand the overall purpose of <span style="color:#800080;">hcluster</span>. The <span style="color:#800080;">printclust </span>method is a neat way to output the results. I found a search engine cluster that includes: </p>
<ul>
<li>John Battelle&#8217;s Searchblog</li>
<li>The Official Google Blog</li>
<li>Search Engine Watch Blog</li>
<li>Google Operating System</li>
<li>Search Engine Roundtable</li>
</ul>
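<p>For anyone else who finds the hierarchical clustering code hard to follow, here is a stripped-down sketch of the same idea: repeatedly merge the two closest clusters until one tree remains. It is not the book&#8217;s <em>hcluster</em>. It clusters plain numbers by absolute difference instead of word-count vectors by Pearson correlation, it returns nested tuples instead of cluster objects, and it assumes a non-empty input.</p>

```python
def hcluster(values):
    """Agglomerative clustering of numbers; returns a nested-tuple tree."""
    # Each cluster is (tree, centre); a leaf's tree is just its value.
    clusters = [(v, float(v)) for v in values]
    while len(clusters) != 1:
        pairs = [(i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Pick the pair of clusters whose centres are closest together.
        i, j = min(pairs, key=lambda p: abs(clusters[p[0]][1] - clusters[p[1]][1]))
        merged = ((clusters[i][0], clusters[j][0]),
                  (clusters[i][1] + clusters[j][1]) / 2.0)
        # Remove j first because j is always the larger index.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]


print(hcluster([1, 2, 9, 10, 30]))
```

<p>The nesting of the result mirrors the dendrogram: values that were merged early sit deepest in the tuple, and the outlier joins last.</p>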
<p>Downloading and installing <a href="http://www.pythonware.com/downloads/index.htm" target="_blank">PIL</a> was surprisingly easy! I got the <span style="color:#800080;">test.jpg</span> file to work from Appendix A! I figure understanding the code used in the PIL library is not essential for this class (more essential for a graphics class) so I just got the methods to work for the clusters and it output a pretty JPEG file of the dendrogram for the blogs.</p>
<div id="attachment_49" class="wp-caption aligncenter" style="width: 224px"><a rel="attachment wp-att-49" href="http://harkshark.wordpress.com/2009/02/17/portfolio-assignment-4/blogclust/"><img class="size-medium wp-image-49 " title="Blog Clusters" src="http://harkshark.files.wordpress.com/2009/02/blogclust.jpg?w=214&#038;h=300" alt="Dendrogram generated by clusters.drawdendrogram" width="214" height="300" /></a><p class="wp-caption-text">Dendrogram generated by clusters.drawdendrogram</p></div>
<p>Switching the columns and rows in <span style="color:#800080;">blogdata.txt</span> and creating a new dendrogram showing word clusters took a long time because there are more words than blogs. While waiting for the JPEG image to render, I was sure the picture would be huge, and it was: viewing the image at 3% gets the entire image into the window. I&#8217;m not going to upload this picture but I&#8217;ll crop out an interesting word cluster I found:</p>
<div id="attachment_54" class="wp-caption aligncenter" style="width: 138px"><a rel="attachment wp-att-54" href="http://harkshark.wordpress.com/2009/02/17/portfolio-assignment-4/wordclustcrop/"><img class="size-full wp-image-54" title="Word Cluster" src="http://harkshark.files.wordpress.com/2009/02/wordclustcrop.jpg?w=128&#038;h=238" alt="A cluster from the dendrogram showing word clusters" width="128" height="238" /></a><p class="wp-caption-text">A cluster from the dendrogram showing word clusters</p></div>
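<p>Switching the columns and rows is just a matrix transpose. A minimal sketch, assuming the data is a list of equal-length rows (the function name <em>rotate_matrix</em> and the sample counts are made up for illustration):</p>

```python
def rotate_matrix(data):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*data)]


# Two blogs (rows) by three words (columns), made-up counts.
blog_rows = [[1, 2, 3],
             [4, 5, 6]]
word_rows = rotate_matrix(blog_rows)
print(word_rows)
```

<p>After the rotation each row describes one word across all blogs, which is why the word dendrogram ends up with far more leaves than the blog dendrogram.</p>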
<p>I like the idea of k-means clustering better than hierarchical clustering because you can define the number of clusters to form beforehand. This is more practical in the real world because it lets you form groups based on the results you expect.  I played around with changing the number of centroids. First I did k=10, like the book. Then I tried k=14 and k=4. At first I thought the clusters would contain an equal number of entries but that wasn&#8217;t the case:</p>
<p style="padding-left:30px;"><span style="color:#800000;">&gt;&gt;&gt;</span> [blognames[r] <span style="color:#ff6600;">for </span>r <span style="color:#ff6600;">in </span>kclust[0]]</p>
<p style="padding-left:30px;"><span style="color:#0000ff;">["John Battelle's Searchblog", 'Giga Omni Media, Inc.', 'Google Operating System', 'Gawker: Valleywag', 'Gizmodo', 'Lifehacker', 'Slashdot', 'Search Engine Watch Blog', 'Schneier on Security', 'Search Engine Roundtable', 'TechCrunch', 'mezzoblue', 'Matt Cutts: Gadgets, Google, and SEO', 'The Official Google Blog', 'Bloglines | News', 'Quick Online Tips']</span></p>
<p style="padding-left:30px;"><span style="color:#800000;">&gt;&gt;&gt;</span> [blognames[r] <span style="color:#ff6600;">for </span>r <span style="color:#ff6600;">in </span>kclust[3]]</p>
<p style="padding-left:30px;"><span style="color:#0000ff;">['Joystiq', 'Download Squad', 'Engadget', 'Crooks and Liars', "SpikedHumor - Today's Videos and Pictures", 'The Unofficial Apple Weblog (TUAW)', 'Wired Top Stories']</span></p>
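<p>The uneven sizes make sense once you remember that k-means only fixes the number of clusters, not how many items land in each one. Here is a small sketch that shows this on made-up 2-D points. It is a simplification of the book&#8217;s approach: it uses squared Euclidean distance and fixed starting centroids rather than Pearson distance and random starts, so the names and numbers are illustrative only.</p>

```python
def kmeans(points, centroids, iterations=10):
    """Tiny k-means for 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for index, (x, y) in enumerate(points):
            # Squared Euclidean distance from this point to every centroid.
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append(index)
        # Move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                xs = [points[i][0] for i in members]
                ys = [points[i][1] for i in members]
                new_centroids.append((sum(xs) / float(len(xs)),
                                      sum(ys) / float(len(ys))))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (9, 9), (10, 10)]
clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print([len(c) for c in clusters])
```

<p>With k=2 the seven points split five and two, because the split follows where the data is dense, not an even headcount.</p>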
<p>I&#8217;m not familiar with most of the blogs, but cluster 0 has a lot of search engine blogs, which is similar to the hierarchical clustering result. Below is a cluster from the Zebo dataset that I found interesting. It combines the desire for human interaction with the desire for material wealth. </p>
<div id="attachment_57" class="wp-caption aligncenter" style="width: 310px"><a rel="attachment wp-att-57" href="http://harkshark.wordpress.com/2009/02/17/portfolio-assignment-4/clusterscrop/"><img class="size-medium wp-image-57" title="Zebo Preferences Cluster" src="http://harkshark.files.wordpress.com/2009/02/clusterscrop.jpg?w=300&#038;h=247" alt="Clusters of things that people want" width="300" height="247" /></a><p class="wp-caption-text">Clusters of things that people want</p></div>
<p>Of all the data mining techniques we have studied so far, I think clustering is the most useful tool in the real world. There are a lot of datasets that could be clustered to find useful information for marketing as well as for fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://harkshark.wordpress.com/2009/02/17/portfolio-assignment-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6d96e7ed084c5af323a8b3e0f6d7436?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">harkshark</media:title>
		</media:content>

		<media:content url="http://harkshark.files.wordpress.com/2009/02/blogclust.jpg?w=214" medium="image">
			<media:title type="html">Blog Clusters</media:title>
		</media:content>

		<media:content url="http://harkshark.files.wordpress.com/2009/02/wordclustcrop.jpg" medium="image">
			<media:title type="html">Word Cluster</media:title>
		</media:content>

		<media:content url="http://harkshark.files.wordpress.com/2009/02/clusterscrop.jpg?w=300" medium="image">
			<media:title type="html">Zebo Preferences Cluster</media:title>
		</media:content>
	</item>
		<item>
		<title>Portfolio Assignment 3</title>
		<link>http://harkshark.wordpress.com/2009/02/10/portfolio-assignment-3/</link>
		<comments>http://harkshark.wordpress.com/2009/02/10/portfolio-assignment-3/#comments</comments>
		<pubDate>Tue, 10 Feb 2009 01:42:28 +0000</pubDate>
		<dc:creator>Lauren</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://harkshark.wordpress.com/?p=27</guid>
		<description><![CDATA[I worked on this assignment with my Team: Andrew Nelson, Kurt Koller and Will Boyd. The first thing we did was to download the Last.fm API. Kurt signed up on Last.fm to register to use the API. We decided to use Python for the project so we downloaded and installed pylast from Google Code. Copying [...]]]></description>
			<content:encoded><![CDATA[<p><span style="color:#000000;">I worked on this assignment with my Team: Andrew Nelson, Kurt Koller and Will Boyd.</span></p>
<p>The first thing we did was to download the <a href="http://www.last.fm/api" target="_blank">Last.fm API</a>. Kurt signed up on Last.fm to register to use the API. We decided to use Python for the project so we downloaded and installed pylast from Google Code. Copying the files into the Python directory and then using the command <span style="color:#ff6600;">import pylast</span> in IDLE gave us no errors so the install was successful.</p>
<p>To begin our recommendation system, we started simple with getting similar artists to a given artist:</p>
<p style="padding-left:30px;"><span style="color:#ff6600;">&gt;&gt;&gt; artist = pylast.Artist("The Black Keys","bd46f9bce716e11a6d311d77c06d2159","a313f7a6a587763c71eeb3cac498ca",'')</span></p>
<p style="padding-left:30px;"><span style="color:#ff6600;">&gt;&gt;&gt; artist.get_similar()</span></p>
<p>and got a large list of similar artists! Success!</p>
<p>Right now we are debating whether to use Python as a command-line interface or to switch to PHP to make a simple GUI application. Will has the most experience in PHP so he has taken the lead on trying to get a simple GUI up and running. Of course we will now have to download and install the last.fm API for PHP. While Will works on the PHP version, Kurt, Andrew and I are working on the Python command-line.</p>
<p>To create our system, we&#8217;d like to create a menu with a list of options the user can choose:</p>
<ol>
<li>Input artist, output list of similar artists: <span style="color:#ff6600;">artist.get_similar()</span></li>
<li>Input artist, output top tracks: <span style="color:#ff6600;">artist.get_top_tracks()</span></li>
<li>Input track, output list of similar songs: <span style="color:#ff6600;">track.get_similar()</span></li>
</ol>
<p>So, we started our basic command-line menu system. Should be pretty simple, right, especially with Python? The first issue we ran into was our if-statement: prompt the user for a menu item, and then go into the matching if-block. We created a simple test to see if we could get option 1 from above to work. The user typed in a 1 and it wouldn&#8217;t go into the corresponding if-block! WHY! We did some research online and our syntax was correct. We figured out that the <span style="color:#ff6600;">raw_input()</span> function returns a string and we were testing the value against the integer 1! Problem solved, we got option 1 to work!</p>
<p>Option 2 is not working as smoothly as the first. We keep getting errors for the method <span style="color:#ff6600;">get_top_tracks()</span>. Let me just say that the last.fm API is terrible! The syntax is wrong and the parameter values are off. The API for <span style="color:#ff6600;">get_top_tracks()</span> says it takes two arguments but when we pass them in our program, we get an error saying it only takes one.</p>
<p>Code so far:</p>
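<p>The string-versus-integer mix-up is easy to reproduce without any prompt at all. In this sketch the variable <em>choice</em> simply stands in for whatever raw_input() hands back in Python 2 (input() behaves the same way in Python 3):</p>

```python
choice = "1"  # stands in for the text returned by raw_input() / input()

print(choice == 1)       # a string never equals an integer
print(choice == "1")     # compare against a string...
print(int(choice) == 1)  # ...or convert the input first
```

<p>The first comparison is False and the other two are True, which is exactly why the if-block was being skipped.</p>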
<pre style="padding-left:30px;"><span style="color:#ff6600;">import pylast

# Keys registered for our Last.fm API account
API_KEY = "bd46f9bce716e11a6d311d77c06d2159"
API_SECRET = "a313f7a6a587763c71eeb3cac498ca40"

print "Welcome to the Team 3 Pylast Recommender!"
print "To find an artist similar to your artist, press (1)"
print "To find a list of top tracks by an artist, press (2)"
print "To find a list of songs similar to a particular song, press (3)"
print "To quit, press (4)"

# raw_input() returns a string, so compare against "1", not 1
choice = raw_input("&gt;")
print choice

if choice == "1":
    print "Please enter the name of an Artist"
    similarArtistName = raw_input("&gt;")
    # pass the variable itself, not the quoted name of the variable
    similarArtist = pylast.Artist(similarArtistName, API_KEY, API_SECRET, '')
    print similarArtist.get_similar()
elif choice == "2":
    print "Please enter the name of an Artist"
    trackArtistName = raw_input("&gt;")
    trackArtist = pylast.Artist(trackArtistName, API_KEY, API_SECRET, '')
    # call the method on the artist object with no extra arguments
    print trackArtist.get_top_tracks()</span></pre>
<p>Will demoed a great PHP recommendation system in class. He used these simple last.fm API methods:  <span style="color:#ff6600;">Artist.get_similar()</span>, <span style="color:#ff6600;">Track.get_similar()</span>, <span style="color:#ff6600;">Geo.get_top_artist()</span> and <span style="color:#ff6600;">Geo.get_top_tracks()</span>. Check it out <a href="http://rosemary.umw.edu/~wboyd/datamining/portfolio3/" target="_blank">here</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://harkshark.wordpress.com/2009/02/10/portfolio-assignment-3/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6d96e7ed084c5af323a8b3e0f6d7436?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">harkshark</media:title>
		</media:content>
	</item>
		<item>
		<title>Portfolio Assignment 2</title>
		<link>http://harkshark.wordpress.com/2009/02/04/portfolio-assignment-2/</link>
		<comments>http://harkshark.wordpress.com/2009/02/04/portfolio-assignment-2/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 07:57:35 +0000</pubDate>
		<dc:creator>Lauren</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://harkshark.wordpress.com/?p=21</guid>
		<description><![CDATA[Python I have had a very difficult time trying to get pydelicious to work. I&#8217;m using a Windows XP machine and have done multiple Google searches on &#8220;how to install pydelicious&#8221;. I find every entry confusing and not very helpful. I&#8217;ve even looked at blog entries from classmates and couldn&#8217;t follow their installations, whether a [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Python</strong></p>
<p>I have had a very difficult time trying to get pydelicious to work. I&#8217;m using a Windows XP machine and have done multiple Google searches on &#8220;how to install pydelicious&#8221;. I find every entry confusing and not very helpful. I&#8217;ve even looked at blog entries from classmates and couldn&#8217;t follow their installations, whether they used a different operating system or not. I&#8217;m going to get together with a member of my team to work through it.</p>
<p><strong>Weka Part 1</strong></p>
<p>Although the sample dataset is included when Weka is downloaded, I wanted to go through the process of creating an ARFF file from Microsoft Excel. I created the weather dataset in Excel and saved it as a CSV file. Then I opened it in Notepad to add the ARFF file tags: <em>@relation</em>, <em>@attribute</em> and <em>@data</em>. </p>
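<p>The hand edit in Notepad can also be scripted. The sketch below (written for Python 3) turns CSV text into ARFF text by adding the three tags mentioned above. The function name <em>csv_to_arff</em>, the rule for guessing numeric versus nominal columns, and the four sample rows are assumptions for illustration, not the full weather dataset.</p>

```python
import csv
import io

# A few illustrative rows in the shape of the weather data.
CSV_TEXT = """outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
"""


def is_number(text):
    """True when the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Build ARFF text: @relation, one @attribute per column, then @data."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for column, name in enumerate(header):
        values = [row[column] for row in data]
        if all(is_number(v) for v in values):
            kind = "numeric"
        else:
            # Nominal attribute: list the distinct values in braces.
            kind = "{" + ",".join(sorted(set(values))) + "}"
        lines.append("@attribute " + name + " " + kind)
    lines.append("")
    lines.append("@data")
    lines.extend(",".join(row) for row in data)
    return "\n".join(lines)


arff_text = csv_to_arff(CSV_TEXT, "weather")
print(arff_text)
```

<p>The printed text can be saved with an .arff extension and opened from the Weka Explorer in the same way as the hand-edited file.</p>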
<p><strong>Weka Part 2</strong></p>
<p>After downloading the Cleveland Heart Disease dataset, I used the preprocess tab in the Weka Explorer to load the ARFF file. I ran the J48 decision tree from the classify tab and got this performance summary:</p>
<p style="padding-left:30px;"><span style="color:#33cccc;">=== Summary ===</span></p>
<p style="padding-left:30px;"><span style="color:#33cccc;">Correctly Classified Instances         235               77.5578 %</span></p>
<p style="padding-left:30px;"><span style="color:#33cccc;">Incorrectly Classified Instances        68               22.4422 %</span></p>
<p style="padding-left:30px;"><span style="color:#33cccc;">Kappa statistic                          0.5443</span></p>
<p style="padding-left:30px;"><span style="color:#33cccc;">Mean absolute error                      0.1044</span></p>
<p style="padding-left:30px;"><span style="color:#33cccc;">Root mean squared error                  0.2725</span></p>
<p style="padding-left:30px;"><span style="color:#33cccc;">Relative absolute error                 52.0476 %</span></p>
<p style="padding-left:30px;"><span style="color:#33cccc;">Root relative squared error             86.5075 %</span></p>
<p style="padding-left:30px;"><span style="color:#33cccc;">Total Number of Instances              303</span></p>
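<p>The two percentages in the summary follow directly from the instance counts. A quick arithmetic check, using only the numbers reported above:</p>

```python
correct = 235
incorrect = 68
total = correct + incorrect

accuracy = round(100.0 * correct / total, 4)
error_rate = round(100.0 * incorrect / total, 4)
print(total, accuracy, error_rate)
```

<p>The total comes out to 303 instances, with 77.5578 % classified correctly and 22.4422 % incorrectly, matching the Weka output.</p>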
]]></content:encoded>
			<wfw:commentRss>http://harkshark.wordpress.com/2009/02/04/portfolio-assignment-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6d96e7ed084c5af323a8b3e0f6d7436?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">harkshark</media:title>
		</media:content>
	</item>
		<item>
		<title>Portfolio Assignment 1</title>
		<link>http://harkshark.wordpress.com/2009/01/25/portfolio-assignment-1/</link>
		<comments>http://harkshark.wordpress.com/2009/01/25/portfolio-assignment-1/#comments</comments>
		<pubDate>Sun, 25 Jan 2009 00:44:35 +0000</pubDate>
		<dc:creator>Lauren</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://harkshark.wordpress.com/?p=6</guid>
		<description><![CDATA[After creating recommendations.py and running the commands on page 9 of &#8220;Collective Intelligence&#8221;, I got an error about recommendations not existing. I then re-read the page and moved recommendations.py to the Lib directory in Python. That fixed it right away. I love how easy Python makes it to use data structures like dictionaries and lists! Euclidean Distance Plugging [...]]]></description>
			<content:encoded><![CDATA[<p>After creating <em>recommendations.py</em> and running the commands on page 9 of &#8220;Collective Intelligence&#8221;, I got an error about <em>recommendations</em> not existing. I then re-read the page and moved <em>recommendations.py</em> to the Lib directory in Python. That fixed it right away. I love how easy Python makes it to use data structures like dictionaries and lists!</p>
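<p>Moving the file into Lib is not the only way to make a module importable. A minimal sketch, assuming the module sits in some other folder, is to add that folder to <em>sys.path</em> before importing. The temporary folder and the module name <em>demo_recs</em> below are made up for illustration and stand in for wherever <em>recommendations.py</em> actually lives.</p>

```python
import os
import sys
import tempfile

# Hypothetical stand-in for the folder that holds recommendations.py.
module_dir = tempfile.mkdtemp()

# Write a tiny made-up module there so the sketch is self-contained.
with open(os.path.join(module_dir, "demo_recs.py"), "w") as f:
    f.write("critics = {'A': {'Film': 3.0}}\n")

# Tell Python to search that folder too, then import as usual.
sys.path.append(module_dir)
import demo_recs

print(demo_recs.critics)
```

<p>The change to <em>sys.path</em> only lasts for the current interpreter session, so it has to be repeated each time IDLE is restarted.</p>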
<p><strong>Euclidean Distance</strong></p>
<p>Typing the Euclidean distance formula straight into the Python interpreter (using IDLE) gave me the same answers as the book&#8217;s example with Toby and LaSalle. However, when I added the function <em>sim_distance</em> to <em>recommendations.py</em>, I got a different answer for Lisa Rose and Gene Seymour. I added the squares of the differences by hand and got the same answer as my function. I think the general consensus is that the book is wrong!</p>
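<p>One possible source of a mismatch like this is whether the square root is taken before the score is computed. The sketch below shows how much the two versions differ. The ratings are invented for illustration (they are not the book&#8217;s data), and the names <em>sum_of_squares</em>, <em>sim_no_sqrt</em> and <em>sim_with_sqrt</em> are hypothetical, not functions from the book.</p>

```python
from math import sqrt

# Hypothetical ratings, invented for illustration (not the book's data).
prefs = {
    "A": {"Film 1": 4.0, "Film 2": 1.0, "Film 3": 5.0},
    "B": {"Film 1": 1.0, "Film 2": 1.0, "Film 4": 2.0},
}

def sum_of_squares(prefs, p1, p2):
    # Squared rating differences over the items both people rated.
    # Assumes the two people share at least one item.
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    return sum((prefs[p1][item] - prefs[p2][item]) ** 2 for item in shared)

def sim_no_sqrt(prefs, p1, p2):
    # Score built from the squared distance.
    return 1 / (1 + sum_of_squares(prefs, p1, p2))

def sim_with_sqrt(prefs, p1, p2):
    # Score built from the true Euclidean distance.
    return 1 / (1 + sqrt(sum_of_squares(prefs, p1, p2)))

print(sim_no_sqrt(prefs, "A", "B"))    # 0.1
print(sim_with_sqrt(prefs, "A", "B"))  # 0.25
```

<p>Both versions rank pairs of people in the same order, but the numbers themselves differ, which is enough to make a function and a printed example disagree.</p>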
<p><strong>Pearson Coefficient</strong></p>
<p>The Pearson coefficient worked correctly and yielded the same results as the book. It took me a while to see how the function <em>sim_pearson</em> matches the formula we discussed in class, but I worked through it.</p>
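<p>For comparison with the class formula, here is a minimal sketch of the Pearson correlation over shared items, written from the textbook definition (deviations from the mean) rather than copied from <em>sim_pearson</em>. The function name <em>pearson</em> and the ratings are assumptions made up for illustration.</p>

```python
from math import sqrt

def pearson(prefs, p1, p2):
    # Items rated by both people.
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = len(shared)
    if n == 0:
        return 0
    xs = [prefs[p1][item] for item in shared]
    ys = [prefs[p2][item] for item in shared]
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Numerator: how the two sets of ratings vary together.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the two spreads around their means.
    den = sqrt(sum((x - mean_x) ** 2 for x in xs) *
               sum((y - mean_y) ** 2 for y in ys))
    if den == 0:
        return 0
    return num / den

# Hypothetical ratings: B always rates exactly twice as high as A.
prefs = {
    "A": {"Film 1": 1.0, "Film 2": 2.0, "Film 3": 3.0},
    "B": {"Film 1": 2.0, "Film 2": 4.0, "Film 3": 6.0},
}
print(pearson(prefs, "A", "B"))  # 1.0
```

<p>The made-up data shows why Pearson is useful: A and B never give the same rating, yet the score is a perfect 1.0 because their ratings rise and fall together.</p>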
<p><strong>Manhattan Distance</strong></p>
<p>Implementing the Manhattan distance was pretty simple. I followed the same format as the <em>sim_distance</em> and <em>sim_pearson</em> functions. The formula for the Manhattan distance is |X1-X2|+|Y1-Y2|+&#8230;+|Z1-Z2|. I had to look up the syntax for an absolute value function in Python and it was what I thought it would be: <em>abs(x)</em>. Below is my <em>sim_manhattan</em> function.</p>
<p style="padding-left:30px;"><span style="color:#800080;">from math import sqrt</span></p>
<p style="padding-left:30px;"><span style="color:#800080;"># Returns the Manhattan distance between personA and personB (smaller means more similar)</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">def sim_manhattan(prefs, personA, personB):</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">    # Get the list of shared_items</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">    si={}</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">    for item in prefs[personA]:</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">        if item in prefs[personB]:</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">            si[item]=1</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">    # if they have no ratings in common, return 0</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">    if len(si)==0: return 0</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">    # Add up the absolute values of all the differences</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">    sum_of_abs=sum([abs(prefs[personA][item]-prefs[personB][item]) for item in si])</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">    return sum_of_abs</span></p>
<p>When I tested it in the Python interpreter with the critics Lisa Rose and Gene Seymour, I got the following correct result:</p>
<p style="padding-left:30px;"><span style="color:#800080;">&gt;&gt;&gt; reload(recommendations)</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">&lt;module &#8216;recommendations&#8217; from &#8216;C:\Python26\lib\recommendations.py&#8217;&gt;</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">&gt;&gt;&gt;recommendations.sim_manhattan(recommendations.critics,&#8217;Lisa Rose&#8217;, &#8216;Gene Seymour&#8217;)</span></p>
<p style="padding-left:30px;"><span style="color:#800080;">4.5</span></p>
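<p>The value 4.5 is a raw distance, so a bigger number means the two critics agree less. A minimal sketch of one way to turn it into a score between 0 and 1, reusing the 1/(1+d) pattern, is below. The names <em>manhattan_distance</em> and <em>manhattan_similarity</em> and the ratings are hypothetical, made up for illustration.</p>

```python
def manhattan_distance(prefs, p1, p2):
    # Sum of absolute rating differences over the shared items.
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    return sum(abs(prefs[p1][item] - prefs[p2][item]) for item in shared)

def manhattan_similarity(prefs, p1, p2):
    # No shared items: treat the pair as not similar at all.
    if not any(item in prefs[p2] for item in prefs[p1]):
        return 0
    # Distance 0 maps to 1.0; larger distances shrink toward 0.
    return 1 / (1.0 + manhattan_distance(prefs, p1, p2))

# Hypothetical ratings, invented for illustration (not the book's data).
prefs = {
    "A": {"Film 1": 4.0, "Film 2": 1.0, "Film 3": 5.0},
    "B": {"Film 1": 1.0, "Film 2": 1.5, "Film 4": 2.0},
}
print(manhattan_distance(prefs, "A", "B"))    # 3.5
print(manhattan_similarity(prefs, "A", "B"))
```

<p>With this scaling the Manhattan score points the same way as the Euclidean and Pearson scores, where higher means more alike.</p>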
]]></content:encoded>
			<wfw:commentRss>http://harkshark.wordpress.com/2009/01/25/portfolio-assignment-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6d96e7ed084c5af323a8b3e0f6d7436?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">harkshark</media:title>
		</media:content>
	</item>
	</channel>
</rss>
