Lauren's Blog

stop and smell the roses

Disable Balloon Tips July 29, 2009

Filed under: Windows XP — Lauren @ 3:51 pm

I hate clutter in my System Tray. I hate when software runs when Windows starts and adds another icon. I especially hate the annoying information balloons that pop up all the time! Why do I need a balloon notifying me that “a network cable has been disconnected” when the icon changes to a computer with a big, red X on it? Or the “Found new hardware” balloon every time you plug in your flash drive? I would be OK with them if they eventually went away like the Outlook Desktop Alert notifications, but you have to click the balloons to make them disappear! Too much clicking!

I’m all about customization so I wanted to get rid of them asap. I checked all the Windows Display Settings and Taskbar Settings and didn’t find anything for balloon tips so I Googled. Here are my Google Search Results: “get rid of windows balloons”.

The first two hits gave me what I wanted. Since this solution involves editing the Windows Registry, I checked a couple more articles to make sure the solutions were similar. I don’t like to mess with the operating system, but I had to get rid of those balloons! The PCMag article is the one I followed initially:

Get Rid of Those Pesky Balloons!

  1. From the Start button select Run (Windows Logo + R)
  2. Type regedit and hit Enter to open the Registry Editor
  3. Go to HKEY_CURRENT_USER → Software → Microsoft → Windows → CurrentVersion → Explorer → Advanced
  4. Under Edit select New → DWORD Value
  5. Type EnableBalloonTips as the name and hit Enter (a new DWORD value starts out as 0)
  6. Close the Registry Editor and Log Out/Log In again for the change to take effect

At first I was confused about why I was typing EnableBalloonTips when I wanted to disable them, but from the WindowsNetworking article I learned that setting the value to 0 disables the balloons. To enable the balloon tips again, set the value to 1.
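
For anyone who would rather not click through regedit, the same change can be written as a .reg file. Here is a minimal Python sketch that builds the file text. The function name balloon_tips_reg is made up for this example, and the sketch assumes the key path from the steps above; it only produces text and does not touch the registry itself.

```python
def balloon_tips_reg(enabled):
    """Return the text of a .reg file that sets EnableBalloonTips.

    enabled=False writes 0 (balloons off), enabled=True writes 1 (balloons on).
    Hypothetical helper for illustration only.
    """
    value = 1 if enabled else 0
    lines = [
        'Windows Registry Editor Version 5.00',
        '',
        r'[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced]',
        # DWORD values are written as eight hexadecimal digits
        '"EnableBalloonTips"=dword:%08x' % value,
        '',
    ]
    return '\r\n'.join(lines)


print(balloon_tips_reg(False))
```

Saving the printed text as something like disable_balloons.reg and importing it should have the same effect as steps 3 to 5; logging out and back in is still needed.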

After I logged back in, I tested it out by unplugging my Ethernet cable. No balloon! Thanks PCMag :)

 

Portfolio Assignment 9 April 23, 2009

Filed under: Data Mining — Lauren @ 4:42 am

Final Project

For our final project, my team (Andrew, Kurt, Will) and I will be expanding on our work from last week with document filtering.

The Problem

As you well know, spam is a very annoying and persistent presence on the Internet. In chapter 6 of PCI, we learned that rule-based classifiers don’t cut it because spammers are getting smarter. So, we created a learning classifier that is trained on data and gives a document a category depending on word or feature probabilities. The only guideline for our project was to use a substantial dataset. The algorithm in the book uses strings as “documents”. We want to use real email documents to train the classifier and use it for future classifications.

The Data

At first, we searched the Internet for some fun spam datasets to download. Of course, there were a ton! But our plan was to modify the classifying algorithm in the book to use email text, and we kept finding weird formats for the datasets. So, Will logged into his old Yahoo email account and found 1,400 spam emails. I’m pretty sure if I logged into my old AOL account I would find a similar number! At first we thought we were going to have to use the sampletrain method from the book and type the name of every file into a line of code. That would take forever and wouldn’t be very practical in real life. Will whipped up a function to rename all of his emails into a format of either spam#.txt or nonspam#.txt:

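def openfiles(cl):
    # Judging from the indices used below, each line of the index file
    # holds the filename in its second field and a 1 (spam) flag in its third
    data = open('blogsplogreal.txt', 'r')
    lines = data.readlines()
    data.close()
    for i in lines:
        thisline = i.split(" ")
        filename = thisline[1]
        print('opening: ' + filename)
        # strip() so the last line still matches if it has no trailing newline
        if thisline[2].strip() == "1":
            spamtype = 'spam'
        else:
            spamtype = 'not-spam'
        print('file type: ' + spamtype)
        cl.train(filename, spamtype)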

This was useful because we created a loop to train the classifier by concatenating the basename of the file (spam or nonspam), the number and ‘.txt’:

def sampletrain(cl,basefile,numfiles,gory):
    # Train on basefile1.txt .. basefile<numfiles>.txt, all in category gory
    for i in range(1,numfiles+1):
        filename =  basefile + str(i) + '.txt'
        #print filename
        cl.train(filename,gory)
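
The renaming step itself is not shown above. A minimal sketch of how the numbering could be done, assuming the raw emails sit in one folder and the numbered copies go into another; the function name number_emails and both folder arguments are hypothetical, and this is not Will's actual function. It copies rather than renames so the originals stay untouched.

```python
import os
import shutil


def number_emails(src_dir, dest_dir, basename):
    """Copy every file in src_dir to dest_dir as basename1.txt, basename2.txt, ...

    Returns the number of files copied. Files are taken in sorted name order.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    names = sorted(n for n in os.listdir(src_dir)
                   if os.path.isfile(os.path.join(src_dir, n)))
    for i, name in enumerate(names, start=1):
        target = os.path.join(dest_dir, basename + str(i) + '.txt')
        shutil.copyfile(os.path.join(src_dir, name), target)
    return len(names)
```

The returned count is the numfiles argument that a loop like sampletrain needs, for example number_emails('yahoo_spam', 'train', 'spam') with folder names of your own choosing.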

The Solution

We used Bayes (when in doubt, use Bayes!) to train and classify documents. Starting from the book’s code, we had to edit the getwords method to open a file and add the words to the dictionary. A friendly neighbor in the lab showed us how to do file I/O and this is what we came up with:

 

import re

def getwords(doc):
    # Read the file and concatenate every line into one big string
    data = open(doc, 'r')
    lines = ' '
    for line in data:
        lines+=line
    data.close()
    # Split the words on runs of non-alphanumeric characters
    # (\W+ instead of \W*, which splits between every character on Python 3.7+)
    splitter = re.compile('\\W+')
    words = [s.lower() for s in splitter.split(lines) if len(s) > 2 and len(s) < 20]
    #print words
    # Return the unique set of words only
    return dict([(w,1) for w in words])

This opens a file and concatenates each line into one big string. Then the string is split into words and converted to lowercase, as the book does. Now, we can send the classifier a filename and it will get the features and continue with the same algorithm.
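
To make "the same algorithm" a bit more concrete, here is a stripped-down naive Bayes sketch written for Python 3. It is not the book's classifier: the class TinyNaiveBayes, the add-one smoothing and the helper words_from_text (a stand-in for getwords that takes a string instead of a filename, so the example runs without any files) are all assumptions made for illustration.

```python
import math
import re


def words_from_text(text):
    """Hypothetical stand-in for getwords: unique lowercase words of 3-19 characters."""
    return set(w.lower() for w in re.split(r'\W+', text) if 2 < len(w) < 20)


class TinyNaiveBayes(object):
    def __init__(self, getfeatures):
        self.getfeatures = getfeatures
        self.feature_counts = {}   # feature -> {category: documents containing it}
        self.category_counts = {}  # category -> documents trained

    def train(self, item, category):
        for feature in self.getfeatures(item):
            per_category = self.feature_counts.setdefault(feature, {})
            per_category[category] = per_category.get(category, 0) + 1
        self.category_counts[category] = self.category_counts.get(category, 0) + 1

    def log_score(self, features, category):
        docs_in_category = self.category_counts[category]
        total_docs = sum(self.category_counts.values())
        # log P(category) + sum of log P(feature | category)
        score = math.log(docs_in_category / total_docs)
        for feature in features:
            count = self.feature_counts.get(feature, {}).get(category, 0)
            # Add-one smoothing so an unseen word never gives log(0)
            score += math.log((count + 1) / (docs_in_category + 2))
        return score

    def classify(self, item):
        features = self.getfeatures(item)
        return max(self.category_counts,
                   key=lambda c: self.log_score(features, c))


cl = TinyNaiveBayes(words_from_text)
cl.train('cheap pills buy now', 'spam')
cl.train('buy cheap watches online', 'spam')
cl.train('meeting notes for the project', 'nonspam')
cl.train('lunch with the team tomorrow', 'nonspam')
print(cl.classify('buy cheap pills'))           # spam
print(cl.classify('project meeting tomorrow'))  # nonspam
```

Passing a file-based feature function such as getwords instead of words_from_text would let train and classify take filenames, which is the change described above.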

The Results

We were very excited to see that our modifications to allow files to be trained compiled! It took some serious looking at the code to make sure we were doing it correctly, but now we really understand what is going on. Starting small, we trained two documents, one that was spam and one that was not. It correctly added the categories and features to the dictionary: success number 1! Then we trained a few more documents and gave it an unknown document to classify, and it worked! It classified 4 out of 4 documents correctly. Now that we know it works, we ran the algorithm with Will’s mixed spam and nonspam files. Tomorrow we’re going to run it with a combination of unknown documents and see how it classifies in front of the class.

 

 