A boost for learning

One of the most interesting applications of artificial intelligence is machine learning. As humans we start off not being very good at things, but over time we get better: we learn. But can we give computers the ability to learn too?

It's all down to experience

One of the ways people learn is through examples. We learn through experience: which foods we like the taste of, for example, and which types of music we like or don't. As children, our parents help us learn, say, the difference between a chair and a table by pointing out when we make a mistake. It's this ability to take a whole range of information and group it into different types (or classes) that forms the basis of our learning. This is called supervised learning, because someone is helping you work out the differences by pointing out your mistakes. So how can a computer system learn? One way is to use a process called 'boosting'. Boosting is like having a group of not very good experts. Each is good at some part of the task, but if you add them all together to form a committee, and go about it the right way, you can meld them into a team that's really good at the job.

You have mail?

Let's take, for example, the very useful task of learning how to tell the difference between genuine email and Spam email. Spam emails are all those unwanted messages that arrive in your inbox telling you things you don't want to know or selling you things you don't want to buy. First you need to store two sets of emails, genuine emails and Spam emails, so you can help your system learn the difference between them. The more emails in the sets the better: it gives the system more examples to learn from. Then you throw in your 'experts'.

Have a go at it

To begin with you let the computer randomly choose a series of arbitrary rules: 'rules of thumb' that look for things like patterns of words or letters in the two sets of emails. For example, the words 'you have won' appearing in an email would normally be a good prediction of Spam. You have lots of different versions of these rules, though, some good like that one, some bad, so it's like randomly picking lots of people off the street and treating each as an expert – they are most likely to be poor experts. Each 'thinks' it has a way of telling Spam from genuine emails, but how can we find out if any of them are any good? We can calculate how well each of these rules works by looking at its score. We know how each email should be classified because we classified them ourselves, so we can supervise the learning and point out any mistakes. We can set it up so that, for example, if a rule thinks an email is genuine it outputs a 1, and if it thinks it's Spam it gives a –1. We can now work out how many of the Spam emails each rule correctly classifies as Spam, and how many genuine emails it said were genuine. Just as important, though, is how the rules score on getting it wrong: how many Spam emails did a rule say were genuine, and vice versa? Our scoring combines the number of emails each rule got right and the number it got wrong to give an overall score for each of the rules. These rules are called our 'weak classifiers'.
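If you'd like to see what that looks like in practice, here's a tiny sketch in Python (the emails, labels and rule phrases are all made up for illustration, not taken from any real system):

    # A tiny made-up data set: each email comes with the label we gave it,
    # +1 for genuine and -1 for Spam.
    emails = [
        ("meeting moved to three", +1),
        ("you have won a prize", -1),
        ("lunch tomorrow?", +1),
        ("you have won cash claim now", -1),
    ]

    def make_rule(phrase):
        # A weak classifier: -1 (Spam) if the phrase appears, +1 (genuine) otherwise.
        return lambda text: -1 if phrase in text else +1

    # Some randomly chosen 'rules of thumb' (fixed here to keep the example short).
    rules = {phrase: make_rule(phrase) for phrase in ["you have won", "lunch", "cash"]}

    # Score each rule: how many emails does it classify correctly?
    for phrase, rule in rules.items():
        correct = sum(1 for text, label in emails if rule(text) == label)
        print(f"'{phrase}' gets {correct} out of {len(emails)} right")

Run it and you'll see that 'you have won' turns out to be a good rule of thumb while 'lunch' is a poor one – exactly the mixed bag of experts we expected.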

Weak classifiers are useful but not brilliant. The rules with high scores are better at the task than those with low scores, but many are still pretty poor. After all, we just set them up at random. We want to use our weak classifiers to build a system that's really up to the job. We call that a 'strong classifier'. It will be able to do the job to whatever accuracy we care to build it for, but how do we go about creating it?

Let's get together and make things work

We take the best of these weak classifiers to start building our team: the strong classifier that will eventually give us the best prediction of whether an email is Spam or not. Having added in the really good weak classifiers to start building our strong classifier, we need to make sure they will be paid attention to. To do this we multiply the output of these good rules by a large number. That gives them a weighting, typically related to the rule's accuracy, so good rules produce bigger numbers. When we run the final system, the rules with bigger weightings have more 'voting power' in the outcome whenever they are triggered. It's a good start, but we can do better.
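The article doesn't pin down exactly how that weighting is chosen. One standard choice, the formula used by AdaBoost, the best-known boosting algorithm, bases it on the rule's error rate, roughly like this:

    import math

    def rule_weight(error_rate):
        # The smaller the error, the larger the weight, so accurate rules
        # get more voting power. (This is AdaBoost's formula; the article
        # only says the weight is 'related to the rule's accuracy'.)
        error_rate = min(max(error_rate, 1e-6), 1 - 1e-6)  # avoid dividing by zero
        return 0.5 * math.log((1 - error_rate) / error_rate)

    for err in (0.05, 0.25, 0.45):
        print(f"error rate {err:.2f} -> voting weight {rule_weight(err):.2f}")

A rule that is nearly always right gets a big weight; one that is barely better than guessing gets a weight close to zero.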

Our original rules weren't perfect. They still got some of the examples wrong, so we go back to the data, the two sets of Spam and non-Spam emails, and start to change it. The email examples that were already correctly classified aren't the problem, as we can pick those out with the rules we already have. We therefore take these emails out of the data sets, as we can already 'do' them. The examples we got wrong are the important ones. That's where we need to do the work and refine our system's expertise, so we concentrate on them.
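In code that step is just a filter: keep only the examples our best rule so far still gets wrong. (Again the emails below are invented, and full boosting algorithms usually re-weight the awkward examples rather than throw the easy ones away, but the idea is the same.)

    emails = [
        ("meeting moved to three", +1),
        ("you have won a prize", -1),
        ("lunch tomorrow?", +1),
        ("cheap meds online now", -1),   # Spam that our best rule misses
    ]

    def best_rule(text):
        # Our best weak classifier so far: 'you have won' suggests Spam.
        return -1 if "you have won" in text else +1

    # Keep only the emails the rule still gets wrong; they're what the
    # next round of experts has to concentrate on.
    remaining = [(text, label) for text, label in emails if best_rule(text) != label]
    print(remaining)   # [('cheap meds online now', -1)]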

Let's do it again

We run our remaining weak classifier experts' rules over this new data set and again find the rules that give the smallest error. The clever part is that we 'promote' (add) these rules into our strong classifier, making sure we give them a weighting that's related to how good they are. We can then go back to the data again, look at the examples that were correctly classified, take them out and look at the remaining problem cases and run the whole process again ... and again and again if we wish. Eventually we will have built a whole set of weighted weak classifier rules into a strong classifier that gives us the expertise - the accuracy - we want.
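Put together, the whole 'again and again' loop fits in a few lines. Here's a sketch following the article's description (pick the best remaining rule, weight it by its accuracy, then set aside the examples it solves); everything in it, from the emails to the candidate phrases, is made up for illustration:

    import math

    emails = [("meeting at three", +1), ("you have won a prize", -1),
              ("lunch tomorrow?", +1), ("cheap meds online now", -1),
              ("project notes attached", +1), ("you have won cash", -1)]

    phrases = ["you have won", "cheap meds", "lunch", "notes"]

    def rule(phrase):
        # Phrase present -> vote Spam (-1); otherwise vote genuine (+1).
        return lambda text: -1 if phrase in text else +1

    strong = []              # the committee: (weight, phrase) pairs
    remaining = list(emails)

    for round_number in range(3):
        if not remaining:
            break            # nothing left to get wrong
        # Pick the phrase whose rule makes the fewest mistakes on what's left.
        def error(phrase):
            return sum(1 for t, y in remaining if rule(phrase)(t) != y) / len(remaining)
        best = min(phrases, key=error)
        err = error(best)
        weight = 0.5 * math.log((1 - err + 1e-6) / (err + 1e-6))  # accuracy-related weight
        strong.append((weight, best))
        # Remove the examples this rule already handles correctly.
        remaining = [(t, y) for t, y in remaining if rule(best)(t) != y]

    print(strong)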

Votes please, the committee will decide

What we end up with is a strong classifier that's filled with experts (rules), each of which is weighted by how good it is. It's like having a voting committee of experts where some of the experts have more votes than others. When we get a new email that we haven't classified ourselves before and want to decide whether it's Spam or not, we send it to our committee of experts. That just means we apply all the rules in the strong classifier and see how the votes tally. What did our really good experts (rules with high weightings) think it was? It may be that all the top experts missed it, and it's up to a few of the less good experts (the rules with lower weightings) to combine their voting power and make the correct decision. Whatever way the committee comes to a decision, that decision should be correct (up to the accuracy we decided we wanted when setting our committee up). After all, we have all our experts on hand to make sure of that. Using boosting lets us produce an accurate prediction rule by combining lots of less accurate rules. It's all about working together: a lesson in mathematical democracy that we can all hopefully learn from.
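As a final sketch, here's how that weighted vote might look for a made-up committee (the weights and phrases are invented, and a real filter would use far more rules):

    # A made-up committee: (voting power, phrase that signals Spam).
    committee = [(2.0, "you have won"), (1.2, "cheap meds"), (0.7, "claim now")]

    def classify(text):
        # Each rule votes -1 (Spam) or +1 (genuine), scaled by its voting
        # power; the sign of the total is the committee's decision.
        total = sum(w * (-1 if phrase in text else +1) for w, phrase in committee)
        return "Spam" if total < 0 else "genuine"

    print(classify("you have won, claim now!"))      # weighted votes add up to Spam
    print(classify("minutes from today's meeting"))  # no rule fires: genuine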