In this tutorial we will implement the Naive Categorical Bayes Classifier. Previously we looked at how we can read data from a text file. Here we will use this data to classify items.

### Approach

We want to determine which class is more likely to have the given features. For that, we will use the Bayes Equation:

$P(Class|Features) = \frac{P(Class)*P(Features|Class)}{P(Features)}$

We will also assume that the features are independent (hence the Naive in the name) so the above is simplified to:

$P(Class|Features) = \frac{P(Class)*P(Feature_{1}|Class)*P(Feature_{2}|Class)*...*P(Feature_{n}|Class)}{P(Feature_{1})*P(Feature_{2})*...*P(Feature_{n})}$

The higher this number, the better. So, we will run this equation for every class, inputting the given features, and we will pick the highest.
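As a concrete sketch of the simplified equation, the per-class score can be computed as a running product. The function name and all the numbers below are made up for illustration; they are not taken from the data set used later.

```python
def naive_bayes_score(prior, cond_probs, feature_probs):
    #P(Class) * P(F1|Class)*...*P(Fn|Class) / (P(F1)*...*P(Fn))
    score = prior
    for p_f_given_c, p_f in zip(cond_probs, feature_probs):
        score *= p_f_given_c / p_f
    return score

#Hypothetical values: P(Class)=0.5, P(F1|Class)=0.8, P(F2|Class)=0.7,
#P(F1)=0.7, P(F2)=0.45
print(naive_bayes_score(0.5, [0.8, 0.7], [0.7, 0.45]))
```

Running this for each class and keeping the largest score is all the classifier does.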

### Data

We will use the code we wrote here to read the data. Put said code into a Python script named DataReader and import it into your classifier script.

Then, call the Read function of DataReader and get the dictionaries. Assume your data set file is named data.txt. In code:

import DataReader #The reader script from the previous part

data = DataReader.Read('data.txt')
Classes = data[0]
Features = data[1]
P = data[2]

The data set we will use can be found here. We have two classes, Detective and Brute, and three features, Tall, Slim and Smart. Each line represents an item. The first word of a line is the class the item is classified in. True means that the item has the feature on that column, while False means it doesn’t.
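To make the format concrete, a few hypothetical lines in that shape (not the actual file) could look like this, with the True/False columns corresponding to Tall, Slim and Smart in order:

```
Detective True True True
Detective True False True
Brute True False False
```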

Say we have a new item that we know has the following features: Tall and Slim. We want to classify it into one of the two classes.

We will run it through the classifier function. We will name that function Classifier and it will take as input a list of features (called Evidence). To call it we write:

Classifier(['Tall', 'Slim']) #The item features

Note: The data set has another feature, Smart, but we didn’t use it here. That doesn’t mean the item lacks the feature; we just don’t know whether it has it.

### Code

The main function that will categorize/classify items is called Classifier and it has a parameter (called Evidence) to receive a list of features.

First we want to parse the list into a string (called evidence), with features separated by a comma. For example, if we receive the list ['Tall', 'Slim'] we want to parse it as "Tall, Slim". That way we can save the classification for future use in an intuitive way. In Statistics we write P(Detective|Tall, Slim). Using a dictionary, we will write it as P["Detective|Tall, Slim"], where "Tall, Slim" will come from the variable evidence and "Detective" from the iteration of the classes.
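As an aside, Python's str.join builds this string in one step (though the check that every feature exists still has to happen separately, as below):

```python
Evidence = ['Tall', 'Slim']  #Example input list
evidence = ', '.join(Evidence)
print(evidence)  # → Tall, Slim
```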

We also want to check if the given features are actually part of the feature list, Features. We will check that and we will build the string at the same time. Evidence is the input list of features.

#The string of evidence, so that we can save it in P.
evidence = ''

#Check if all evidence is also in Features
for e in Evidence:
    if e not in Features:
        #A given evidence does not belong in Features. Abort.
        print("Evidence list is erroneous")
        return

    #Build the evidence string
    evidence += e + ', '

#Remove the last two chars, as they are ', '
evidence = evidence[:-2]

Now we are ready to delve into the core of the algorithm: the Bayes Equation. We will run the equation for every class c in Classes, saving the value in P(c|evidence):

$P(c|evidence) = \frac{P(c)*P(evidence_{1}|c)*P(evidence_{2}|c)*...*P(evidence_{n}|c)}{P(evidence_{1})*P(evidence_{2})*...*P(evidence_{n})}$

We will start with the a priori probability of the class, then, for each feature, multiply by the conditional feature/class probability and divide by the probability of the feature.

Then we will check if the current class has a higher probability than the current max, and if so we will save it. To implement this “Find Max” procedure, we will use the temporary variables m (to hold the max) and classification (to hold the classification).

In code:

m = -1.0 #Hold the max
classification = '' #Hold the classification

#We need to find P(c|evidence). The equation (from Bayes) is:
#P(c|evidence) = P(evidence1|c)*P(evidence2|c)*...*P(evidenceN|c)
#*P(c) divided by P(evidence1)*P(evidence2)*...*P(evidenceN)

#Calculate the probability of all classes for the given
#evidence/features using the Bayes equation.
#Pick the highest.
for c in Classes:
    #Start from the prior probability
    P[c + '|' + evidence] = P[c]

    for e in Evidence:
        #Multiply by the conditional prob and divide
        #by the feature prob
        P[c + '|' + evidence] *= P[e + '|' + c] / P[e]

    #Find the max
    if P[c + '|' + evidence] > m:
        #P(c|evidence) is the max so far;
        #update m and classification
        m = P[c + '|' + evidence]
        classification = c

Finally, we will print the classification and the max:

#With the evidence, the item belongs to classification
#with a prob of m
print(classification, m)

If you run the above code, you should receive this output:

Detective 0.47619047619
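For reference, the pieces above can be assembled into one self-contained sketch. The Classes, Features and probability dictionary P below are made-up stand-ins for what DataReader.Read('data.txt') would return, so the printed numbers will differ from the output above.

```python
#Hypothetical stand-ins for the DataReader output; illustration only.
Classes = ['Detective', 'Brute']
Features = ['Tall', 'Slim', 'Smart']
P = {
    'Detective': 0.5, 'Brute': 0.5,
    'Tall': 0.7, 'Slim': 0.45, 'Smart': 0.6,
    'Tall|Detective': 0.8, 'Slim|Detective': 0.7, 'Smart|Detective': 0.9,
    'Tall|Brute': 0.6, 'Slim|Brute': 0.2, 'Smart|Brute': 0.3,
}

def Classifier(Evidence):
    #Build the evidence string, checking each feature exists
    evidence = ''
    for e in Evidence:
        if e not in Features:
            print("Evidence list is erroneous")
            return
        evidence += e + ', '
    evidence = evidence[:-2]

    m = -1.0 #Hold the max
    classification = '' #Hold the classification

    for c in Classes:
        #Start from the prior probability
        P[c + '|' + evidence] = P[c]
        for e in Evidence:
            #Multiply by the conditional prob, divide by the feature prob
            P[c + '|' + evidence] *= P[e + '|' + c] / P[e]
        #Find the max
        if P[c + '|' + evidence] > m:
            m = P[c + '|' + evidence]
            classification = c

    return classification, m

print(Classifier(['Tall', 'Slim']))
```

With the real probabilities read from data.txt, this sketch reproduces the classifier developed step by step above.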

### Conclusion

And with that we have developed a Naive Categorical Bayes Classifier from the ground up. You can find the full code on my GitHub.

Links to the rest of the series:

Part 1

Part 3