In a previous post we looked at the theory behind a Perceptron. In this tutorial we will implement this algorithm using Python.

Note: We will implement the algorithm from scratch, so that we get a better understanding of how the algorithm actually works. If you want to use it to solve a problem, it’s better if you used an external library that already implements the algorithm (like scikit-learn for Python).

First we need to read the data from a file. Each line in the file represents an item, and the first line holds the feature names (+ the keyword “Class” at the end). The file follows this format:

Height,Weight,Age,Class
1.70,65,20,Programmer
1.90,85,33,Builder
1.78,76,31,Builder
1.73,74,24,Programmer
1.81,75,35,Builder
1.73,70,75,Scientist
1.80,71,63,Scientist
1.75,69,25,Programmer


You can find an example data file here.

Let’s jump right into the implementation. We need to get the number of features, which can be found on the first line.

featuresNumber = len(lines[0].split(','));


We also need to store the item information and the different classes that appear in the data set. For that, we will use three lists, items, classes and features. The second and third ones will hold the names of the classes and features respectively. The first one is a bit more complicated though. It will hold dictionaries, with each dictionary representing an item. The keys to these dictionaries are the feature names and will hold the values the item has for the feature/key, “Class” to hold the class the item is categorized in, and “Bias” which will hold a value of 1. With Bias we are essentially augmenting the item/feature vector with a value of 1.

For each line in the file, we will split it to get the features, storing them into a temporary array before adding them to the items list. At the end, we will shuffle the list to make sure the data set is properly randomized.

items = [];
classes = [];
features = lines[0].split(',')[:-1];

for i in range(1,len(lines)):
line = lines[i].split(',');

if(line[-1] not in classes):
classes.append(line[-1]);

itemFeatures = {"Class" : line[-1], "Bias" : 1};

for j in range(len(features)):
#Iterate through the features
f = features[j]; #Get the feature at index j
#The first item in the line is the class, skip it
v = float(line[j]); #Convert value to integer

itemFeatures[f] = v; #Add feature to list

items.append(itemFeatures); #Append temp dict to items

shuffle(items);


### Auxiliary Functions

First we will need functions to add and subtract dictionaries (d1 and d2). We operate at a rate (eg. instead of subtracting the whole of d2, we subtract its third), which is given as a parameter.

def AddDictionaries(d1,d2,rate):
d3 = {};
for i in d1:
d3[i] = d1[i] + rate*d2[i];

return d3;

def SubDictionaries(d1,d2,rate):
d3 = {};
for i in d1:
d3[i] = d1[i] - rate*d2[i];

return d3;


### Training

Now we move onto the meat of the algorithm.

First we need a function to calculate the confidence (the dot product) of an item and a weight. We can calculate the dot product by multiplying the values for each feature of the weight and item vectors and adding them together.

$\sum_{i=1 to f+1}{weight[i]*item[i]}$

where f is the number of features (we add one for the augmentation of the vectors with the bias). In Python code:

def CalculateConfidence(item,weight):
#Add the product of the weight
#and item values for each feature
confidence = 0;

for k in weight:
confidence += weight[k]*item[k];

return confidence;


Next we will create the function to train the perceptron (aka calculating the weights). We will name this function CalculateWeights. It takes us input the training data, the learning rate (aka correction rate), the classes and features, and the number of epochs (aka how many times we will loop through the data).

First we want to initialize the weights. As we are building a mulit-class Perceptron using Kesler’s Construction, each class has its own weights for the features. To implement this, we will use a dictionary of dictionaries. The keys of the outer dictionary are the classes and for each class-dictionary, the keys are the features. We will initialize the values of each weight to 0.

weights = {};

for c in classes:
weights[c] = {"Bias":0};
for f in features:
weights[c][f] = 0;


To get a better feel of how the dictionary weights is laid out, we will quickly read data from it:

print weights[c][f]; #Prints the weight of feature f for class c.
print weights[c]; #Prints the feature weights for class c.


Having initialized our weights, we move to adjusting them. The algorithm in pseudocode:

For each epoch:
For each item, i, in training set:
Find confidence of i for all classes
Pick max confidence
c is the class corresponding to max confidence

Compare c with actual class of i
If they are the same, the guess was correct
Move to next item
If they are different, do the following:
For the weights of c, subtract the item values
For the weights of the correct class, add item values


Let’s implement the above:

for epoch in range(epochs):
for item in trainingSet:
y = -1;
guess = '';
for w in weights:
confidence = CalculateConfidence(item,weights[w]);

if(confidence > y):
y = confidence;
guess = w;

correct = item["Class"];
if(correct != guess):
weights[guess] =
SubDictionaries(weights[guess],item,rate);
weights[correct] =


Bringing all the parts of the function together we have:

def CalculateWeights(trainingSet,rate,epochs,classes,features):
#Initialize weights at 0
weights = {};

#Initialize weights dictionary.
#Weights are divided in classes.
#Each class has its own dictionary for the features.
for c in classes:
weights[c] = {"Bias":0};
for f in features:
weights[c][f] = 0;

for epoch in range(epochs):
for item in trainingSet:
#Iterate through trainingSet
#Guess where item belongs
y = -1;
guess = "";
for w in weights:
confidence =
CalculateConfidence(item,weights[w]);

if(confidence > y):
y = confidence;
guess = w;

correct = item["Class"];
if(correct != guess):
weights[guess] =
SubDictionaries(weights[guess],item,rate);
weights[correct] =

return weights;


With the above function we can train the weights, now we need to write a function to classify an item. We will call this function Perceptron. Given an item and the weights, first it will augment the item vector with the bias, then it will find the maximum confidence and it will classify the item in the class that produced that max confidence.

def Perceptron(item,weights):
item["Bias"] = 1; #Augment item vector with bias
m = -1; #Hold the maximum
classification = ""; #Hold the classification

#Calculate chance of item being in each class,
#pick the maximum
for w in weights:
#Multiply the item vector with the class weights vector
guess = CalculateConfidence(item,weights[w]);
if(guess > m):
#Our guess is better than our current best guess,
#update max and classification
m = guess;
classification = w;

return classification;


### Usage

To use the Perceptron we will first read our data from a text file, we will set the parameters of the Perceptron (often called hyper-parameters and they are the learning rate, epochs, etc.) then we will calculate the weights and finally we will classify the new item.

items, classes, features = ReadData('data.txt');

lRate = 0.1;
epochs = 300;

item = {'PW' : 1.4, 'PL' : 4.7, 'SW' : 3.2, 'SL' : 7.0};
print Perceptron(item,weights);


With that the code is complete. You can find all the code on my github. It includes two evaluation functions.

Perceptron Theory