Classification using PHP-ML

suggest change

Classification in Machine Learning is the problem that identifies to which set of categories does a new observation belong. Classification falls under the category of Supervised Machine Learning.

Any algorithm that implements classification is known as classifier

The classifiers supported in PHP-ML are

SVC (Support Vector Classification)
k-Nearest Neighbors
Naive Bayes

The train and predict method are same for all classifiers. The only difference would be in the underlying algorithm used.

SVC (Support Vector Classification)

Before we can start with predicting a new observation, we need to train our classifier. Consider the following code

// Import library
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;

// Data for training classifier
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];  // Training samples
$labels = ['a', 'a', 'a', 'b', 'b', 'b'];

// Initialize the classifier
$classifier = new SVC(Kernel::LINEAR, $cost = 1000);
// Train the classifier
$classifier->train($samples, $labels);

The code is pretty straight forward. $cost used above is a measure of how much we want to avoid misclassifying each training example. For a smaller value of $cost you might get misclassified examples. By default it is set to 1.0

Now that we have the classifier trained we can start making some actual predictions. Consider the following codes that we have for predictions

$classifier->predict([3, 2]); // return 'b'
$classifier->predict([[3, 2], [1, 5]]); // return ['b', 'a']

The classifier in the case above can take unclassified samples and predicts there labels. predict method can take a single sample as well as an array of samples.

k-Nearest Neighbors

The classfier for this algorithm takes in two parameters and can be initialized like

$classifier = new KNearestNeighbors($neighbor_num=4);
$classifier = new KNearestNeighbors($neighbor_num=3, new Minkowski($lambda=4));

$neighbor_num is the number of nearest neighbours to scan in knn algorithm while the second parameter is distance metric which by default in first case would be Euclidean. More on Minkowski can be found here.

Following is a short example on how to use this classifier

// Training data
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels = ['a', 'a', 'a', 'b', 'b', 'b'];

// Initialize classifier
$classifier = new KNearestNeighbors();
// Train classifier
$classifier->train($samples, $labels);

// Make predictions
$classifier->predict([3, 2]); // return 'b'
$classifier->predict([[3, 2], [1, 5]]); // return ['b', 'a']

NaiveBayes Classifier

NaiveBayes Classifier is based on Bayes' theorem and does not need any parameters in constructor.

The following code demonstrates a simple prediction implementation

// Training data
$samples = [[5, 1, 1], [1, 5, 1], [1, 1, 5]];
$labels = ['a', 'b', 'c'];

// Initialize classifier
$classifier = new NaiveBayes();
// Train classifier
$classifier->train($samples, $labels);

// Make predictions
$classifier->predict([3, 1, 1]); // return 'a'
$classifier->predict([[3, 1, 1], [1, 4, 1]); // return ['a', 'b']

Practical case

Till now we only used arrays of integer in all our case but that is not the case in real life. Therefore let me try to describe a practical situation on how to use classifiers.

Suppose you have an application that stores characteristics of flowers in nature. For the sake of simplicity we can consider the color and length of petals. So there two characteristics would be used to train our data. color is the simpler one where you can assign an int value to each of them and for length, you can have a range like (0 mm,10 mm)=1 , (10 mm,20 mm)=2. With the initial data train your classifier. Now one of your user needs identify the kind of flower that grows in his backyard. What he does is select the color of the flower and adds the length of the petals. You classifier running can detect the type of flower (“Labels in example above”)

Found a mistake? Have a question or improvement idea? Let me know.

Table Of Contents