
Random Forest vs. Gradient Boosting

Random Forest is one of the most widely used machine learning algorithms, while Gradient Boosting has emerged as the next big thing over the past couple of years. Let us look at the basic differences between the two and try to understand when to use each classification technique.

Implementing Random Forest in Java (only for nerds who want to code the hell out of ML)

To create a random forest we have to build multiple classification trees. For a training set of N records, we draw N cases at random, with replacement, from the original data. For example, suppose our training data looks like this:

1) “Xyz is at discounted price” – spam

2) “Tomorrow is important meeting” – not-spam

Across these inputs we have M = 5 distinct words as features. We choose a random k smaller than M, and each tree is grown from k randomly chosen features; this k stays constant while the whole forest is built. Suppose we pick k = 2 words per tree and grow 3 such trees. When a new input such as “Xyz meeting” arrives, it is run through all the trees; if 2 out of those 3 trees return spam, the new data is classified as spam.
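The voting step described above can be sketched in a few lines. This is a minimal illustration, not the full forest: it assumes each tree has already produced a boolean spam/not-spam vote, and the `ForestVote` class and its method names are hypothetical.

```java
import java.util.List;

public class ForestVote {
    // Each tree votes "spam" (true) or "not spam" (false);
    // the forest's prediction is the majority vote.
    public static boolean majorityVote(List<Boolean> treeVotes) {
        int spam = 0;
        for (boolean v : treeVotes) {
            if (v) spam++;
        }
        return spam > treeVotes.size() / 2;
    }

    public static void main(String[] args) {
        // "Xyz meeting": suppose 2 of the 3 trees return spam.
        List<Boolean> votes = List.of(true, true, false);
        System.out.println(majorityVote(votes) ? "spam" : "not-spam"); // spam
    }
}
```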

Two ways to implement Random Forest in Java (for nerds who want to code)

1) Using Weka and Java: Use the tool Weka (weka.wikispaces.com/), create a model in Weka, and use it directly, as shown here: http://www.programcreek.com/java-api-examples/index.php?api=weka.classifiers.trees.RandomForest. However, you then depend on Weka, which is released under the GPL v3 license, so you cannot ship it inside a closed-source commercial application. If you just want to learn and experiment, this is one of the easiest methods.

2) Using just Java: Here we divide the whole project into three parts.

1)     Tree class – a class that creates a decision tree from the data matrix.

The data matrix is a List of int arrays: each array is one record, each index in the array is one attribute, and the last index is the class (i.e. [x1, x2, ..., xM, Y]).
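That representation can be written out directly. The feature encoding below (word-presence flags for the spam example) is a hypothetical illustration of the [x1, ..., xM, Y] layout, not taken from the project itself:

```java
import java.util.ArrayList;
import java.util.List;

public class DataMatrix {
    // The class label Y always sits in the last slot of a record.
    public static int labelOf(int[] record) {
        return record[record.length - 1];
    }

    public static void main(String[] args) {
        // Each record is [x1, x2, ..., xM, Y]; here M = 5 hypothetical
        // word-presence features, and Y = 1 for spam, 0 for not-spam.
        List<int[]> records = new ArrayList<>();
        records.add(new int[]{1, 1, 1, 1, 1, 1}); // "Xyz is at discounted price" -> spam
        records.add(new int[]{0, 1, 0, 0, 0, 0}); // "Tomorrow is important meeting" -> not-spam
        System.out.println(labelOf(records.get(0))); // 1 (spam)
    }
}
```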

Tree creation can be done in four steps using a recursive function:

Step 1: Check whether the node is a leaf; if so, set its isLeaf flag to true and stop recursing at that point.

Step 2: Keep a group of nodes, where each node holds references to its left and right children.

Step 3: Before splitting, check that the current depth is still lower than the tree height we decided at the beginning.

Step 4: If the node holds only one record, mark it as a leaf and set its class equal to that record's class.

Finally, create a function that traverses the tree and returns the prediction for a given record.

2)     Random Forest class: this class should initialize Breiman's random forest construction by creating multiple threads to handle multiple trees. Create an inner class that builds a single decision tree inside a thread pool environment.
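The thread pool structure can be sketched with `ExecutorService`. To keep the sketch self-contained, each task returns its tree index as a stand-in for an actual tree; the class and method names here are hypothetical, and the comment marks where the real bootstrap-and-build call would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ForestTrainer {
    // Grow numTrees trees in parallel; each task would bootstrap-sample
    // the data and build one decision tree.
    public static List<Integer> train(int numTrees) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < numTrees; i++) {
                final int treeId = i;
                futures.add(pool.submit((Callable<Integer>) () -> {
                    // buildTree(bootstrapSample(data)) would go here
                    return treeId;
                }));
            }
            List<Integer> forest = new ArrayList<>();
            for (Future<Integer> f : futures) forest.add(f.get());
            return forest;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```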

3)     Controller/Main class, which takes the training and testing data as input. We decide how deep the trees should be using the formula Math.round(Math.log(x)/Math.log(2) + 1), where x is the size of the training data. This value stays constant throughout the project and controls the depth of every tree in the Random Forest class.
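The depth formula is just log2(x) + 1, rounded. A quick check of what it produces (the wrapper class name here is arbitrary):

```java
public class TreeDepth {
    // Depth heuristic from the post: round(log2(x) + 1),
    // where x is the number of training records.
    public static long treeDepth(int x) {
        return Math.round(Math.log(x) / Math.log(2) + 1);
    }

    public static void main(String[] args) {
        System.out.println(treeDepth(1000)); // 11
        System.out.println(treeDepth(2));    // 2
    }
}
```

So with 1,000 training records, every tree in the forest is capped at depth 11.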

You can download the GitHub project (https://github.com/ironmanMA/RandomForest); it will help you understand the random forest Java logic from end to end. Happy coding!

References

https://en.wikipedia.org/wiki/Random_forest

https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

