
Machine Learning with XGBoost

Edward Osorio
3 min read · Jan 5, 2021


“What exactly is machine learning?” one might ask. According to SAS Insights, machine learning is a branch of artificial intelligence based on the idea that computers can be taught to recognize patterns in data. We see examples of this every day in our lives. Self-driving cars use machine learning to recognize crosswalks, pedestrians, and traffic lights. Email providers use it to detect spam by flagging messages with suspicious, repeated phrases. Banks use it to detect fraudulent transactions. But how are machine learning algorithms able to learn from so many different types of data? Well, that’s when things start to become a little more complicated. There are plenty of different learning methods, such as supervised learning, unsupervised learning, and reinforcement learning. Today we’ll be talking about supervised learning, specifically with XGBoost.

XGBoost, or Extreme Gradient Boosting, is a very powerful ensemble machine learning algorithm. It can execute faster than other models thanks to some of the features it offers, like parallelization, distributed computing, and out-of-core computing. To simplify what those mean: with distributed computing, XGBoost can scale to multiple machines on a cluster. Out-of-core tree learning helps XGBoost process data that is too big to fit into a computer’s main memory. Finally, XGBoost uses parallelization to build the branches of the same tree across multiple CPU cores, which helps reduce execution time.
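To make that concrete, here is a minimal sketch of training XGBoost through its scikit-learn wrapper, with the histogram tree method and all CPU cores enabled. The dataset and parameter values are placeholders for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Placeholder dataset purely for illustration.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = XGBClassifier(
    n_estimators=200,     # number of boosted trees
    tree_method="hist",   # fast histogram-based split finding
    n_jobs=-1,            # build each tree's splits in parallel on all CPU cores
)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```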

“How does XGBoost work?” Well, the answer is somewhat complicated. XGBoost works similarly to other boosting methods: it uses the gradient boosting decision tree algorithm, which is where its name comes from, since XGBoost is short for Extreme Gradient Boosting.

The first step is making an initial prediction. The prediction can be anything, but the default is 0.5. After that, we need to find the similarity score for the root node. XGBoost calculates the similarity score by using this equation:
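In its usual regression (squared-error) form, the similarity score for a node is:

$$\text{Similarity} = \frac{\left(\sum_{i=1}^{n} r_i\right)^{2}}{n + \lambda}$$

Here the $r_i$ are the node’s residuals, $n$ is the number of residuals, and $\lambda$ (lambda) is the regularization hyperparameter.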

It gets the sum of the residuals for each node, squares the sum, then divides by the number of residuals plus lambda. After that, it calculates the relative contribution of the corresponding feature, or “Gain,” by using this equation:
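In the standard formulation, the Gain of a split compares the similarity scores of the two child nodes against the parent node:

$$\text{Gain} = \text{Similarity}_{\text{left}} + \text{Similarity}_{\text{right}} - \text{Similarity}_{\text{root}}$$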

Once we determine the Gain, we move on to pruning the tree. First, the tree is grown to the set max depth, then pruning works from the bottom up. Here we introduce “gamma,” a pseudo-regularization hyperparameter. Essentially, we use it to decide which nodes should be dropped. The default value is 0, and the higher you set it, the more regularization your model will have. The math behind this part is simple enough: the model subtracts gamma from the Gain, and if the result is less than 0, the node is dropped. This helps XGBoost run faster, because it doesn’t waste time evaluating the regularization parameters for nodes that it plans on dropping.
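As a toy illustration of that rule (this is not XGBoost’s internal code, just the arithmetic spelled out, with made-up Gain values):

```python
# Toy sketch of the pruning rule described above -- not XGBoost's internal code.
def should_prune(gain: float, gamma: float) -> bool:
    """Drop the node when its Gain minus gamma is negative."""
    return gain - gamma < 0

print(should_prune(gain=120.3, gamma=130.0))  # True  -> this branch gets pruned
print(should_prune(gain=120.3, gamma=0.0))    # False -> kept (gamma's default is 0)
```

In practice, you control this behavior through the `gamma` parameter (also called `min_split_loss`) when you build the model.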

After pruning, XGBoost uses the leftover nodes to calculate the output values of the tree. It uses those values to update its predictions, which produces new residuals for the next tree. Then it begins the same process all over again on those new residuals. The model continues this cycle until it either reaches the specified limit on the number of trees or it can’t lower the residuals any further. Now we’ve reached the final model: we take it, plug in the features we want to predict on, and our XGBoost model is complete.
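In code, that stopping behavior maps onto a few familiar knobs: `n_estimators` caps how many trees are built, `learning_rate` scales how much each new tree’s output nudges the running prediction, and early stopping ends the cycle once a validation set stops improving. A minimal sketch follows; the data and parameter values are placeholders, and note that older XGBoost versions take `early_stopping_rounds` as a `fit()` argument rather than a constructor argument.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Synthetic placeholder data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 10))
y = X[:, 0] * 3 + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=2_000)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = XGBRegressor(
    n_estimators=1_000,        # the "specified limit" on how many trees to build
    learning_rate=0.1,         # how much each new tree nudges the running prediction
    early_stopping_rounds=20,  # stop once validation error hasn't improved in 20 rounds
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print("Trees actually built:", model.best_iteration + 1)
```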

In summary, XGBoost is super-efficient and very well engineered. It offers a plethora of hyperparameters for pruning and tuning, along with some very well thought out built-in features that make modeling data much less arduous on your local machine. While it’s still a relatively young algorithm, it has quickly become a favorite among data scientists. XGBoost has carved out its place in the machine learning world, and it doesn’t plan on going anywhere.
