
🌲 Random Forest

What is a Random Forest?

A Random Forest is a meta-algorithm that builds a collective of decision structures (trees) and aggregates their outcomes to form a more stable, generalized response. It injects randomness both into data selection and feature consideration, primarily to reduce variance without a large increase in bias.

How does it operate?

Each tree is trained on a bootstrap sample of the data, and at every split only a random subset of features is considered. The individual trees' predictions are then aggregated into a single answer. This creates a robust, noise-resistant estimator that performs well in high-dimensional, non-linear spaces.
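The mechanism described above can be sketched in plain Python. This is a deliberately minimal illustration (depth-1 "stump" trees, bootstrap sampling, averaged predictions), not OmniOpt2's actual implementation:

```python
import random

def train_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold, two leaf means."""
    best = None
    for t in xs:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # degenerate bootstrap sample: fall back to the global mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def bagged_predict(stumps, x):
    """Aggregate by averaging, as a Random Forest does for regression."""
    return sum(s(x) for s in stumps) / len(stumps)

random.seed(0)
xs = [i / 10 for i in range(20)]
ys = [0.0] * 10 + [1.0] * 10  # a simple step function to learn

stumps = []
for _ in range(25):
    # Bootstrap sample: draw n points with replacement
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    stumps.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))

print(bagged_predict(stumps, 0.2), bagged_predict(stumps, 1.7))
```

Each stump alone is a crude model, but because every stump sees a different bootstrap sample, their errors partially cancel when averaged.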

How do Random Forests work?

A Random Forest works by building many simple decision trees and combining their results. Each tree is trained on a random subset of the data and uses a random selection of features. This randomness ensures that the trees are diverse. When it's time to make a prediction, each tree gives an answer, and the forest combines them—by majority vote for classification or averaging for regression. This collective decision-making helps reduce overfitting and makes the model more stable.
The parameter --n_estimators_randomforest controls how many decision trees are built in the forest. More trees can lead to better accuracy because the model has more opinions to average—but it also increases training time and memory usage.
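The effect of the tree count can be demonstrated with scikit-learn, whose `n_estimators` parameter plays the same role as `--n_estimators_randomforest` here (a hedged sketch assuming scikit-learn is available; OmniOpt2's internal model configuration may differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.1, size=200)  # noisy target

# Few trees vs. many trees: more estimators average out more noise
small = RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)
large = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

X_test = rng.uniform(-2, 2, size=(200, 1))
y_test = np.sin(3 * X_test[:, 0])  # noiseless ground truth
err_small = np.mean((small.predict(X_test) - y_test) ** 2)
err_large = np.mean((large.predict(X_test) - y_test) ** 2)
print(err_small, err_large)
```

The larger forest typically achieves a lower test error, at the cost of roughly 40x the training time and memory in this configuration.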

What is a Decision Tree?

A Decision Tree is a simple, flowchart-like model that makes predictions by splitting the data into branches based on feature values. At each node, the tree chooses a feature and a threshold that best separates the data according to some criterion (like Gini impurity or information gain). This process continues until the data is fully split or a stopping condition is reached. In the end, each path through the tree leads to a leaf node that contains the predicted value or class. Decision Trees are easy to interpret but can easily overfit if used alone—this is why Random Forests combine many of them.
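The flowchart structure of a single tree can be made concrete with a tiny hand-written example (the features, thresholds, and class labels below are invented for illustration):

```python
# A hand-written decision tree as nested tuples: (feature_index, threshold, left, right);
# leaves are plain values.
tree = (
    0, 5.0,                       # "if feature 0 <= 5.0 ..."
    "small",                      #   leaf: predicted class
    (1, 2.5, "medium", "large"),  # else split again on feature 1
)

def predict(node, x):
    """Follow the branches until a leaf is reached."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node

print(predict(tree, [3.0, 9.9]))  # -> "small"
print(predict(tree, [7.0, 2.0]))  # -> "medium"
print(predict(tree, [7.0, 8.0]))  # -> "large"
```

In a learned tree, the features and thresholds are not hand-picked but chosen at each node to minimize a criterion such as Gini impurity; the prediction logic is the same.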

How does it guide parameter selection?

In optimization contexts, the Random Forest serves as a surrogate model: it is trained on the configurations evaluated so far and predicts the expected outcome of configurations that have not been tried yet, so the search can concentrate on promising regions first.
In essence, the Random Forest acts as a structured intuition engine, guiding the search for optimal configurations without direct evaluation of every possibility.
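The surrogate loop can be sketched as follows. This is a simplified, hypothetical example (the objective function, parameter range, and candidate-scoring step are invented; OmniOpt2's actual sampling strategy is more involved):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

def expensive_objective(lr):
    """Stand-in for a costly training run; the best learning rate is 1e-2."""
    return (np.log10(lr) + 2.0) ** 2

# Configurations evaluated so far (log-uniform over [1e-4, 1e-1])
evaluated = 10.0 ** rng.uniform(-4, -1, size=30)
scores = np.array([expensive_objective(lr) for lr in evaluated])

# Fit the surrogate on (configuration, score) pairs
surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(np.log10(evaluated).reshape(-1, 1), scores)

# Predicting candidates is cheap, so many can be screened at once;
# the most promising one is evaluated for real next
candidates = np.logspace(-4, -1, 500)
predicted = surrogate.predict(np.log10(candidates).reshape(-1, 1))
best = candidates[np.argmin(predicted)]
print(best)
```

Only one expensive evaluation per iteration is needed; the surrogate screens hundreds of candidates essentially for free.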

When and why to use it?

Random Forests excel in scenarios where evaluations are noisy, the search space is high-dimensional or non-linear, and the parameters mix categorical and continuous types.
Compared to modular Bayesian approaches like BoTorch, Random Forests are simpler to configure, cheaper to fit, and handle discrete parameters naturally, but they do not provide the calibrated uncertainty estimates that Gaussian-process surrogates offer.
They are less suited when the evaluation budget is very small or when well-calibrated uncertainty is needed to balance exploration and exploitation.
In such cases, modular GP-based methods may be more appropriate, though at higher complexity and cost.