Random Forest is an ensemble method that builds many decision trees and aggregates their predictions. Each tree trains on a bootstrap sample of the data (drawn with replacement), and at each split it considers only a random subset of the features. This double randomization decorrelates the trees and reduces the variance a single deep tree would have, producing more stable results. For classification, the forest takes a majority vote across the trees; for regression, it averages their numeric predictions. Individual trees may overfit or make errors, but those errors tend to cancel out in the aggregate.

Random Forest requires relatively little hyperparameter tuning, is robust to noisy features, and can cope with missing values in implementations that support surrogate splits or proximity-based imputation. It is used in credit-risk scoring, fraud detection, medical imaging, and recommendation systems. Its built-in feature-importance scores help analysts understand which variables drive predictions, which matters in regulated industries. Because the trees are independent, the algorithm parallelizes easily, and enterprises run it on clusters, cutting training time from hours to minutes.
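The mechanics above can be sketched in a few dozen lines. The following is a minimal illustration, not a production implementation: each "tree" is a depth-1 stump (real forests grow deep trees), the feature-subset size `sqrt(d)` and the tree count are conventional defaults chosen here for the example, and the toy dataset is invented.

```python
import random
from collections import Counter

def train_stump(X, y, feature_ids):
    """Find the lowest-error threshold split among the allowed features."""
    best = None  # (error, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or err < best[0]:
                best = (err, f, t, l_lab, r_lab)
    if best is None:  # degenerate bootstrap sample: fall back to a leaf
        return ("leaf", Counter(y).most_common(1)[0][0])
    _, f, t, l_lab, r_lab = best
    return ("split", f, t, l_lab, r_lab)

def predict_stump(stump, row):
    if stump[0] == "leaf":
        return stump[1]
    _, f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def random_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))  # random feature subset size per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        feats = rng.sample(range(d), k)             # random subset of features
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    # Classification: majority vote across all trees
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Toy data: class 1 when both features are large
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [3, 2], [2, 3], [3, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = random_forest(X, y)
print(predict(forest, [0.5, 0.5]), predict(forest, [2.5, 2.5]))  # → 0 1
```

For regression, the only change to the voting step is to replace the majority vote with the mean of the trees' numeric predictions.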