veda.ng

Hyperparameter

A hyperparameter is a configuration value set before training begins that controls the learning process but is not learned from data. Unlike model parameters (weights and biases), which are optimized during training, hyperparameters are chosen by practitioners and typically remain fixed throughout training.

The learning rate is the most critical hyperparameter: too high and training diverges; too low and it stalls. Batch size affects gradient noise and memory usage. The number of layers and hidden dimensions determine model capacity. Dropout rate controls regularization strength. Optimizer choice affects convergence behavior. Finding good hyperparameters is often as important as the model architecture itself.

Several search strategies exist. Grid search exhaustively evaluates all combinations from a predefined set, which becomes computationally prohibitive as the number of hyperparameters grows. Random search samples randomly from distributions, often finding good values faster than grid search for the same compute budget. Bayesian optimization models the relationship between hyperparameters and performance, intelligently selecting which configuration to evaluate next. Population-based training evolves hyperparameter schedules during training. Hyperparameter transfer suggests good values from similar problems.

Modern practice often uses learning rate schedulers that change the rate during training, warmup periods that gradually increase the learning rate at the start, and automated hyperparameter tuning services. Poor hyperparameters waste compute on failed training runs.
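The contrast between grid search and random search can be sketched as below. The objective function here is a toy surrogate standing in for a real training run, and the specific value ranges are illustrative assumptions, not recommendations.

```python
import itertools
import math
import random

def validation_loss(lr, batch_size):
    # Toy surrogate for "train a model, return its validation loss".
    # Pretends the optimum lies near lr=1e-3, batch_size=64.
    return (math.log10(lr) + 3) ** 2 + (batch_size / 64 - 1) ** 2

# Grid search: evaluate every combination from predefined sets (9 trials).
lrs = [1e-4, 1e-3, 1e-2]
batch_sizes = [32, 64, 128]
grid_results = {
    (lr, bs): validation_loss(lr, bs)
    for lr, bs in itertools.product(lrs, batch_sizes)
}
best_grid = min(grid_results, key=grid_results.get)

# Random search: sample from distributions for the same 9-trial budget.
# Learning rates are usually sampled log-uniformly, since their useful
# range spans orders of magnitude.
random.seed(0)
random_results = {}
for _ in range(9):
    lr = 10 ** random.uniform(-4, -2)          # log-uniform over [1e-4, 1e-2]
    bs = random.choice([32, 64, 128])
    random_results[(lr, bs)] = validation_loss(lr, bs)
best_random = min(random_results, key=random_results.get)

print("grid best:", best_grid)
print("random best:", best_random)
```

Note that grid search spends many trials re-testing the same few values per axis, while random search explores each axis more densely; this is why random search tends to win when only a subset of hyperparameters matters.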
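The warmup and scheduling idea above can be illustrated with one common combination, linear warmup followed by cosine decay. The function name and the specific step counts are illustrative choices, not a standard API.

```python
import math

def lr_at_step(step, base_lr=1e-3, warmup_steps=100, total_steps=1000):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        # Ramp up linearly from near zero to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

print(lr_at_step(0))    # small: start of warmup
print(lr_at_step(99))   # peak: end of warmup, equals base_lr
print(lr_at_step(999))  # near zero: end of training
```

Warmup avoids large, destabilizing updates while weights and optimizer statistics are still poorly initialized; the decay then lets training settle into a minimum.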