A neural network is a computational system loosely inspired by biological neurons, consisting of interconnected layers of mathematical units that transform input data through learned weighted connections and nonlinear activation functions. The basic unit is a neuron that computes a weighted sum of its inputs, adds a bias term, and applies an activation function. Neurons are organized into layers: the input layer receives raw data, hidden layers perform intermediate transformations, and the output layer produces predictions. Each connection between neurons has a learnable weight. During training, these weights are adjusted using backpropagation and gradient descent to minimize prediction error on training data.

The power of neural networks comes from depth and nonlinearity. Deep networks with many hidden layers can learn hierarchical representations: early layers detect simple patterns such as edges, while later layers combine these into complex features such as faces. Without nonlinear activation functions, any neural network would collapse to a single linear transformation regardless of depth. Universal approximation theorems prove that sufficiently wide networks can approximate any continuous function.

Different architectures suit different data types: feedforward networks for tabular data, convolutional networks for images, recurrent networks for sequences, and transformers for modern language and vision tasks. Neural networks dominate modern AI because they scale: larger networks trained on more data consistently perform better.
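The ideas above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the layer sizes, random weights, and ReLU activation are arbitrary choices made for the example. It shows each neuron computing a weighted sum plus a bias followed by an activation, and also demonstrates why nonlinearity matters: with the activation removed, the two layers collapse into one linear map.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Nonlinear activation: zero out negative values
    return np.maximum(0.0, x)

# Arbitrary example sizes: 3 inputs -> 4 hidden neurons -> 2 outputs.
# Each row of a weight matrix holds one neuron's learnable weights.
W1 = rng.normal(size=(4, 3)); b1 = rng.normal(size=4)
W2 = rng.normal(size=(2, 4)); b2 = rng.normal(size=2)

def forward(x):
    # Hidden layer: weighted sum of inputs, plus bias, then activation
    h = relu(W1 @ x + b1)
    # Output layer produces the prediction
    return W2 @ h + b2

x = rng.normal(size=3)
y = forward(x)          # a 2-dimensional prediction

# Without the nonlinearity, depth buys nothing: the two layers
# compose into a single equivalent linear transformation W x + b.
def forward_no_activation(x):
    return W2 @ (W1 @ x + b1) + b2

W = W2 @ W1
b = W2 @ b1 + b2
assert np.allclose(forward_no_activation(x), W @ x + b)
```

In real use the weights would be adjusted by backpropagation and gradient descent rather than left random; this sketch covers only the forward pass.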