Gaussian processes (GPs) are commonly used as models for functions, time series, and spatial fields, but exact GP inference is computationally infeasible for large datasets. Focusing on the typical setting of modeling observations as a GP plus an additive noise term, we propose a generalization of the Vecchia (1988) approach as a framework for GP approximations. We show that our general Vecchia approach contains many popular existing GP approximations as special cases, allowing for comparisons among the different methods within a unified framework. Representing the models by directed acyclic graphs, we determine the sparsity of the matrices necessary for inference, which leads to new insights regarding the computational properties. Based on these results, we propose a novel sparse general Vecchia approximation, which ensures computational feasibility for large datasets while potentially yielding large improvements in approximation accuracy over Vecchia's original approach. We provide several theoretical results, conduct numerical comparisons, and apply the methods to satellite data. This work is the result of a collaboration with Matthias Katzfuss.
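
To make the underlying idea concrete, the following is a minimal sketch of a Vecchia-type approximation in its original spirit, applied directly to the noisy observations: the joint density is replaced by a product of univariate conditionals, each conditioning on at most m previously ordered observations. It is not the general or sparse general Vecchia approximation described above. The exponential covariance, the one-dimensional inputs, the nearest-neighbor conditioning rule, and all names (`exp_cov`, `vecchia_loglik`, `exact_loglik`, `noise_var`) are illustrative assumptions.

```python
import numpy as np


def exp_cov(x1, x2, variance=1.0, length_scale=0.3):
    """Exponential covariance on 1-D inputs (an assumed, illustrative kernel)."""
    d = np.abs(x1[:, None] - x2[None, :])
    return variance * np.exp(-d / length_scale)


def gauss_logpdf(y, mean, var):
    """Log-density of a univariate normal distribution."""
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)


def vecchia_loglik(y, x, m, noise_var=0.1):
    """Sum of conditional log-densities; y[i] conditions on at most m
    earlier observations (the m nearest in x among those ordered before i)."""
    total = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        if len(prev) > m:
            nearest = np.argsort(np.abs(x[prev] - x[i]))[:m]
            prev = prev[nearest]
        # Marginal variance of the noisy observation y[i].
        k_ii = exp_cov(x[i:i + 1], x[i:i + 1])[0, 0] + noise_var
        if len(prev) == 0:
            mean, var = 0.0, k_ii
        else:
            K_pp = exp_cov(x[prev], x[prev]) + noise_var * np.eye(len(prev))
            k_ip = exp_cov(x[i:i + 1], x[prev])[0]
            w = np.linalg.solve(K_pp, k_ip)  # kriging weights
            mean = w @ y[prev]
            var = k_ii - w @ k_ip
        total += gauss_logpdf(y[i], mean, var)
    return total


def exact_loglik(y, x, noise_var=0.1):
    """Exact zero-mean GP log-likelihood via a dense Cholesky factor."""
    n = len(y)
    K = exp_cov(x, x) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, y)
    return (-0.5 * (alpha @ alpha)
            - np.log(np.diag(L)).sum()
            - 0.5 * n * np.log(2 * np.pi))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 1, 200))
    K = exp_cov(x, x) + 0.1 * np.eye(len(x))
    y = np.linalg.cholesky(K) @ rng.standard_normal(len(x))
    for m in (1, 5, 20, len(x) - 1):
        print(m, vecchia_loglik(y, x, m), exact_loglik(y, x))
```

In this sketch, each conditional requires solving a system of size at most m, so the cost grows linearly in the number of observations for fixed m, and setting m to n - 1 recovers the exact likelihood by the chain rule.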