In this paper we consider a situation in which we are given a finite number of values that represent a sampling of weighted averages of a function f(x) on a uniform grid. We show that if the weight function phi(x) satisfies a dilation equation, there is a discrete multi-resolution analysis of these values corresponding to a dyadic coarsening of the grid. We introduce a reconstruction procedure R which predicts f(x) from its discrete weighted averages to any desired order of accuracy and is conservative in the sense that weighted averaging of R reproduces the given input data. Our formulation allows for adaptive, data-dependent reconstruction techniques in which R is a nonlinear functional of the input data. At each level of resolution k we use the reconstruction R to predict f(x) and its weighted averages at the (k - 1)th level, which is the next finer level of resolution. We define Q(k)(x; f), the kth-scale component of f(x), to be the difference between the reconstruction of f(x) at level (k - 1) and that at level k, and {d(j)k - 1}, the kth-scale coefficients of f(x), to be the weighted averages of Q(k) on the finer grid. We show that the given input data can be recovered from knowledge of the scale coefficients {d(j)k} for all k and the weighted averages of f(x) on the coarsest grid. This observation leads to an efficient data compression technique. On the functional side, f(x) can be reconstructed to the accuracy of the finest grid from knowledge of the scale components Q(k)(x; f) for all k and the reconstruction of f(x) from the coarsest grid. When R is data-independent we show that each scale component Q(k) can be represented in a basis of linearly independent generalized wavelets. This leads to a representation of f(x) in a multi-resolution basis which is the union of these generalized wavelets over all levels of resolution.
In this framework the original wavelets are obtained from a particular choice of reconstruction technique, namely taking R to be the projection of f onto the linear span of all dilates and translates of phi(x). This is a restrictive coupling between the approximation technique R and the weight function phi that defines the averaging, which is unnecessary from the point of view of numerical analysis.
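
The discrete side of this construction can be illustrated with a minimal sketch. It is not the paper's implementation; it fixes several choices that the abstract leaves open, all of which are assumptions made here for illustration: phi(x) is taken to be the box function (so the weighted averages are cell averages), the grid is periodic, and R is a data-independent, centered quadratic reconstruction, which gives the prediction stencil used in predict. The function names coarsen, predict, encode and decode are hypothetical. Because the prediction is conservative, the two prediction errors inside each coarse cell sum to zero, so the sketch stores only one of them per coarse cell as the scale coefficient.

```python
import numpy as np

def coarsen(v):
    """Dyadic coarsening for the box weight function: average pairs of cells."""
    return 0.5 * (v[0::2] + v[1::2])

def predict(vc):
    """Predict fine-grid cell averages from coarse ones (periodic grid).

    Assumed stencil: centered quadratic reconstruction. It is conservative,
    i.e. coarsen(predict(vc)) reproduces vc.
    """
    diff = (np.roll(vc, 1) - np.roll(vc, -1)) / 8.0  # (v[j-1] - v[j+1]) / 8
    fine = np.empty(2 * vc.size)
    fine[0::2] = vc + diff
    fine[1::2] = vc - diff
    return fine

def encode(v, levels):
    """Return the coarsest averages and the scale coefficients, finest level first."""
    details = []
    for _ in range(levels):
        vc = coarsen(v)
        err = v - predict(vc)        # prediction error on the finer grid
        details.append(err[0::2])    # err[1::2] == -err[0::2] by conservation
        v = vc
    return v, details

def decode(vc, details):
    """Rebuild the finest-grid averages from the coarsest data and the coefficients."""
    v = vc
    for d in reversed(details):
        fine = predict(v)
        fine[0::2] += d
        fine[1::2] -= d
        v = fine
    return v

# Exact cell averages of sin(2*pi*x) on 64 uniform cells of [0, 1].
n = 64
edges = np.linspace(0.0, 1.0, n + 1)
v0 = (np.cos(2 * np.pi * edges[:-1]) - np.cos(2 * np.pi * edges[1:])) / (
    2 * np.pi * np.diff(edges)
)

coarse, details = encode(v0, levels=4)
restored = decode(coarse, details)
```

In this sketch the coarsest averages together with the scale coefficients contain exactly as many numbers as the input, and the round trip recovers the input up to rounding error. Compression in the sense described above would come from discarding coefficients whose magnitude falls below a chosen tolerance, which is not shown here.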