The 4D-Var method for filtering partially observed nonlinear chaotic dynamical systems consists of finding the maximum a-posteriori (MAP) estimator of the initial condition of the system given observations over a time window, and propagating it forward to the current time via the model dynamics. This method forms the basis of most currently operational weather forecasting systems. In practice the optimization becomes infeasible if the time window is too long due to the non-convexity of the cost function, the effect of model errors, and the limited precision of the ODE solvers. Hence the window has to be kept sufficiently short, and the observations in the previous windows can be taken into account via a Gaussian background (prior) distribution. The choice of the background covariance matrix is an important question that has received much attention in the literature. In this paper, we define the background covariances in a principled manner, based on observations in the previous $b$ assimilation windows, for a parameter $b\ge 1$. The method is at most $b$ times more computationally expensive than using fixed background covariances, requires little tuning, and greatly improves the accuracy of 4D-Var. As a concrete example, we focus on the shallow-water equations. The proposed method is compared against state-of-the-art approaches in data assimilation and is shown to perform favourably on simulated data. We also illustrate our approach on data from the recent tsunami of 2011 in Fukushima, Japan.