In regression, we often talk about the differences between the population line and the observations as “errors.” In some introductory texts these are even called “measurement errors” in Y. Sometimes they are errors in Y, and sometimes they are even measurement errors in Y, but much more often Y is the truth and the ‘error’ is the error in predicting Y by a straight line. As Dan Davies observed (quoting from memory): “The Great Depression really happened; it wasn’t just an unusually inaccurate observation of an underlying 4% return on equities.”
Why do we assume errors have zero mean? If ‘errors’ actually are measurement errors, this is genuinely an assumption and could be falsified empirically. One example is the pulmonary function measurements FEV1 and FVC, which are defined as the maximum attainable by the individual: a measured value can only fall short of that maximum, so by definition the measurement error cannot have zero mean. More often, though, the mean of the residuals is not identifiable separately from the intercept, and we simply choose the parametrization that has mean-zero residuals. In that situation it’s not an assumption and couldn’t be falsified empirically.
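A minimal simulation sketch of that last point (the variable names and the mean-3 errors are my own illustration, not from any particular dataset): even when the data-generating errors have mean 3 rather than 0, ordinary least squares with an intercept absorbs the shift into the fitted intercept and leaves residuals that average exactly zero, which is why the error mean isn’t separately identifiable.

```python
# Illustrative sketch: errors are generated with mean 3, but OLS with an
# intercept folds that mean into the fitted intercept, and the residuals
# average to zero by construction (a consequence of the normal equations).
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0, 10, n)
errors = rng.normal(loc=3.0, scale=1.0, size=n)  # error mean is 3, not 0
y = 2.0 + 0.5 * x + errors                       # true intercept 2, slope 0.5

X = np.column_stack([np.ones(n), x])             # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

print("fitted intercept:", beta_hat[0])      # ~5 = 2 + 3: absorbs the error mean
print("fitted slope:    ", beta_hat[1])      # ~0.5
print("mean residual:   ", residuals.mean()) # ~0, to machine precision
```

Refitting with any other error mean gives the same slope and the same zero-mean residuals; only the intercept moves, so no empirical check on the residuals could ever reject the zero-mean parametrization.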