Wow.
Art, it is basic math. Any image one obtains is the original scene convolved with the blur of the imaging system (e.g. from diffraction and lens aberrations). In simple terms, the image is a blurred version of the scene. When one downsizes an image, that is another convolution followed by sub-sampling. The convolutions work together to make the image blurrier than the original, then the sub-sampling hides some of the result.
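For anyone who wants to see the two steps in numbers, here is a minimal 1-D sketch in plain Python. Everything in it is an assumption made for illustration: the sine-pattern "scene", the 3-tap kernel standing in for lens blur, and a simple pair-averaging downsample standing in for whatever resampling an image editor actually uses.

```python
import math

def blur(signal, kernel):
    """Convolve a 1-D signal with a symmetric, odd-length kernel (edges clamped)."""
    half = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - half, 0), last)] for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]

def downsample_by_2(signal):
    """Average each adjacent pair (a 2-tap box convolution), keeping one value per pair."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]

def contrast(signal, margin=8):
    """(max - min) / (max + min), ignoring `margin` samples at each end."""
    core = signal[margin:-margin]
    return (max(core) - min(core)) / (max(core) + min(core))

# Hypothetical scene: a sine pattern with a period of 8 pixels.
scene = [0.5 + 0.5 * math.sin(2.0 * math.pi * i / 8.0) for i in range(128)]

# Hypothetical lens blur kernel (an assumption, not a measured PSF).
LENS_KERNEL = [0.25, 0.5, 0.25]

sharp_capture = scene                     # idealized capture with no lens blur
soft_capture = blur(scene, LENS_KERNEL)   # first convolution: the optics

sharp_small = downsample_by_2(sharp_capture)  # second convolution + sub-sampling
soft_small = downsample_by_2(soft_capture)

print("contrast, sharp capture:    ", contrast(sharp_capture))
print("contrast, soft capture:     ", contrast(soft_capture))
print("contrast, sharp downsampled:", contrast(sharp_small))
print("contrast, soft downsampled: ", contrast(soft_small))
```

In this toy case the pattern loses contrast at each convolution, and the downsized version of the soft capture ends up with less contrast than the downsized version of the sharp one.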
In more familiar terms to photographers, think of MTF. MTF describes how much contrast the optical system records at each level of detail (spatial frequency). The MTF of any image recorded with an optical system is not perfect (e.g. it is reduced by lens aberrations and diffraction), thus less than 1. Downsampling has its own MTF, also less than 1. The second MTF multiplies the first MTF (any two numbers between zero and one multiplied together result in an even smaller number), so if you start with a low MTF image (e.g. blurred), the downsized image will be blurrier (even lower MTF) than if you started with a higher MTF image (a sharper image). I think this is proven all the time here when people complain the original image was not sharp enough. If a sharp image wasn't needed to start with, we would not hear people here saying an image was close but not sharp enough, so it should be deleted.
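The multiplication can be put into a few lines of Python. This is only a sketch under stated assumptions: each blur is modeled as a Gaussian, whose MTF at spatial frequency f (cycles/pixel) is exp(-2 * pi^2 * sigma^2 * f^2), and the sigma values and the frequency are made-up numbers, not measurements of any lens or resampling filter.

```python
import math

def gaussian_mtf(freq, sigma):
    """MTF of a Gaussian blur (std dev `sigma`, in pixels) at `freq` cycles/pixel."""
    return math.exp(-2.0 * (math.pi * sigma * freq) ** 2)

def cascaded_mtf(freq, sigmas):
    """System MTF of several blurs applied in sequence: the product of their MTFs."""
    result = 1.0
    for sigma in sigmas:
        result *= gaussian_mtf(freq, sigma)
    return result

FREQ = 0.25            # hypothetical spatial frequency, cycles/pixel
SHARP_SIGMA = 0.7      # hypothetical blur of a well-focused capture
SOFT_SIGMA = 1.2       # hypothetical blur of a slightly misfocused capture
RESAMPLE_SIGMA = 1.0   # hypothetical blur added by the downsizing filter

sharp_total = cascaded_mtf(FREQ, [SHARP_SIGMA, RESAMPLE_SIGMA])
soft_total = cascaded_mtf(FREQ, [SOFT_SIGMA, RESAMPLE_SIGMA])

print("sharp capture, after downsizing:", sharp_total)
print("soft capture, after downsizing: ", soft_total)
```

Whatever numbers one plugs in, the combined MTF is lower than either MTF alone, and the capture that started with the lower MTF finishes with the lower MTF.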
Everyone can test this effect. Find two images of the same subject, one with the focus slightly off and thus slightly blurred (lower MTF). Downsample both, then sharpen both the same amount. Can you see a difference? One can partly compensate with more sharpening on the downsampled blurrier image, but usually with increased artifacts, as demonstrated in this thread, e.g. Arash's test image. However, if you sharpen first and then downsample, the result is sharper (again the multiplication of the two MTFs, back to basic math). All real-world sharpening also results in artifacts. So if you sharpen first and then downsample, those artifacts are shrunk along with the image and cover fewer pixels. Another win.
Perhaps some of the confusion lies in how often to sharpen. Any time one downsizes, sampling theory says there is degradation in MTF (the convolution). Thus after downsampling, one needs to sharpen, even if one sharpened before. The lesson is that any time there is a convolution, one can sharpen (ideally with deconvolution). When the image is obtained, that is the first convolution, so it is best to sharpen (deconvolve) at that point.
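To make "deconvolve" concrete, here is a toy 1-D sketch of Richardson-Lucy iteration, one common deconvolution method (named here as an illustration, not necessarily the method used in any particular reference or software package). The signal, the blur kernel, and the iteration count are all assumptions, the blur is taken as exactly known, and there is no noise, so real images will behave less tidily.

```python
def blur(signal, kernel):
    """Convolve a 1-D signal with a symmetric, odd-length kernel (edges clamped)."""
    half = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - half, 0), last)] for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]

def richardson_lucy(observed, kernel, iterations=30):
    """Iteratively estimate the unblurred signal (symmetric kernel, positive data)."""
    estimate = list(observed)  # start from the blurred data
    for _ in range(iterations):
        reblurred = blur(estimate, kernel)
        ratio = [o / r for o, r in zip(observed, reblurred)]
        correction = blur(ratio, kernel)  # kernel is symmetric, so no flip is needed
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate

# Hypothetical scene: one bright detail on a flat background.
truth = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]

# Hypothetical, exactly known blur kernel (an assumption; real PSFs must be estimated).
KERNEL = [0.2, 0.6, 0.2]

blurred = blur(truth, KERNEL)
restored = richardson_lucy(blurred, KERNEL)

def total_error(signal):
    """Sum of absolute differences from the true scene."""
    return sum(abs(s - t) for s, t in zip(signal, truth))

print("peak value, blurred: ", blurred[3])
print("peak value, restored:", restored[3])
print("error, blurred: ", total_error(blurred))
print("error, restored:", total_error(restored))
```

The restored signal pulls the bright detail back up toward its true value, which is the 1-D version of what deconvolution sharpening does to fine detail in an image.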
Regarding some other comments in this thread about not thinking my references are relevant, I gave deconvolution references ranging from astronomy to microscopy. That pretty much covers the focal length and magnification range of photography. Everything else is in between. I had hoped people would see the proverbial light: Ah Ha, if high magnification microscopy can be deconvolved and low magnification images made at infinity with long lenses can be deconvolved, I should be able to deconvolve my images too.
Roger