I was under the impression that, without bilinear or trilinear filtering, each pixel simply maps to a single texel, no matter how far away the surface is. That's what makes polygons that are small in screen space but span a large range of texture coordinates look static-y as they move: neighboring pixels land on texels that are far apart, and which texels they hit changes from frame to frame.
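
To make the static-y effect concrete, here's a rough sketch in plain Python (not real GPU code). The names `sample_nearest` and `render` are made up for illustration, and the 1D stripe texture is just an assumption to keep things small. Four pixels point-sample a 16-texel stripe pattern, then the texture is nudged over by exactly one texel:

```python
def sample_nearest(texture, u):
    """Return the single texel nearest to normalized coordinate u (wraps around)."""
    index = int(u * len(texture)) % len(texture)
    return texture[index]

def render(texture, pixel_count, offset):
    """Hypothetical helper: one point sample per pixel, texture shifted by offset."""
    return [sample_nearest(texture, (p + 0.5) / pixel_count + offset)
            for p in range(pixel_count)]

# A 16-texel stripe pattern: 0, 1, 0, 1, ... (true average is 0.5)
stripes = [i % 2 for i in range(16)]

frame_a = render(stripes, 4, 0.0)        # every pixel lands on a 0 texel
frame_b = render(stripes, 4, 1.0 / 16)   # shifted one texel: every pixel lands on a 1
print(frame_a, frame_b)
```

Every pixel flips from 0 to 1 after a one-texel shift, even though the area under each pixel averages 0.5 in both frames. That flipping is the shimmer.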

Mipmaps address this by repeatedly averaging adjacent texels, halving the image each time, until you get a teensy texture image. When the hardware samples one of the smaller levels, it is effectively reading an average of the texels that fall under the pixel instead of one texel picked more or less at random from the original image. The speed advantage of mipmapping is that the texture data being sampled is smaller, so the hardware can fetch the values it needs more efficiently (nearby pixels read nearby texels, which is friendlier to the texture cache).
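
For anyone who wants to see the averaging spelled out, here's a minimal sketch of building a mip chain with a 2x2 box filter. It assumes a square, power-of-two, single-channel texture stored as a list of rows. `downsample` and `build_mip_chain` are names I made up, and real drivers may well use a different filter:

```python
def downsample(level):
    """Halve a square texture by averaging each 2x2 block of texels."""
    half = len(level) // 2
    return [[(level[2 * y][2 * x] + level[2 * y][2 * x + 1] +
              level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(half)]
            for y in range(half)]

def build_mip_chain(base):
    """Return [base, half size, ..., 1x1]. Assumes a square power-of-two texture."""
    chain = [base]
    while len(chain[-1]) > 1:
        chain.append(downsample(chain[-1]))
    return chain

# 4x4 checkerboard of 0s and 1s
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
chain = build_mip_chain(checker)
print([len(level) for level in chain])  # sizes of each level
print(chain[-1])                        # the 1x1 level
```

The 1x1 level ends up holding the average of the whole checkerboard, which is what a far-away polygon ought to look like.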

Of course, in theory, mipmapping with trilinear filtering ought to be the slowest option. Bilinear filtering requires 4 texel reads per sample (done by the hardware, but four nonetheless). Trilinear filtering uses 8 (a bilinear lookup in each of two adjacent mip levels), and unfortunately, texture fetches are among the slowest operations on any graphics card. A quick benchmark on my computer confirms all this to be true. 'Course, there may be something about well-defined graphics paths on other computers that I don't know about . . .
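
Here's where the 4 and the 8 come from, as a software sketch that just counts texel reads. This is only an illustration: `CountingTexture`, `bilinear`, and `trilinear` are hypothetical names, the clamp-to-edge addressing is an assumption, and real hardware does this in fixed-function units rather than anything like the code below:

```python
import math

class CountingTexture:
    """Hypothetical square texture wrapper that counts every texel read."""
    def __init__(self, texels):
        self.texels = texels
        self.fetches = 0

    def fetch(self, x, y):
        size = len(self.texels)
        self.fetches += 1
        # Clamp-to-edge addressing (an assumption for this sketch)
        return self.texels[min(max(y, 0), size - 1)][min(max(x, 0), size - 1)]

def bilinear(tex, u, v):
    """Blend the 4 texels surrounding normalized coordinates (u, v)."""
    size = len(tex.texels)
    fx, fy = u * size - 0.5, v * size - 0.5
    x0, y0 = math.floor(fx), math.floor(fy)
    tx, ty = fx - x0, fy - y0
    top = tex.fetch(x0, y0) * (1 - tx) + tex.fetch(x0 + 1, y0) * tx
    bottom = tex.fetch(x0, y0 + 1) * (1 - tx) + tex.fetch(x0 + 1, y0 + 1) * tx
    return top * (1 - ty) + bottom * ty

def trilinear(fine, coarse, u, v, blend):
    """Bilinear lookup in two adjacent mip levels, then blend between them."""
    return bilinear(fine, u, v) * (1 - blend) + bilinear(coarse, u, v) * blend

fine = CountingTexture([[(x + y) % 2 for x in range(4)] for y in range(4)])
coarse = CountingTexture([[0.5, 0.5], [0.5, 0.5]])

bilinear_value = bilinear(fine, 0.5, 0.5)
bilinear_fetches = fine.fetches                      # reads for one bilinear sample

fine.fetches = 0
trilinear_value = trilinear(fine, coarse, 0.5, 0.5, 0.5)
trilinear_fetches = fine.fetches + coarse.fetches    # reads for one trilinear sample
print(bilinear_fetches, trilinear_fetches)
```

The counts only tell you how many texels get read. Whether that actually costs more time depends on caching and the hardware, which is what a benchmark is for.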

And I agree--mipmaps are great.  Use 'em anyway.