During our crawls we have found pages on which the parser is incredibly slow (almost one minute for a 1677728-byte page). The time appears to be spent in CharSequenceParseText.indexOf(String,int,int), which is probably exhibiting some sort of quadratic behaviour.
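To make the "quadratic" suspicion concrete, here is a minimal, hypothetical sketch. It assumes a parser that, at each successive position, starts a fresh linear indexOf scan to the end of the text for a string that never occurs there. The class and method names (QuadraticIndexOfDemo, naiveIndexOf, countComparisons) are invented for illustration. This is not Jericho's actual CharSequenceParseText code.

```java
// Hypothetical illustration only: NOT Jericho's CharSequenceParseText implementation.
public class QuadraticIndexOfDemo {
    static long comparisons = 0;

    // Naive search for `target` in `text`, from fromIndex (inclusive) up to breakAtIndex (exclusive).
    static int naiveIndexOf(CharSequence text, String target, int fromIndex, int breakAtIndex) {
        int last = breakAtIndex - target.length();
        for (int i = fromIndex; i <= last; i++) {
            int j = 0;
            while (j < target.length()) {
                comparisons++;
                if (text.charAt(i + j) != target.charAt(j)) break;
                j++;
            }
            if (j == target.length()) return i;
        }
        return -1; // not found: the whole remaining text was scanned
    }

    // Assumed pattern: one search per position, each scanning to the end because
    // the terminator "-->" never appears in a text made only of '<' characters.
    static long countComparisons(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) sb.append('<');
        comparisons = 0;
        for (int pos = 0; pos < n; pos++) {
            naiveIndexOf(sb, "-->", pos, n);
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 2000, 4000}) {
            System.out.println(n + " chars -> " + countComparisons(n) + " comparisons");
        }
    }
}
```

Under these assumptions the total work is (n-2)(n-1)/2 character comparisons, so doubling the page size roughly quadruples the time. This is consistent with small pages parsing quickly while very large ones take far longer.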
One of the affected pages is attached. You can reproduce the problem by running it through the StreamedSourceCopy example.
This is a real problem for us, so any help would be much appreciated :).