While fetching (shuffling) inputs for a particular reduce, the fetcher randomly gets stuck while transferring a particular file. In every case it gets stuck inside inputfetchernoquery. This causes the affected reduces to hang indefinitely and never complete. Since MR+ uses the same module to shuffle files, the same problem might occur there too. The problem is more apparent in MR+ since it has to transfer as many files as there are maps.
The attached inputfetcher was used to diagnose the problem. It differs from Version 1.2.2.2 only in that it has extra log statements that mark how far execution got before it stuck (search for TEMP_TAG1, TEMP_TAG2, ... in the attached file). In this particular test, 6 of the 10 reduces got stuck transferring particular files. Below is the last log statement printed by each stuck reduce worker, along with the TEMP_TAG number it reached before it apparently got stuck:
----------------------------------------------------------------------------------------------
Reduce 0 stuck (run on 192.168.100.250) - Reached no Tag
2010-03-18 18:02:46 DEBUG inputfetchernoquery.fetch_input Fetching output (4, 'M', 348, '192.168.100.251') for task-id 0
Reduce 1 stuck (run on 192.168.100.250) - Reached TEMP_TAG5
2010-03-18 18:02:39 DEBUG inputfetchernoquery.__transfer_file TEMP TAG5 (4, 'M', 129, '192.168.100.252') for task-id 1
Reduce 2 stuck (run on 192.168.100.250) - Reached TEMP_TAG2
2010-03-18 18:02:47 DEBUG inputfetchernoquery.__transfer_file TEMP TAG2 (4, 'M', 368, '192.168.100.244') for task-id 2
Reduce 6 stuck (run on 192.168.100.251) - Reached no Tag
2010-03-18 18:02:52 DEBUG inputfetchernoquery.fetch_input Fetching output (4, 'M', 258, '192.168.100.248') for task-id 6
Reduce 7 stuck (run on 192.168.100.251) - Reached TEMP_TAG1
2010-03-18 18:02:49 DEBUG inputfetchernoquery.__transfer_file TEMP TAG1 (4, 'M', 168, '192.168.100.243') for task-id 7
Reduce 8 stuck (run on 192.168.100.251) - Reached TEMP_TAG6
2010-03-18 18:02:52 DEBUG inputfetchernoquery.__transfer_file TEMP TAG6 (4, 'M', 232, '192.168.100.248') for task-id 8
----------------------------------------------------------------------------------------------
As you can see, the reduces got stuck at different points while trying to transfer a file. This apparent scatter could very well be due to buffering done by the logger, when in fact every reduce was getting stuck at the same point. It could also be due to multiple bugs in the code.
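One way to rule out the logger-buffering explanation would be to write the TEMP_TAG markers through a path that is forced to disk before the call returns, so that the last line on disk is the last marker actually reached. The following is only a minimal sketch of that idea: the helper name trace_tag, the trace file path, and the line format are hypothetical and are not part of the attached inputfetcher.

```python
import os
import time


def trace_tag(path, tag, detail):
    """Append one diagnostic line to `path` and force it to disk.

    Hypothetical helper: it bypasses any buffering in the regular logger
    by flushing Python's buffer and then asking the OS to sync the file.
    """
    line = "%s %s %s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"), tag, detail)
    with open(path, "a") as f:
        f.write(line)
        f.flush()              # push Python's userspace buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write the data to disk
    return line
```

A call such as trace_tag("/tmp/reduce-7.trace", "TEMP_TAG1", "task-id 7") placed next to each existing TEMP_TAG log statement would show whether the stuck reduces still stop at different tags once buffering is out of the picture. The per-call fsync is slow, so this is meant for a diagnostic run only.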
Other than that, there wasn't anything peculiar, apart from an unrelated Error (from which the system recovered) close to Reduce 0's last log statement.
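Independently of the log statements, the exact line where a reduce is stuck could be captured by dumping the stack of every thread once a transfer has run for too long. The sketch below assumes the worker runs on a Python version that ships the standard faulthandler module (3.3 or later); the wrapper name run_with_hang_dump and its parameters are hypothetical, and nothing here is taken from the attached code.

```python
import faulthandler


def run_with_hang_dump(func, timeout_s, dump_file):
    """Run func(); if it is still running after timeout_s seconds,
    write the traceback of every thread to dump_file.

    The call is not aborted (exit=False), so a genuine hang still hangs,
    but the dump shows where each thread was at that moment.
    dump_file must be a real file object with a file descriptor.
    """
    faulthandler.dump_traceback_later(timeout_s, file=dump_file, exit=False)
    try:
        return func()
    finally:
        # Disarm the timer if func() returned before the timeout fired.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the file transfer call in run_with_hang_dump with a timeout well above a normal transfer time would leave a stack dump in dump_file for each stuck reduce, which would show directly whether all of them stop at the same statement.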