You need JDK 1.6 (JRE 6) or later to run the following command:
java -jar lemur/ireval/src/ireval.jar baseline_result testrun_result qrel_file
This compares the baseline run with the test run, using the relevance judgements provided in qrel_file.
It reports measures such as mean average precision (AP averaged over queries) and precision at the top N documents (P@topN), and runs 4 statistical significance tests, including the randomization test and the sign test. All tests are two-tailed.
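To illustrate what a two-tailed sign test does with per-query scores, here is a minimal sketch in Java. It is not the code inside ireval.jar: the class name SignTestSketch, the method name, and the choice to drop ties are assumptions made for this example, and ireval may handle ties or compute the p-value differently.

```java
// Hypothetical illustration only; not taken from ireval.jar.
class SignTestSketch {

    /**
     * Exact two-tailed sign test on paired per-query scores.
     * Assumption: queries where both runs score the same (ties) are dropped.
     * Returns min(1, 2 * P(X <= k)) with X ~ Binomial(n, 0.5), where n is the
     * number of non-tied queries and k is the smaller of the win/loss counts.
     */
    static double twoTailedSignTest(double[] baseline, double[] test) {
        if (baseline.length != test.length) {
            throw new IllegalArgumentException("score arrays must have equal length");
        }
        int wins = 0;   // queries where the test run beats the baseline
        int losses = 0; // queries where the baseline beats the test run
        for (int i = 0; i < baseline.length; i++) {
            if (test[i] > baseline[i]) {
                wins++;
            } else if (test[i] < baseline[i]) {
                losses++;
            }
        }
        int n = wins + losses;
        if (n == 0) {
            return 1.0; // no differences at all, so no evidence either way
        }
        int k = Math.min(wins, losses);
        // Sum the binomial probabilities C(n, i) * 0.5^n for i = 0..k.
        // This direct form is fine for typical query-set sizes (tens to a few
        // hundred queries); very large n would need log-space arithmetic.
        double term = Math.pow(0.5, n); // C(n, 0) * 0.5^n
        double tail = 0.0;
        for (int i = 0; i <= k; i++) {
            tail += term;
            term = term * (n - i) / (i + 1); // move from C(n, i) to C(n, i + 1)
        }
        return Math.min(1.0, 2.0 * tail);
    }

    public static void main(String[] args) {
        // Made-up per-query AP values for five queries.
        double[] baseline = {0.1, 0.2, 0.3, 0.4, 0.5};
        double[] test = {0.2, 0.3, 0.4, 0.5, 0.6};
        System.out.println("two-tailed p = " + twoTailedSignTest(baseline, test));
    }
}
```

Because the test is two-tailed, the p-value is the same whichever run is ahead; it only measures whether the win/loss split is unlikely under the assumption that either run is equally likely to win a query.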
Both result files (baseline_result and testrun_result) should be in TREC format, e.g. as produced by passing -trecFormat=true on the IndriRunQuery command line.
qrel_file is simply one of the qrel files you can download from TREC.
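For orientation, a TREC-format result line has six whitespace-separated fields (query-id, the literal Q0, document number, rank, score, run tag), and a TREC qrel line has four (query-id, iteration, document number, relevance). The Java sketch below reads such lines for one query and computes average precision. It is an illustration, not ireval's implementation: the class and method names are hypothetical, it assumes the result lines are already in rank order, and it treats any relevance value above 0 as relevant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not taken from ireval.jar.
class TrecFormatSketch {

    /** Document numbers retrieved for one query, in the order the lines appear. */
    static List<String> rankedDocs(List<String> runLines, String queryId) {
        List<String> docs = new ArrayList<String>();
        for (String line : runLines) {
            // Result line: query-id Q0 docno rank score run-tag
            String[] f = line.trim().split("\\s+");
            if (f.length == 6 && f[0].equals(queryId)) {
                docs.add(f[2]);
            }
        }
        return docs;
    }

    /** Document numbers judged relevant (relevance > 0) for one query. */
    static Set<String> relevantDocs(List<String> qrelLines, String queryId) {
        Set<String> relevant = new HashSet<String>();
        for (String line : qrelLines) {
            // Qrel line: query-id iteration docno relevance
            String[] f = line.trim().split("\\s+");
            if (f.length == 4 && f[0].equals(queryId) && Integer.parseInt(f[3]) > 0) {
                relevant.add(f[2]);
            }
        }
        return relevant;
    }

    /** Average precision: mean of precision@rank over the ranks of relevant documents. */
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1); // precision at rank i + 1
            }
        }
        // Relevant documents that were never retrieved contribute zero.
        return sum / relevant.size();
    }

    public static void main(String[] args) {
        // Made-up document numbers, scores, and run tag for illustration.
        List<String> run = Arrays.asList(
            "301 Q0 D1 1 -4.1 example",
            "301 Q0 D2 2 -4.5 example",
            "301 Q0 D3 3 -5.0 example");
        List<String> qrels = Arrays.asList(
            "301 0 D1 1",
            "301 0 D2 0",
            "301 0 D3 1");
        double ap = averagePrecision(rankedDocs(run, "301"), relevantDocs(qrels, "301"));
        System.out.println("AP for query 301 = " + ap);
    }
}
```

In this made-up example the relevant documents sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.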