Many thanks for making this available, and under a liberal open source license too. I am interested in whether it can be used with DSpace to export citations for the PDFs held there.

Unfortunately I cannot build it yet; I think there are a few rough edges. First, it looks like there is a dependency on secondstring, another SourceForge project, but the secondstring jar is not included. When I copied it into the lib directory, a lot more files compiled, but I still got some compilation errors, e.g.:

compile:
    [javac] Compiling 18 source files to g:\mystuff\wellabove\wellabove-clients\citeseerx-beta-0.1\build
    [javac] g:\mystuff\wellabove\wellabove-clients\citeseerx-beta-0.1\src\java\edu\psu\citeseerx\utility\SeerSoftTFIDF.java:64: cannot find symbol
    [javac] symbol  : method setCollectionSize(int)
    [javac] location: class edu.psu.citeseerx.utility.SeerSoftTFIDF
    [javac]             setCollectionSize(model.size());
    [javac]            
    [javac] g:\mystuff\wellabove\wellabove-clients\citeseerx-beta-0.1\src\java\edu\psu\citeseerx\utility\SeerSoftTFIDF.java:68: cannot find symbol
    [javac] symbol  : method setDocumentFrequency(com.wcohen.ss.api.Token,int)
    [javac] location: class edu.psu.citeseerx.utility.SeerSoftTFIDF
    [javac]                     setDocumentFrequency(token, item.getFrequency());
    [javac]                    
    [javac] g:\mystuff\wellabove\wellabove-clients\citeseerx-beta-0.1\src\java\edu\psu\citeseerx\utility\SeerSoftTFIDF.java:80: cannot find symbol
    [javac] symbol  : method tokenIterator()
    [javac] location: interface com.wcohen.ss.api.Tokenizer
    [javac]             Iterator<Token> iter = tokenizer.tokenIterator();
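
My guess is that the secondstring jar I picked up is not the version the code was written against. One way to check this would be a small reflection probe like the sketch below. The class name ApiProbe and the helper hasMethod are made up for illustration; only com.wcohen.ss.api.Tokenizer and tokenIterator come from the compiler output above. It only looks at public methods, so it cannot say anything about protected methods in a superclass.

```java
import java.lang.reflect.Method;

// Hypothetical helper, not part of CiteSeerX or secondstring.
public class ApiProbe {

    // Returns true if the class can be loaded from the classpath and
    // exposes at least one public method with the given name.
    public static boolean hasMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing from the classpath or could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Interface and method name taken from the javac error at line 80.
        System.out.println("Tokenizer.tokenIterator present: "
                + hasMethod("com.wcohen.ss.api.Tokenizer", "tokenIterator"));
    }
}
```

Running this with the secondstring jar on the classpath should show whether that jar provides tokenIterator at all. If it prints false, the jar is probably a different release from the one the project expects.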

Also, I spotted loads of files in src/perl/BatchExtractor/tmp. These look like temporary files. If so, they should not be in the distribution. Can someone comment on this please?

Can someone take a look at the compilation problems please?

Regards,

Andrew Marlow