Not sure if I am missing something, but I can't see a simple way to specify a file location outside of the server's www root. I would like to place all PDFs of papers outside of it, because as things currently stand, once someone guesses the subfolder that the PDFs are in, they could quite easily access them all, even if they are only a guest without user privileges. For the moment I have simply added an index.html page in the PDFs folder that redirects to the refbase homepage, but perhaps it would be possible to implement something that uses the PHP readfile command to grab the PDF.
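To illustrate the readfile idea, here is a minimal sketch of the check such a download script would need before streaming anything. It is written in Python purely for illustration (refbase itself is PHP, where the final step would be a readfile call), and `FILES_ROOT`, `resolve_download` and the `user_may_download` flag are hypothetical names, not part of refbase.

```python
from pathlib import Path

# Hypothetical storage directory outside the web root (an assumption).
FILES_ROOT = Path("/var/refbase_files")


def resolve_download(requested_name, user_may_download, files_root=FILES_ROOT):
    """Return the absolute path to serve, or None if the request is refused."""
    # Guests without download permission never get a path back.
    if not user_may_download:
        return None

    root = Path(files_root).resolve()
    candidate = (root / requested_name).resolve()

    # Refuse anything that escapes the storage directory,
    # e.g. "../../etc/passwd" or an absolute path.
    if root not in candidate.parents:
        return None

    # Refuse names that do not point at an existing regular file.
    if not candidate.is_file():
        return None

    return candidate
```

The point of the design is that the web server never serves the PDF directory directly; every request passes through the permission check, so knowing a filename is no longer enough.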
Matthias and I had discussed this prior to the 0.8.0 release, but it wasn't a high priority.
If you have Indexes off (in httpd.conf) or an index.html file (as you have created), it takes a lot of work to get those PDFs, as you need to know both the filenames and the directory those files are in.
Filenames are still disclosed if you do an advanced search with a filename when you aren't logged in. I believe this is our only weak point, though. Fixing this weakness would be good & should be easy.
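For anyone looking for the directive, turning Indexes off might look roughly like this in httpd.conf. The directory path is an assumption; use wherever your refbase files directory actually lives.

```apache
# Assumed location of the refbase files directory; adjust to your install.
<Directory "/var/www/refbase/files">
    # Disable automatic directory listings for this directory.
    Options -Indexes
</Directory>
```

With this in place, a request for the directory itself is refused rather than answered with a listing, but a request for a known filename still succeeds.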
Putting files outside of the document root isn't a high priority for me, as the above change would fix most everyone's concerns about housing documents in the webroot & because some are FORCED to keep uploaded files in the webroot.
I understand that it may be hard for a user to guess the exact filename, so I am pretty happy. However, will search engines be able to index links to the PDFs? I am thinking they won't, since they won't be able to see the PDF links without being logged in, but I just thought I'd ask.
AFAIK, the search engine will need a link to index the file. So I think it should be fine if there's no link being exposed to either the user or the search engine.
Thanks for pointing these issues out! User feedback helps us a lot to improve the whole thing.
Yes, that is correct. Search engines won't see links unless you've allowed links to guest users.
If you are extra concerned about indexing of PDFs, you can put the files directory in your robots.txt file.
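A robots.txt entry for this could look something like the following. The `/refbase/files/` path is an assumption and depends on where refbase is installed under your web root.

```
# Ask all well-behaved crawlers to skip the files directory (path is assumed).
User-agent: *
Disallow: /refbase/files/
```

Keep in mind that robots.txt is advisory and publicly readable, so it stops indexing by cooperative crawlers but also reveals the directory name to anyone who looks at it.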
I agree that searching for files in Advanced Search (or SQL Search if enabled) shouldn't be possible/allowed if a user has no permission to download any protected files. That should really get fixed for the next release.
I understand that it's a valid request to put files outside of the document root, since a user could still access files if he happens to know the exact path to the file (which may even get easier when we support file-renaming schemes). We should think about this. I'm not aware of any immediate solution, though.
Good point re. the file-renaming schemes. One possibility (off the top of my head) is to make the renaming scheme an admin setting & to encourage appending either a random string or something derived from the file (its size, some date associated with it, a checksum) to the filename if filename disclosure is a concern.
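As a rough sketch of what such a renaming scheme could do, the function below appends either a random string or a checksum-derived string before the file extension. It is Python for illustration only, and `obscured_name` and its parameters are hypothetical, not an existing refbase setting.

```python
import hashlib
import secrets
from pathlib import Path


def obscured_name(original_name, content=None, random_bytes=8):
    """Append a hard-to-guess suffix before the file extension."""
    path = Path(original_name)
    if content is not None:
        # Checksum variant: first 16 hex characters of the SHA-256 digest
        # of the file contents, so the same file always gets the same name.
        suffix = hashlib.sha256(content).hexdigest()[:16]
    else:
        # Random variant: a fresh suffix on every call
        # (8 random bytes give 16 hex characters).
        suffix = secrets.token_hex(random_bytes)
    return f"{path.stem}_{suffix}{path.suffix}"
```

For example, `obscured_name("smith2005.pdf")` yields a name of the form `smith2005_<16 hex characters>.pdf`. A suffix based only on file size or a date would be much easier to enumerate than either variant here.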
I'm not sure if I've hit the point here, but we use .htaccess to prevent file downloads by unauthorized people. Does that hint help?
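The post doesn't say which directives are used, but one common way to do this is HTTP Basic authentication in a .htaccess file placed in the files directory. The password file path and realm name below are assumptions, and the server must allow it via AllowOverride AuthConfig.

```apache
# Require a valid login for everything in this directory.
AuthType Basic
AuthName "refbase files"
# Assumed location of a password file created with the htpasswd tool.
AuthUserFile "/etc/apache2/refbase.htpasswd"
Require valid-user
```

Note that these credentials are separate from refbase's own user accounts, so users would have to log in a second time to download a file.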
Thanks to everyone. I am happy with either the no-indexes directive in the Apache conf file or simply creating an index.html file in the files/PDFs directory.