A good starting point for troubleshooting would be to create a small sitemap.xml file that only lists a couple of the known problem files -- it would be interesting to see if indexing them on their own works better than indexing them in the context of the full sitemap.  If the URLs are public, you could send me the sitemap file so I can check for different results on my test server.
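If it helps, a throwaway script along these lines could produce that test file. It's only a sketch in Python. It assumes the standard sitemaps.org 0.9 format, and the URLs are placeholders for the actual problem pages and PDFs. Point the indexer at the resulting file instead of the full sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap.xml (UTF-8 bytes) listing only the given URLs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc")
        loc.text = url
    # Requires Python 3.8+ for xml_declaration/default_namespace.
    return ET.tostring(
        urlset,
        encoding="utf-8",
        xml_declaration=True,
        default_namespace=SITEMAP_NS,
    )


if __name__ == "__main__":
    # Hypothetical URLs: replace with the files that are failing to index.
    problem_urls = [
        "https://www.example.edu/some-page.html",
        "https://www.example.edu/some-report.pdf",
    ]
    # Redirect stdout to a file, e.g. "python make_sitemap.py > sitemap-test.xml".
    print(build_sitemap(problem_urls).decode("utf-8"))
```

Keeping the list down to one HTML page and one PDF makes it easier to tell whether the problem is with the files themselves or with something about the full crawl.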

- Demian

From: Nathan Tallman [ntallman@gmail.com]
Sent: Friday, June 29, 2012 12:18 PM
To: vufind-tech
Subject: [VuFind-Tech] File Not Indexed by Aperture in Website Indexing

I'm using VUFIND-454 <http://vufind.org/jira/browse/VUFIND-454> to index our institutional website. Some web pages (HTML) and PDFs that are listed in the sitemap are not getting indexed. The HTML is standard, and pages with identical markup but different text are indexed fine. The PDFs are generated from InDesign and are searchable in Acrobat/Reader, so their text should be easy to extract. Again, similar files are indexed without a problem.

Aperture isn't outputting anything that suggests it is skipping files, and there are no PHP fatal errors about memory (which was once a problem but is now solved).

Any ideas on what might be causing this or how to troubleshoot?