Reg. Skipping files/folders while crawling

  • Charanya Mohan

    We have a few questions regarding Aperture.

    For crawling, we only specify the root folder. The problem is that the crawler visits only a few sub-folders and skips the rest. We suspect this happens only for large root folders, because when we checked manually with a small root folder, it crawled all the sub-folders.

    Another problem we are facing is that the extractor fails to extract the text from some files, regardless of file type (i.e., the failures don't follow any pattern).

    Could you please help us resolve these problems?

    • Antoni Mylka

      Your post mentions two issues: a crawler failure and an extractor failure. I need more information in order to help you.

      There is nothing in the code that would explain such behavior.
      Does the crawl end correctly? If it doesn't, what kind of error appears? (Check the ExitCode passed to the crawlStopped method of the CrawlerHandler.)
      Do you set any other options on the FileSystemDataSource apart from rootFolder?
      Which Aperture version do you use?
      Another thing to check is filesystem permissions: can the crawler actually read those files?
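
      The permission question can be checked independently of Aperture. Below is a minimal sketch that uses only the plain JDK; ReadabilityCheck and its Report class are hypothetical names for illustration and are not part of Aperture. It walks the root folder, counts folders and files, and lists anything that cannot be opened or read, so the totals can be compared with what the crawler reported.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aperture: walks a root folder with the plain JDK
// and reports how many folders and files exist and which ones cannot be read.
class ReadabilityCheck {

    static class Report {
        int folders;                                       // folders entered, including the root
        int files;                                         // non-directory entries seen
        final List<Path> unreadable = new ArrayList<>();   // entries that could not be opened or read
    }

    static Report scan(Path root) throws IOException {
        final Report report = new Report();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                report.folders++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                report.files++;
                if (!Files.isReadable(file)) {
                    report.unreadable.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Called when a file or folder cannot be accessed, e.g. permission denied.
                report.unreadable.add(file);
                return FileVisitResult.CONTINUE;
            }
        });
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Pass the same root folder that is given to the crawler.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Report report = scan(root);
        System.out.println("folders: " + report.folders + ", files: " + report.files);
        for (Path p : report.unreadable) {
            System.out.println("unreadable: " + p);
        }
    }
}
```

      If the folder and file counts match what the crawler handler saw and nothing is listed as unreadable, permissions are probably not the cause.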

      As far as the extraction is concerned:
      1. What is the format of those files?
      2. Is the MIME type identified correctly? Maybe the MimeTypeIdentifier doesn't support that particular MIME type.
      3. Does Aperture actually have an extractor for that format (which one is it)?
      4. If the extractor for that format fails, what happens? Does it fail silently without extracting anything, or does it throw an exception? If it throws, what is the exception and what is the stack trace?
      5. If the problematic files aren't proprietary, I would be grateful if you could send them to me: antoni DOT mylka AT_SIGN gmail DOT com
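
      For points 1 and 2, one way to cross-check what format a problematic file really has is to look at its leading bytes. The sketch below is a plain-JDK illustration only; MagicSniffer is a hypothetical name, it is not Aperture's MimeTypeIdentifier, and it recognizes just a handful of well-known signatures. The labels it returns are descriptive strings, not MIME types.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not Aperture's MimeTypeIdentifier: guesses a file's format
// from its first bytes so the guess can be compared with the identified MIME type.
class MagicSniffer {

    // Returns a descriptive label for a few well-known signatures, or "unknown".
    static String sniffBytes(byte[] head) {
        if (startsWith(head, new byte[] {'%', 'P', 'D', 'F'})) {
            return "PDF";
        }
        if (startsWith(head, new byte[] {'P', 'K', 3, 4})) {
            return "ZIP container (also .docx, .xlsx, .odt)";
        }
        if (startsWith(head, new byte[] {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0})) {
            return "OLE2 compound file (legacy .doc, .xls, .ppt)";
        }
        if (startsWith(head, new byte[] {'{', '\\', 'r', 't', 'f'})) {
            return "RTF";
        }
        return "unknown";
    }

    // Reads up to the first 8 bytes of the file and classifies them.
    static String sniffFile(Path file) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            while (n < head.length && (r = in.read(head, n, head.length - n)) > 0) {
                n += r;
            }
        }
        return sniffBytes(Arrays.copyOf(head, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Pass the paths of the files whose text was not extracted.
        for (String arg : args) {
            System.out.println(arg + ": " + sniffFile(Paths.get(arg)));
        }
    }
}
```

      A file whose extension says one thing while its leading bytes say another (or "unknown") is a likely candidate for a misidentified MIME type or a missing extractor.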