How do I crawl a secure web site using Aperture?

dliao
2007-09-18
2013-05-13

    Hi,

    I'm new to Aperture. I'm using ExampleWebCrawler to crawl one of my company's web sites, which is secured. Every URL I pass in generates the following error:

    Http connection error, response code = 307, url = http://mysite.mycompany.com
    ...

    What's the quickest way to fix this without adding all the authentication code?
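    A 307 is a Temporary Redirect; on a secured site it often points to an HTTPS or login URL, and the JDK's built-in redirect handling does not follow a switch from http to https, which is one plausible cause. As a minimal diagnostic sketch (RedirectProbe is a hypothetical helper, not part of Aperture), the Java below follows redirects by hand with HttpURLConnection and prints each hop's Location header so you can see where the site sends the crawler. It also installs a JDK-wide java.net.Authenticator with placeholder credentials. That only helps if the site uses HTTP Basic or Digest authentication and if the crawler's HTTP access goes through HttpURLConnection; both are assumptions to check. A form-based login with session cookies is not handled by this.

```java
import java.io.IOException;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URI;

/** Hypothetical diagnostic helper (not an Aperture class): shows where redirects lead. */
public class RedirectProbe {

    /** Resolves a Location header (absolute or relative) against the current URL. */
    static URI nextHop(URI current, String location) {
        return current.resolve(location);
    }

    /** True for the 3xx codes that normally carry a Location header. */
    static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    /** Follows up to maxHops redirects, printing each hop, and returns the final URI. */
    static URI follow(URI start, int maxHops) throws IOException {
        URI current = start;
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // inspect every 3xx ourselves
            int code = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            System.out.println(code + " " + current + (location != null ? " -> " + location : ""));
            conn.disconnect();
            if (!isRedirect(code) || location == null) {
                return current;
            }
            current = nextHop(current, location);
        }
        throw new IOException("Too many redirects starting at " + start);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the site uses HTTP Basic/Digest auth; "user"/"secret" are placeholders.
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                // Only hand credentials to the intended host.
                if ("mysite.mycompany.com".equals(getRequestingHost())) {
                    return new PasswordAuthentication("user", "secret".toCharArray());
                }
                return null;
            }
        });
        URI start = URI.create(args.length > 0 ? args[0] : "http://mysite.mycompany.com");
        System.out.println("Final URL: " + follow(start, 5));
    }
}
```

    Run it as `java RedirectProbe http://mysite.mycompany.com`; if the last hop is an https:// or login URL, pointing the crawler at that URL (or handling the login) is the next thing to try.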

    Another question: how can I get rid of the <triple></triple> tags, and everything inside them, in the RDF output file (it is written in TriX format)?

        <graph>
            <uri>http://mysite.mycompany.com</uri>
            <triple>
                <uri>http://mysite.mycompany.com</uri>
                <uri>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</uri>
                <uri>http://aperture.semanticdesktop.org/ontology/data#DataObject</uri>
            </triple>
            <triple>
                <uri>http://mysite.mycompany.com</uri>
                <uri>http://aperture.semanticdesktop.org/ontology/data#characterSet</uri>
                <plainLiteral>ISO-8859-1</plainLiteral>
            </triple>
            <triple>
                <uri>http://mysite.mycompany.com</uri>
                <uri>http://aperture.semanticdesktop.org/ontology/data#rootFolderOf</uri>
                <uri>urn:test:exampleimapsource</uri>
            </triple>
            <triple>
                <uri>http://mysite.mycompany.com</uri>
                <uri>http://aperture.semanticdesktop.org/ontology/data#retrievalDate</uri>
                <typedLiteral datatype='http://www.w3.org/2001/XMLSchema#dateTime'>2007-09-18T09:46:16</typedLiteral>
            </triple>
            <triple>
                <uri>http://mysite.mycompany.com</uri>
                <uri>http://aperture.semanticdesktop.org/ontology/data#mimeType</uri>
                <plainLiteral>text/html</plainLiteral>
            </triple>
        </graph>

    From the example above, all I need is:

        <graph>
            <uri>http://mysite.mycompany.com</uri>
        </graph>
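    If all you need is the list of crawled URIs, one workaround that leaves the crawler untouched is to post-process the output file. The Java sketch below (TrixStripper is a hypothetical helper using only the JDK's DOM and XSLT APIs) parses the file, removes every <triple> element, and writes the rest back out, so each <graph> keeps only its own <uri>. It assumes the element names are unprefixed, as in the excerpt above. Note that it discards the metadata Aperture extracted; it does not stop Aperture from extracting it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical post-processor: keeps each <graph>'s own <uri>, drops every <triple>. */
public class TrixStripper {

    static String stripTriples(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // getElementsByTagName returns a live list, so copy it before removing nodes.
        NodeList live = doc.getElementsByTagName("triple");
        List<Node> triples = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            triples.add(live.item(i));
        }

        for (Node triple : triples) {
            Node parent = triple.getParentNode();
            Node before = triple.getPreviousSibling();
            // Drop the indentation in front of each triple so no blank lines remain.
            if (before != null && before.getNodeType() == Node.TEXT_NODE
                    && before.getTextContent().isBlank()) {
                parent.removeChild(before);
            }
            parent.removeChild(triple);
        }

        // Serialize the trimmed document back to a string.
        Transformer out = TransformerFactory.newInstance().newTransformer();
        out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        out.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = Files.readString(Path.of(args[0]));
        System.out.println(stripTriples(xml));
    }
}
```

    For example, `java TrixStripper crawl-output.xml > uris-only.xml`, where both file names are placeholders for your own.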

    Any help or suggestions would be greatly appreciated.

    David