SEC Filings Format

Help
2010-11-03
2013-05-02
  • Andreas Harth

    Andreas Harth - 2010-11-03

    Hi,

    I've managed to create a self-contained test case, based on LoadAllSECFilings, to parse a company name out of any SEC XBRL report.
    However, it logs "INFO  SecGrabberImpl.java 60  - # URIs in SEC feed = 0".
    Any idea what's happening?

    Best regards,
    Andreas.

    public void testLoadSecXbrl() throws Exception {
            // Set up the data store to load the data
            Store store = new StoreImpl();
            // Ensure that the newly discovered relationships are also stored.
            store.setAnalyser(new AnalyserImpl(store));
    
            XBRLXLinkHandlerImpl xlinkHandler = new XBRLXLinkHandlerImpl();
            XBRLCustomLinkRecogniserImpl clr = new XBRLCustomLinkRecogniserImpl(); 
            XLinkProcessor xlinkProcessor = new XLinkProcessorImpl(xlinkHandler ,clr);
            
            File cacheDir = new File("/tmp/cache");
            if (!cacheDir.exists()) {
                cacheDir.mkdir();
            }
            
            // Rivet errors in the SEC XBRL data require these URI remappings to prevent the discovery process from breaking.
            HashMap<URI,URI> map = new HashMap<URI,URI>();
            try {
                map.put(new URI("http://www.xbrl.org/2003/linkbase/xbrl-instance-2003-12-31.xsd"),new URI("http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd"));
                map.put(new URI("http://www.xbrl.org/2003/instance/xbrl-instance-2003-12-31.xsd"),new URI("http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd"));
                map.put(new URI("http://www.xbrl.org/2003/linkbase/xbrl-linkbase-2003-12-31.xsd"),new URI("http://www.xbrl.org/2003/xbrl-linkbase-2003-12-31.xsd"));
                map.put(new URI("http://www.xbrl.org/2003/instance/xbrl-linkbase-2003-12-31.xsd"),new URI("http://www.xbrl.org/2003/xbrl-linkbase-2003-12-31.xsd"));
                map.put(new URI("http://www.xbrl.org/2003/instance/xl-2003-12-31.xsd"),new URI("http://www.xbrl.org/2003/xl-2003-12-31.xsd"));
                map.put(new URI("http://www.xbrl.org/2003/linkbase/xl-2003-12-31.xsd"),new URI("http://www.xbrl.org/2003/xl-2003-12-31.xsd"));
                map.put(new URI("http://www.xbrl.org/2003/instance/xlink-2003-12-31.xsd"),new URI("http://www.xbrl.org/2003/xlink-2003-12-31.xsd"));
                map.put(new URI("http://www.xbrl.org/2003/linkbase/xlink-2003-12-31.xsd"),new URI("http://www.xbrl.org/2003/xlink-2003-12-31.xsd"));
            } catch (URISyntaxException e) {
                throw new XBRLException("URI syntax exception",e);
            }
            EntityResolver entityResolver = new EntityResolverImpl(cacheDir,map);      
            
            Loader loader = new LoaderImpl(store,xlinkProcessor);
            loader.setCache(new CacheImpl(cacheDir));
            loader.setEntityResolver(entityResolver);
            xlinkHandler.setLoader(loader);
            
            Grabber grabber = new SecGrabberImpl(new URI("http://www.sec.gov/Archives/edgar/xbrlrss.xml"));
            List<URI> resources = grabber.getResources();
            System.out.println(resources);
        }
    
     
  • Andreas Harth

    Andreas Harth - 2010-11-03

    I'm using the 5.2 binaries.

     
  • Geoffrey Shuetrim

    Are you finding any XBRL files in the RSS feed?  I have just rerun the example that I think you have been basing your work on, and it finds:

    INFO  SecGrabberImpl.java 38  - # XBRL files = 1720
    INFO  SecGrabberImpl.java 62  - # URIs in SEC feed = 202

    I have checked, and it is picking up the latest feed.

     
  • Andreas Harth

    Andreas Harth - 2010-11-03

    Hi,

    the following code isolates the problem I'm having:

            Grabber grabber = new SecGrabberImpl(new URI("http://www.sec.gov/Archives/edgar/xbrlrss.xml"));
            List<URI> resources = grabber.getResources();

            System.out.println("URIs in SEC feed: " + resources);

    returns:

    INFO  SecGrabberImpl.java 60  - # URIs in SEC feed = 0
    URIs in SEC feed:

    So no, the SecGrabberImpl (at least from 5.2) does not find URIs.

    Cheers,
    Andreas.

     
  • Geoffrey Shuetrim

    The RSS feed from the SEC contains lots of URIs.  The SecGrabberImpl class works through the following steps.

    1. Get the XML document as a DOM object.

    2. Get the edgar:xbrlFile elements in that document as a node list. Count those elements; that count is the number of XBRL files found. It is logged, and for the SEC RSS feed it is in excess of 1000.

    3. Only capture those XBRL files whose type is EX-100.INS or EX-101.INS, that is, the instances filed as exhibit 100 or 101.

    4. Make sure that the URI given to locate each instance file ends in .xml or .xbrl and is well formed.

    Finally, log the number of resources found.  This is generally much smaller than the number of XBRL files because linkbase files and schemas are omitted.

    What I am trying to find out is whether the grabber is loading the feed at all.  If it is, the problem lies in resource identification, which could happen if the SEC changed the feed format, though I do not think that has happened.  More likely, you are not retrieving the actual SEC feed, or some problem is occurring when the DOM is being built.
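
    One way to separate these cases is to run the namespace lookup on its own, outside the grabber. The sketch below is not XBRLAPI code: the class FeedCheck, the method countXbrlFiles and the sample feed are made up for illustration, and only the namespace URI and the xbrlFile element name are taken from the grabber. It also shows one possible way to end up with a zero count, namely a DOM built by a parser that is not namespace-aware. Whether that is what is happening in this thread is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FeedCheck {
    // Namespace and local name that the grabber looks for.
    static final String NAMESPACE = "http://www.sec.gov/Archives/edgar";
    static final String NAME = "xbrlFile";

    // Hypothetical, heavily shortened stand-in for the SEC RSS feed.
    static final String SAMPLE =
        "<rss xmlns:edgar=\"" + NAMESPACE + "\"><channel><item>"
        + "<edgar:xbrlFile edgar:type=\"EX-101.INS\" edgar:url=\"http://example.com/a.xml\"/>"
        + "<edgar:xbrlFile edgar:type=\"EX-101.SCH\" edgar:url=\"http://example.com/a.xsd\"/>"
        + "</item></channel></rss>";

    // Parse the XML text and count edgar:xbrlFile elements by namespace.
    static int countXbrlFiles(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The factory default is false; a namespace lookup needs true.
            factory.setNamespaceAware(namespaceAware);
            Document feed = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return feed.getElementsByTagNameNS(NAMESPACE, NAME).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the feed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespace-aware:     " + countXbrlFiles(SAMPLE, true));
        System.out.println("not namespace-aware: " + countXbrlFiles(SAMPLE, false));
    }
}
```

    On the sample, the namespace-aware parse finds both elements and the other parse finds none. If a namespace-aware parse of the real feed also finds nothing, the document being parsed is probably not the SEC feed.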

    Regards

    Geoff S

    PS:

    The relevant code from the grabber class is shown below:

        private static final String NAMESPACE = "http://www.sec.gov/Archives/edgar";
        private static final String NAME = "xbrlFile";
    
        public List<URI> getResources() {
            List<URI> resources = new ArrayList<URI>();
            Document feed = getDocument(getSource());
            NodeList nodes = feed.getElementsByTagNameNS(NAMESPACE,NAME);
            logger.info("# XBRL files = " + nodes.getLength());
    
            LOOP: for (int i=0; i<nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                String type = element.getAttributeNS(NAMESPACE, "type");
                String uri = element.getAttributeNS(NAMESPACE, "url");
                if (! (type.equals("EX-100.INS") || type.equals("EX-101.INS"))) {
                    logger.debug("Skipping " + uri);
                    continue LOOP;// Only interested in XBRL instances as entry points.
                }
                if (
                        (uri != null) &&
                        (
                            (uri.endsWith(".xml")) 
                            || (uri.endsWith(".xbrl"))
                        )
                    ) {
                    try {
                        resources.add(new URI(uri));
                    } catch (URISyntaxException e) {
                        logger.warn("SEC source URI: " + uri + " is malformed and has been ignored.");
                    }
                }
            }
            logger.info("# URIs in SEC feed = " + resources.size());
            return resources;
        }
    
     