Hi,
I've managed to create a self-contained test case based on LoadAllSECFilings to parse a company name out of any SEC XBRL report.
I get "INFO SecGrabberImpl.java 60 - # URIs in SEC feed = 0.
Any idea what's happening?
Best regards,
Andreas.
I'm using the 5.2 binaries.
Are you finding any XBRL files in the RSS feed? I have just rerun the example that I think you have been basing your work on, and it finds:
INFO SecGrabberImpl.java 38 - # XBRL files = 1720
INFO SecGrabberImpl.java 62 - # URIs in SEC feed = 202
I have checked, and it is picking up the latest feed.
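One way to answer that question is to make sure the grabber's log output is visible at DEBUG level, so that both the "# XBRL files = ..." line and the "Skipping ..." lines show up. A minimal sketch, assuming the library logs through log4j 1.x and that its loggers live under the org.xbrlapi package (neither of which is confirmed in this thread):

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class LoggingSetup {
    public static void main(String[] args) {
        // Send log output to the console with a basic layout.
        BasicConfigurator.configure();
        // Show DEBUG messages as well, so the grabber's "Skipping ..." lines appear.
        // "org.xbrlapi" is an assumed logger name, not taken from the thread.
        Logger.getLogger("org.xbrlapi").setLevel(Level.DEBUG);
        // ... run the test case here and check whether "# XBRL files = ..." is logged.
    }
}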
Hi,
the following code isolates the problem I'm having:
Grabber grabber = new SecGrabberImpl(new URI("http://www.sec.gov/Archives/edgar/xbrlrss.xml"));
List<URI> resources = grabber.getResources();
System.out.println("URIs in SEC feed: " + resources);
returns:
INFO SecGrabberImpl.java 60 - # URIs in SEC feed = 0
URIs in SEC feed:
So no, the SecGrabberImpl (at least from 5.2) does not find URIs.
Cheers,
Andreas.
The RSS feed from the SEC contains lots of URIs. The SecGrabberImpl class works through the following steps:
1. Get the XML document as a DOM object.
2. Get the edgar:xbrlFile elements in that document as a node list and count them. That count is the number of XBRL files found; it is logged and is generally in excess of 1000 for the SEC RSS feed.
3. Capture only those XBRL files that are EX-100.INS or EX-101.INS instances.
4. Check that the URI given to locate each instance file is well formed.
Finally, the number of resources found is logged. This is generally much smaller than the number of XBRL files because linkbase files and schemas are omitted.
What I am trying to find out is whether the grabber is loading the feed file at all. If it is, that would point to a problem with resource identification - something that could happen if the SEC format changed, for instance, though I don't think that has happened. More likely, you are not getting the SEC feed itself, or some problem is occurring while the DOM is being built. A standalone check that separates these cases is sketched after the code below.
Regards
Geoff S
PS:
The relevant code from the grabber class is shown below:

private static final String NAMESPACE = "http://www.sec.gov/Archives/edgar";
private static final String NAME = "xbrlFile";

public List<URI> getResources() {
    List<URI> resources = new ArrayList<URI>();
    Document feed = getDocument(getSource());
    NodeList nodes = feed.getElementsByTagNameNS(NAMESPACE, NAME);
    logger.info("# XBRL files = " + nodes.getLength());
    LOOP: for (int i = 0; i < nodes.getLength(); i++) {
        Element element = (Element) nodes.item(i);
        String type = element.getAttributeNS(NAMESPACE, "type");
        String uri = element.getAttributeNS(NAMESPACE, "url");
        if (!(type.equals("EX-100.INS") || type.equals("EX-101.INS"))) {
            logger.debug("Skipping " + uri);
            continue LOOP; // Only interested in XBRL instances as entry points.
        }
        if ((uri != null) && ((uri.endsWith(".xml")) || (uri.endsWith(".xbrl")))) {
            try {
                resources.add(new URI(uri));
            } catch (URISyntaxException e) {
                logger.warn("SEC source URI: " + uri + " is malformed and has been ignored.");
            }
        }
    }
    logger.info("# URIs in SEC feed = " + resources.size());
    return resources;
}
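To separate the two failure modes described above (the feed not being fetched or parsed, versus resource identification going wrong), a standalone check along the following lines could help. It is only a sketch: it uses plain JAXP rather than the grabber, the feed URL from the earlier snippet, and the edgar namespace from the code above.

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class FeedCheck {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // getElementsByTagNameNS only matches when the parser is namespace aware.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // A failure here points at the download or the DOM build.
        Document feed = builder.parse("http://www.sec.gov/Archives/edgar/xbrlrss.xml");
        // Count the edgar:xbrlFile elements the same way the grabber does.
        NodeList nodes = feed.getElementsByTagNameNS(
                "http://www.sec.gov/Archives/edgar", "xbrlFile");
        System.out.println("# edgar:xbrlFile elements = " + nodes.getLength());
    }
}

If this prints a count in the thousands while the grabber still reports zero URIs, the problem is on the grabber's side (for example, in the type filtering); if it prints zero, the feed content or its namespace has changed.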