I'm interested in using "grok" to help interpret the tags on the rows of financial statements. Typically, these are not sentences, but noun clauses. Some examples:
ASSETS:
Cash and cash equivalents
Marketable securities
Total cash, cash equivalents and marketable securities
Accounts receivable - trade and other
Inventories
Prepaid employee benefits, taxes and other expenses
Finance receivables and retained interests in sold receivables
Property and equipment
Special tools
Intangible assets
Other noncurrent assets
LIABILITIES:
Accounts payable
Accrued liabilities and expenses
Short-term debt
Payments due within one year on long-term debt
Long-term debt
Accrued noncurrent employee benefits
Other noncurrent liabilities
If I could get a sentence diagram out, in which I could then look for stock phrases and identify subordinate clauses to them, that would be sufficient. What I need, in a LISP-like notation, is parsing into something like:
(and (cash) (cash (equivalents)))
(securities (marketable))
(total (cash) (cash (equivalents)) (securities (marketable)))
(accounts (receivable (and (trade) (other))))
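To make the target notation concrete, here is a minimal Python sketch that reads strings in this LISP-like form into nested lists and then looks up sub-trees by their head word. It has nothing to do with Grok's actual output or API; `read_sexpr` and `find_head` are hypothetical helper names, and "head" here just means the first word inside a pair of parentheses.

```python
import re


def read_sexpr(text):
    """Parse a LISP-like string into nested lists of lowercase words."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(parse())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # skip the ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok.lower()

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree


def find_head(tree, word):
    """Yield every sub-list whose first element (its head) is `word`."""
    if isinstance(tree, list):
        if tree and tree[0] == word:
            yield tree
        for child in tree:
            yield from find_head(child, word)


tree = read_sexpr("(total (cash) (cash (equivalents)) (securities (marketable)))")
print(tree)
# ['total', ['cash'], ['cash', ['equivalents']], ['securities', ['marketable']]]
print(list(find_head(tree, "cash")))
# [['cash'], ['cash', ['equivalents']]]
```

A reader like this also rejects unbalanced input with a `ValueError`, which is handy when the expressions are typed by hand.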
Is this something one can reasonably do with GROK?
Thanks.
Well, yes and no.
The Grok parser will certainly parse these noun phrases and produce something resembling the structures you describe. The problem is that, in general, correctly parsing noun phrases (NPs) is really hard, because you need a lot of world knowledge to decide what attaches to what.
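To get a feel for how many attachments are possible, the sketch below enumerates every binary bracketing of a row label. This is only a counting illustration, not how Grok or any real parser works; `bracketings` is a hypothetical name, and a real grammar would rule out most of these trees. The counts follow the Catalan numbers, so they grow quickly with the length of the label.

```python
from functools import lru_cache


def bracketings(words):
    """Return every binary bracketing of `words` as nested tuples."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(i, j):
        # All binary trees over the slice words[i:j].
        if j - i == 1:
            return (words[i],)
        trees = []
        for k in range(i + 1, j):  # k is the split point
            for left in build(i, k):
                for right in build(k, j):
                    trees.append((left, right))
        return tuple(trees)

    return list(build(0, len(words)))


label = "Accrued noncurrent employee benefits"
print(len(bracketings(label.split())))  # 5

label = "Prepaid employee benefits, taxes and other expenses"
print(len(bracketings(label.split())))  # 132
```

Picking the one intended tree out of 132 (does "Prepaid" cover only the benefits, or the taxes and other expenses too?) is where the world knowledge comes in.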
The good news is that, I believe, Grok has a category tagger trained on Wall Street Journal (i.e., financial) text, so it might actually do a decent job in your case. Jason is the person to ask about this, and I believe that he is writing some code to make the parser simpler to use. Jason?
Gann
Thanks for the quick reply. You can contact me directly at "nagle@downside.com".
Visiting http://www.downside.com will show what I'm doing with this info. It could get Grok some publicity; Downside gets a substantial number of hits.
Right now, I'm working on the code that extracts tables from the SEC database (http://www.sec.gov) and finds their rows and columns. Once I have that done, I'll have text to feed into Grok.
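For illustration only, here is a rough Python sketch of the row-splitting step: it separates the row label (the text that would go to Grok) from the figures on one line of a plain-text table. This is not the actual extraction code. It assumes columns are separated by runs of two or more spaces and that parentheses mark negative amounts; `split_row` is a hypothetical name, and the figures in the example are invented.

```python
import re

# A figure such as 1,234 or (567) or 12.5, optionally preceded by "$".
_NUMBER = re.compile(r"\$?\s*(\(?)(\d[\d,]*(?:\.\d+)?)\)?")


def split_row(line):
    """Split one text-table line into (label, [numbers])."""
    # Assumption: columns are separated by two or more spaces.
    cells = [c.strip() for c in re.split(r"\s{2,}", line.strip()) if c.strip()]
    if not cells:
        return "", []
    label, values = cells[0], []
    for cell in cells[1:]:
        m = _NUMBER.fullmatch(cell)
        if m is None:
            continue  # not a figure (e.g. a footnote marker); skip it
        value = float(m.group(2).replace(",", ""))
        # Accounting convention: a parenthesized figure is negative.
        values.append(-value if m.group(1) else value)
    return label, values


# Made-up figures, for illustration only.
print(split_row("Cash and cash equivalents      $ 1,234     2,345"))
# ('Cash and cash equivalents', [1234.0, 2345.0])
print(split_row("ASSETS:"))
# ('ASSETS:', [])
```

Real filings are messier than this (wrapped labels, footnote markers, multi-line headers), so treat it as a starting point at best.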