Tagging financial row definitions

John Nagle
2001-01-03
  • John Nagle

    John Nagle - 2001-01-03

    I'm interested in using Grok to help interpret the labels on the rows of financial statements.  Typically, these are not sentences, but noun phrases.  Some examples:

    ASSETS:
    Cash and cash equivalents
    Marketable securities
    Total cash, cash equivalents and marketable securities
    Accounts receivable - trade and other
    Inventories
    Prepaid employee benefits, taxes and other expenses
    Finance receivables and retained interests in sold receivables
    Property and equipment
    Special tools
    Intangible assets
    Other noncurrent assets

    LIABILITIES:
    Accounts payable
    Accrued liabilities and expenses
    Short-term debt
    Payments due within one year on long-term debt
    Long-term debt
    Accrued noncurrent employee benefits
    Other noncurrent liabilities

    If I could get a sentence diagram out, in which I could then look for stock phrases and identify the words subordinate to them, that would be sufficient.  What I need is a parse, in a LISP-like notation, into something like

    (and (cash) (cash (equivalents)))
    (securities (marketable))
    (total (cash) (cash (equivalents)) (securities (marketable)))
    (accounts (receivable (and (trade) (other))))
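
    For illustration, here is a minimal Python sketch of how output in this notation could be read back and searched for stock phrases. It assumes that the first word in each list is the head and that any nested lists are its modifiers. The names `read` and `find_phrase` are hypothetical and are not part of Grok.

```python
def tokenize(text):
    """Split a LISP-like string into parentheses and words."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume one expression from the front of `tokens`; return nested lists."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens and tokens[0] != ")":
            node.append(parse(tokens))
        if not tokens:
            raise ValueError("unbalanced parentheses")
        tokens.pop(0)  # discard the closing ")"
        return node
    if token == ")":
        raise ValueError("unexpected ')'")
    return token


def read(text):
    """Parse a whole string that holds exactly one expression."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("empty input")
    tree = parse(tokens)
    if tokens:
        raise ValueError("trailing tokens after expression")
    return tree


def find_phrase(tree, head, modifier):
    """True if some node headed by `head` has a direct child headed by `modifier`.

    Assumption: the first element of each list is its head word.
    """
    if not isinstance(tree, list) or not tree:
        return False
    if tree[0] == head and any(
        isinstance(child, list) and child and child[0] == modifier
        for child in tree[1:]
    ):
        return True
    return any(find_phrase(child, head, modifier) for child in tree)


row = read("(total (cash) (cash (equivalents)) (securities (marketable)))")
print(find_phrase(row, "securities", "marketable"))
print(find_phrase(row, "cash", "equivalents"))
```

    Matching on head and modifier rather than on the raw string means "marketable securities" is found whether it stands alone or is buried inside a "total" row.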

    Is this something one can reasonably do with GROK?
    Thanks.

     
    • Gann Bierner

      Gann Bierner - 2001-01-03

      Well, yes and no.

      The Grok parser will certainly parse these noun phrases and produce something resembling the structures you describe.  The problem is that, in general, correctly parsing noun phrases is really hard, because you need a lot of world knowledge to decide which word each modifier attaches to.

      The good news is that, I believe, Grok has a category tagger trained on Wall Street Journal (i.e., financial) text, so it might actually do a decent job in your case.  Jason is the person to ask about this, and I believe that he is writing some code to make the parser simpler to use.  Jason?

      Gann

       
    • John Nagle

      John Nagle - 2001-01-03

      Thanks for the quick reply.  You can contact me directly at "nagle@downside.com".
      Visiting http://www.downside.com will show what I'm doing with this info.  It could get Grok some publicity; Downside gets a substantial number of hits.

      Right now, I'm working on the code that extracts tables from the SEC database (http://www.sec.gov), and finds rows and columns.  Once I have that done, I'll have text to feed into Grok. 

       
