#352 biblStruct for Patent citations

Milestone: AMBER
Status: closed-fixed
Owner: Kevin Hawkins
Priority: 5
Updated: 2013-04-12
Created: 2012-03-30
Creator: Javier Pose
Private: No

We are implementing a project to encode our patent and non-patent literature according to the TEI standard.

To do so, we need very precise bibliographical references to patent documents, which are very important in the patent literature.
The current TEI standard does not allow us to encode patent bibliographical citations. In patents, patent documents are cited according to a well-defined encoding whose main elements are:

- Identification of a Patent Authority
- Identification of a Patent Number
- Identification of a Date
- Identification of a Kind Code

Therefore, we would need to have the following structure in TEI for encoding the bibliographic information of patents:

<biblStruct type="patent¦utilityModel¦designPatent¦plant" status="application¦publication">
<monogr>
<authority>
<orgName type="national¦regional"></orgName>
</authority>
<idno type="docNumber"></idno>
<date type="applicationDate¦publicationDate"></date>
<imprint>
<idno type="kindCode"></idno>
</imprint>
</monogr>
</biblStruct>

I would like to add some examples, to show the importance of having this structure for our project:

1) In patent documents, citations of other patents are among the most important pieces of information.
These citations may identify priority patents, related patents, or simply patents cited in the document.
The bibliographical reference to these patents is given without any title, using the standard patent bibliographical codification instead.
See the following examples (I attached a file with the corresponding images):

E1) In this text (from a patent), another patent is cited as "Japanese Patent Laid-Open No. 223883/1974".
E2) This example shows how the bibliographical information of patents is normally provided:
E3) Non-patent literature also uses this kind of citation very often; see the following example:

2) I would also like to point out that several citation style manuals explicitly avoid using the title and other information when citing patents:

Bluebook Citation:
U.S. Patent No. 6,885,550 (issued Apr. 26, 2005).

APA Citation:
Williams, D. (2005). U.S. Patent No. 6,885,550. Washington, DC: U.S.

ACS Citation:
Williams, D. U.S. Patent 6,885,550, 2005.

Discussion

  • Lou Burnard
    2012-09-16

    • milestone: --> AMBER
     
  • Kevin Hawkins
    2012-09-18

    Javier demonstrates that it is common practice to cite patents without reference to a title of the patent. While one might call this an "abbreviated reference", I don't see why the TEI should prevent someone from using biblStruct to record such a citation which is otherwise structured. We allow a <pubPlace> and <publisher> to be omitted from a citation in the case of, say, a journal article, in which these are not typically given. Why not do the same for patents?

    For what it's worth, while Lou suggests regarding the title as "Us Patent No xxxxx", looking at the Word document attached to this ticket, I see that E2's actual title is more likely to be "System and Method for Natural Language Processing and Using Ontological Searches".

     
  • Javier Pose
    2012-09-18

    Hi Kevin,
    sorry for the delay in replying.
    I was on holiday for the last two weeks (...in fact, I just got married ;-)
    Would it be possible to have a Skype call next week (Friday or the weekend)?

     
  • Kevin Hawkins
    2012-09-21

    Javier: I've been at the TEI Council meeting most of this week and will be traveling this weekend as well. Please email me at kevin.s.hawkins@ultraslavonic.info with some suggested times (and your time zone) so we can figure out a time that might work on Tues., Sept. 25th, or later.

     
  • Kevin Hawkins
    2012-09-30

    For the record, the Technical Council decided at its September 2012 meeting in Oxford to no longer allow <idno> as a child of <biblStruct>: see http://purl.org/tei/fr/3565878 . (In the discussion below, we have already agreed that we would like to put <idno> inside of <monogr>, so this doesn't affect us.)

     
  • Kevin Hawkins
    2012-09-30

    Javier and I discussed this by Skype. Aside from Lou's objection to omission of a <title> in a TEI-encoded citation (which both Javier and Kevin feel there is a good use case for), the only outstanding question is how to encode the "kind code". This contains a coding used by a particular patent office to note the status of a document in the application and publication process. The codes vary between patent offices. At the European Patent Office, there are four or five codes that apply to patent applications and four or five that apply to patent publications. (Javier will provide the kind codes used by the European Patent Office in a comment on this ticket to make this discussion more concrete.) Patents are sometimes cited as an application or publication (the value of biblStruct@status) without reference to the kind code, so we can't simply put the kind code in biblStruct@status.

    Javier feels that the kind code relates to the publication of the patent and therefore belongs inside <imprint>. Kevin suggested imprint@status, but Javier said that kind codes feel to him more like content than an attribute value. If there is a use case where you might want to use markup within a kind code (for example, if you are transcribing patent citations that include kind codes from a source document and want to use <sic> or <corr>), then it definitely needs to be in an element.

    (If we definitely agree not to use imprint@status for kind codes, I wonder whether we should use imprint@status for what is currently on biblStruct@status since that also relates to the publication of the patent.)

    Let's say for now that the kind code would be included as the content of some element inside of <imprint>. Do we use <idno>, as Javier suggests, or <term>, as Kevin suggests? Javier suggests <idno> because the element definition says "supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way." Kevin explained in reply that he's only ever known an <idno> used in a way in which the content of this element identifies a *single* entity, whereas kind codes are actually standardized terms from a typology which don't identify any particular entity. For that, Kevin feels that <term> is the usual way this is done in TEI (at least when they occur inside <keywords> in the header).

     
  • Javier Pose
    2012-09-30

    Regarding the proposal to add the kind code as an attribute of <imprint>, I think this is not the best place, for the following reasons:

    1) Most patent authorities issue more than one document for any particular patent.
    These sequential documents often keep the same number, so they are distinguished by adding a letter immediately after the number, called the kind code.
    Therefore, the kind code is one of the four components used to identify a patent document (Patent Authority + Patent Number + Date + Kind Code). If the kind code were stored as an attribute of <imprint>, then <imprint> would be an empty element carrying the kind code only as an attribute value. This seems somewhat strange, because we would have an empty <imprint> (which the TEI Guidelines do not currently allow).

    2) Even if an empty <imprint> were allowed, storing the kind code as an attribute would rule out functionality such as the following. In most offices the kind code is composed of one letter and one number (for example A2, C1, ...; for more detailed information see http://www.delphion.com/help/kindcodes). The letter and the number each carry information about a particular aspect of the current status of the document, so it could be convenient to encode the kind code as two separate items, the letter and the number. If the kind code is stored as a whole in an attribute, this won't be possible, but if it is stored as a child of <imprint>, then the letter and the number that make up the kind code could be encoded separately at a finer grain.
    According to common XML practice, if the information is expressed in a structured form, especially if the structure may be extensible, elements should be used. On the other hand, if the information is expressed as an atomic token, an attribute can be used (see for example http://www.ibm.com/developerworks/xml/library/x-eleatt/index.html). In this case, the kind code IS NOT atomic information: in most cases it is composed of one letter and one number, each with a specific meaning. Therefore, it seems more appropriate to store this information as an element, i.e. a child of <imprint>.

    For these reasons, I would store the kind code information as a child of the <imprint> element.

    P.S.: I attach a file with a brief explanation of the kind codes at the European Patent Office. More extensive information on the kind codes used by other patent authorities can be found at http://www.delphion.com/help/kindcodes

     
  • Javier Pose
    2012-09-30

    EPO kind code

     
  • Kevin Hawkins
    2012-09-30

    Now that I see some examples of kind codes, I believe that we should use <classCode> or <catRef>, not <term> (and still not <idno>), for these. See section 2.4.3 ( http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD43 ), especially the last paragraph, for an explanation of these two elements. As that paragraph explains, either could be used by the encoder for a kind code, depending on whether the kind codes used are from an open-ended system and whether they are documented in the header.
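
    To make the difference between the two elements concrete, here is a rough sketch of both options as separate fragments. It assumes the content-model changes discussed in this ticket (neither element is currently allowed inside <imprint>). The xml:id values, the example.org scheme URI, and the wording of the category descriptions are hypothetical illustrations, not taken from Javier's attachment.

    ```xml
    <!-- Option 1: <catRef>, for kind codes documented in the TEI header. -->
    <!-- Header fragment: a taxonomy declaring the kind codes in use (hypothetical ids). -->
    <classDecl>
      <taxonomy xml:id="epoKindCodes">
        <category xml:id="epoKind.A1">
          <catDesc>A1: European patent application published with search report</catDesc>
        </category>
        <category xml:id="epoKind.B1">
          <catDesc>B1: European patent specification (granted patent)</catDesc>
        </category>
      </taxonomy>
    </classDecl>

    <!-- Citation fragment: point at one of the declared categories. -->
    <imprint>
      <catRef scheme="#epoKindCodes" target="#epoKind.B1"/>
      <date type="publicationDate">[date goes here]</date>
    </imprint>

    <!-- Option 2: <classCode>, for kind codes from an externally defined scheme. -->
    <!-- The scheme URI is a placeholder, not a real registry. -->
    <imprint>
      <classCode scheme="http://example.org/epo-kind-codes">B1</classCode>
      <date type="publicationDate">[date goes here]</date>
    </imprint>
    ```

    With <catRef> the code itself lives in the header and the citation only points to it; with <classCode> the code appears as element content in the citation, which is closer to what Javier asked for.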

    Furthermore, now that I think about it, since <imprint> will not be an empty element, I think we should put status="application¦publication" on <imprint>, not on <biblStruct>, since it relates to "the publication or distribution of a bibliographic item".

    So my proposal is:

    a) Add @status to <imprint> by way of adding this element to the
    att.docStatus class.

    b) Allow <authority> as a child of <monogr>.

    c) No longer require <title> inside <monogr>.

    d) Allow <classCode> and <catRef> as a child of <imprint>.

    If Council approves, I suggest that:

    1. We ask Javier to provide suggested changes to the prose of section 3.11
    and examples of patent citations illustrating (a) through (d) above.

    2. One of the Council members can implement the changes to the content
    models.

     
  • Kevin Hawkins
    2012-09-30

    I just noticed that Javier's original example included

    <date type="applicationDate¦publicationDate"></date>

    as a child of <biblStruct>, not as a child of <imprint>. Currently this is not allowed in TEI, and I think that we should just put this inside <imprint>, which is for information related to publication and distribution. Even when something is still unpublished (only an application), I think you can and should still use <imprint> for the equivalent information.

    Here's a summary of what is being suggested:

    <biblStruct type="patent¦utilityModel¦designPatent¦plant">
    <monogr>
    <authority>
    <orgName type="national¦regional">[name of patent office goes here]</orgName>
    </authority>
    <idno type="docNumber">[document number goes here]</idno>
    <imprint status="application¦publication">
    <classCode>[kind code goes here]</classCode> <== Note that you could also use <catRef> here!
    <date type="applicationDate¦publicationDate">[date goes here]</date>
    </imprint>
    </monogr>
    </biblStruct>
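
    As an illustration only, the template above might be filled in as follows for the patent cited in the Bluebook example in the ticket description (U.S. Patent No. 6,885,550, issued Apr. 26, 2005). This is a sketch under the proposed content model, so it would not validate against the current TEI schema. The spelled-out office name is my own expansion of "U.S.", and the kind code value and scheme URI are hypothetical placeholders, since the ticket does not give a kind code for this patent.

    ```xml
    <biblStruct type="patent">
      <monogr>
        <authority>
          <!-- Office name spelled out by assumption; the citation only says "U.S." -->
          <orgName type="national">United States Patent and Trademark Office</orgName>
        </authority>
        <idno type="docNumber">6,885,550</idno>
        <imprint status="publication">
          <!-- Hypothetical kind code and scheme URI, for illustration only -->
          <classCode scheme="http://example.org/kind-codes">B1</classCode>
          <!-- Issue date from the Bluebook citation, normalized in @when -->
          <date type="publicationDate" when="2005-04-26">Apr. 26, 2005</date>
        </imprint>
      </monogr>
    </biblStruct>
    ```

    Note that there is no <title> anywhere in this citation, which is exactly what proposal (c) is meant to permit.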

     