#352 biblStruct for Patent citations

Milestone: AMBER
Status: closed-fixed
Owner: Kevin Hawkins
Priority: 5
Updated: 2014-11-12
Created: 2012-03-30
Creator: Javier Pose
Private: No

We are implementing a project for encoding our patent and non-patent literature according to the TEI standard.

To do so, we need to be able to give very precise bibliographical references to patent documents (this is very important in the patent literature).
The current TEI standard does not allow us to encode patent bibliographical citations. In patents, patent documents are cited according to a very well-defined scheme whose main elements are:

- Identification of a Patent Authority
- Identification of a Patent Number
- Identification of a Date
- Identification of a Kind Code

Therefore, we would need to have the following structure in TEI for encoding the bibliographic information of patents:

<biblStruct type="patent¦utilityModel¦designPatent¦plant" status="application¦publication">
<monogr>
<authority>
<orgName type="national¦regional"></orgName>
</authority>
<idno type="docNumber"></idno>
<date type="applicationDate¦publicationDate"></date>
<imprint>
<idno type="kindCode"></idno>
</imprint>
</monogr>
</biblStruct>

I would like to add some examples, to show the importance of having this structure for our project:

1) In patent documents, citations of other patents are among the most important pieces of information.
Such citations may identify priority patents, related patents, or simply patents cited in the document.
The bibliographical reference to these patents is given without any title, using the standard patent bibliographical codification.
See the following examples (I have attached a file with the corresponding images):

E1) In this text (from a patent), another patent is cited as "Japanese Patent Laid-Open No. 223883/1974".
E2) This example shows how the bibliographical information of patents is normally provided:
E3) Non-patent literature also very often uses this kind of citation; see the following example:

2) I would also like to point out that several citation style manuals explicitly avoid using the title and other information when citing patents:

Bluebook Citation:
U.S. Patent No. 6,885,550 (issued Apr. 26, 2005).

APA Citation:
Williams, D. (2005). U.S. Patent No. 6,885,550. Washington, DC: U.S.

ACS Citation:
Williams, D. U.S. Patent 6,885,550, 2005.

Discussion

  • Javier Pose
    2012-03-30

    Patent citation Examples

     
  • Laurent Romary
    2012-04-15

    I see this as quite convincing. Would it make sense, once the change in biblStruct is made, to add such a patent citation example to the Guidelines? It would be good to show the variety of applications.

     
  • James Cummings
    2012-06-29

    • assigned_to: nobody --> kshawkin
     
  • Kevin Hawkins
    2012-08-06

    Thank you, Javier, for providing so much background information.

    Using the encoding structure you propose, it appears that the following changes would need to be made to the P5 content models:

    a) Add @status to <biblStruct> by way of adding this element to the att.docStatus class.

    b) Allow <authority> as a child of <monogr>.

    In addition, we would add some examples of patent citations to section 3.11 of the Guidelines.

    While you give various possible values of @type and @status on various elements, I think you'll agree that we shouldn't limit the values on these elements since these elements can be used for other things.

    Javier, does this all sound right?

    I see no reason not to implement this. If other members of Council agree with this, I suggest we do two things:

    1. One of the Council members can implement the changes to the content models.

    2. We ask Javier to provide suggested changes to the prose of section 3.11.

     
  • Javier Pose
    2012-08-16

    Hi,
    many thanks for your answer.
    The changes you enumerate are basically right, but two additional small changes are also needed:

    1) Currently, the element <idno> is only allowed inside <monogr> IF it comes after the element <title>.
    This seems to be an arbitrary restriction. In the case of patents, the bibliographic citation usually does not include the title of the patent, so <idno> should also be allowed inside <monogr> without restrictions.
    2) The element <idno> would also need to be allowed inside <imprint> in order to encode the patent kind code.

     
  • Kevin Hawkins
    2012-08-29

    Pardon the delay in getting back to this. I completely missed that your sample was suggesting <idno> in two separate places where it is not currently allowed. However:

    1) How would the semantics of biblStruct/monogr/idno be different from biblStruct/idno? That is, why exactly did you want to make <idno> a child of <monogr> rather than a sibling for the patent number? I am quite reluctant to allow this element in two places for fear of causing the kind of confusion we have for <biblScope> in <biblStruct> (see http://purl.org/TEI/FR/3555190 in case you're interested).

    2) Why do you want to put the patent code inside <imprint> instead of as a sibling of the other <idno>? You could distinguish the two with @type.

     
  • Javier Pose
    2012-08-30

    Hi,
    many thanks for your comments.

    I will try to explain why we propose to put the patent kindCode (<idno type="kindCode"></idno>) in the element <imprint>:

    The element <imprint> groups information relating to the publication or distribution of a bibliographic item.

    This is exactly what the kindCode of a patent document means because it informs about the publication or distribution of the patent (!).

    Basically, a patent is identified by a set of metadata (essentially a patent authority and a patent number), and the patent publication is further characterized by the additional "kindCode". Therefore, the kindCode provides information relating to the publication or distribution of the patent (for example, whether it is a patent published after a search by the patent examiner, or a patent published during the examination procedure...).
    A patent, during its life cycle, is published "physically" several times, each version corresponding to additional corrections and refinements. It thus seems appropriate to put the identifier corresponding to the publication under the element that groups information related to the publication of the bibliographical item, i.e. <imprint>. One can refer to a patent, or to a particular patent publication.

    To answer your two questions directly (the arguments are essentially those given above):

    Regarding (1), in our proposal, the idno corresponding to the patent number (<idno type="docNumber">...</idno>) identifies the patent as a separate, stand-alone bibliographical entity: an independent item which can be cited as such. This sort of bibliographic information is normally grouped in the monogr section together with information such as inventors, as for a book or a report. The patent number is actually rather similar to a volume of a serial publication, the series being the granting patent authority (e.g. patent publication 000001 from the USPTO). It does not seem consistent to us to put a semantically similar idno under biblStruct for a patent and under monogr for a book.

    Regarding (2), the TEI indicates that <imprint> groups information relating to the publication or distribution of a bibliographic item. As explained above, a patent is identified by a set of metadata, and a patent publication by the additional "kind code". A patent, during its life cycle, is published "physically" several times, each version corresponding to additional corrections and refinements. It thus seems appropriate to put the identifier corresponding to the publication under the element that groups information related to the publication of the bibliographical item, i.e. <imprint>. One can refer to a patent, or to a particular patent publication.

     
  • Kevin Hawkins
    2012-09-02

    I see what you are saying about (1). Now that I look over section 3.11.1 of the Guidelines, I am reminded that even in a simple <biblStruct>, usually all components of the citation are wrapped in <monogr>. While a few elements are allowed outside of <analytic>, <monogr>, and <series>, these appear to be for exceptional purposes where the information outside of these elements refers to more than one of them. So I now agree that it makes sense for a patent number to be inside <monogr>. And since you have patent citations that lack titles, we should no longer require <title> inside <monogr> to support this usage.

    Regarding (2), thank you for the explanation of what a patent kindCode is. You hadn't actually explained it before, and nothing about the term "kind code" indicated to me that it relates to the publication or distribution of the patent document. (I would have guessed that a kind of patent is a classification along the lines of "physical device", "business process", etc.) However, from your description, it sounds like such a code isn't really an "identifier used to identify some object" (from the definition of <idno>); rather, it's akin to how <term> is used within <keywords>, no? That is, I actually think it might make more sense to use <term> for (2). What do you think?

    So at this point I am prepared to support the following four changes to P5 content models to support citations of patents:

    a) Add @status to <biblStruct> by way of adding this element to the
    att.docStatus class.

    b) Allow <authority> as a child of <monogr>.

    c) No longer require <title> inside <monogr>.

    d) Allow <term> as a child of <imprint>.

    As before, if other members of Council agree with these changes, I suggest we do two things:

    1. One of the Council members can implement the changes to the content models.

    2. We ask Javier to provide suggested changes to the prose of section 3.11 and examples of patent citations illustrating (a) through (d) above.

     
  • Kevin Hawkins
    2012-09-02

    Javier, if it's easier to discuss by Skype, I'd be happy to do that. My user is kshawkin. I could speak Wednesday or later.

     
  • Lou Burnard
    2012-09-16

    On a quick reading of this proposal, I am rather appalled by the suggestion that <title> should become optional. Any bibliographic entry must have a title, surely? Even in the abbreviated references above, would it be wrong to regard the title as (e.g.) "Us Patent No xxxxx" ?

     
  • Lou Burnard
    2012-09-16

    • milestone: --> AMBER
     
  • Kevin Hawkins
    2012-09-18

    Javier demonstrates that it is common practice to cite patents without reference to a title of the patent. While one might call this an "abbreviated reference", I don't see why the TEI should prevent someone from using biblStruct to record such a citation which is otherwise structured. We allow a <pubPlace> and <publisher> to be omitted from a citation in the case of, say, a journal article, in which these are not typically given. Why not do the same for patents?

    For what it's worth, Lou suggests regarding the title as "Us Patent No xxxxx", but looking at the Word document attached to this ticket, I see that E2's actual title is more likely to be "System and Method for Natural Language Processing and Using Ontological Searches".

     
  • Javier Pose
    2012-09-18

    Hi Kevin,
    Sorry for the delay in replying.
    I have been on holiday for the last two weeks (...in fact, I just got married ;-)
    Would it be possible to have a Skype call next week (Friday or the weekend)?

     
  • Kevin Hawkins
    2012-09-21

    Javier: I've been at the TEI Council meeting most of this week and will be traveling this weekend as well. Please email me at kevin.s.hawkins@ultraslavonic.info with some suggested times (and your time zone) so we can figure out a time that might work on Tues., Sept. 25th, or later.

     
  • Kevin Hawkins
    2012-09-30

    For the record, the Technical Council decided at its September 2012 meeting in Oxford to no longer allow <idno> as a child of <biblStruct>: see http://purl.org/tei/fr/3565878 . (In the discussion below, we have already agreed that we would like to put <idno> inside of <monogr>, so this doesn't affect us.)

     
  • Kevin Hawkins
    2012-09-30

    Javier and I discussed by Skype. Aside from Lou's objection to omission of a <title> in a TEI-encoded citation (which both Javier and Kevin feel there is a good use case for), the only outstanding question is how to encode the "kind code". This contains a coding used by a particular patent office to note the status of a document in the application and publication process. The codes vary between patent offices. At the European Patent Office, there are four or five codes that apply to patent applications and four or five which apply to patent publications. (Javier will provide the kind codes used by the European Patent Office in a comment on this ticket to make this discussion more concrete.) Patents are sometimes cited as an application or publication (the value of biblStruct@status) without reference to the kind code, so we can't simply put the kind code in biblStruct@status.

    Javier feels that the kind code relates to the publication of the patent and therefore belongs inside <imprint>. Kevin suggested imprint@status, but Javier said that kind codes feel to him more like content than an attribute value. If there is a use case where you might want to use markup within a kind code (for example, if you are transcribing patent citations that include kind codes from a source document and want to use <sic> or <corr>), then it definitely needs to be in an element.

    (If we definitely agree not to use imprint@status for kind codes, I wonder whether we should use imprint@status for what is currently on biblStruct@status since that also relates to the publication of the patent.)

    Let's say for now that the kind code would be included as the content of some element inside of <imprint>. Do we use <idno>, as Javier suggests, or <term>, as Kevin suggests? Javier suggests <idno> because the element definition says "supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way." Kevin explained in reply that he's only ever known an <idno> used in a way in which the content of this element identifies a *single* entity, whereas kind codes are actually standardized terms from a typology which don't identify any particular entity. For that, Kevin feels that <term> is the usual way this is done in TEI (at least when they occur inside <keywords> in the header).

     
  • Javier Pose
    2012-09-30

    Regarding the proposal to add the kindCode as an attribute of <imprint>, I think it is not the best place, for the following reasons:

    1) Most patent authorities issue more than one document for any particular patent.
    These sequential documents often keep the same number, so they are distinguished by adding a letter immediately after the number, called the kind code.
    Therefore, the kindCode is one of the four elements used to identify a patent document (Patent Authority + Patent Number + Date + Kind Code). If the kindCode were stored as an attribute of <imprint>, then <imprint> would be an empty element carrying the kindCode only as an attribute. This seems somewhat strange, because we would have an empty <imprint> (which is not currently allowed by the TEI Guidelines).

    2) Even if an empty <imprint> were allowed, storing the kind code as an attribute would rule out functionality such as the following: at most offices the kind code is composed of one letter and one number (for example A2, C1, ...; for more detailed information see http://www.delphion.com/help/kindcodes). The letter and the number each carry information about a particular aspect of the current status of the document, so it could be convenient to encode the kind code as two separate items, the letter and the number. If the kind code is stored as a whole in an attribute, this won’t be possible, but if it is stored as a child of <imprint>, then it would be possible to encode the letter and the number separately, at a finer grain.
    According to common XML practice, if the information is expressed in a structured form, especially if the structure may be extensible, elements should be used. On the other hand, if the information is expressed as an atomic token, an attribute can be used (see for example http://www.ibm.com/developerworks/xml/library/x-eleatt/index.html). In this case, the kind code IS NOT atomic information: in most cases it is composed of one letter and one number, which have specific meanings. Therefore, it seems more appropriate to store this information as an element, i.e. a child of <imprint>.

    For these reasons, I would store the kind code information as a child of the <imprint> element.

    P.S.: I attach a file with a brief explanation of the kind codes at the European Patent Office. The extended information of the different kind codes in other patent authorities can be found in http://www.delphion.com/help/kindcodes

     
  • Javier Pose
    2012-09-30

    EPO kind code

     
  • Kevin Hawkins
    2012-09-30

    Now that I see some examples of kind codes, I believe that we should use <classCode> or <catRef>, not <term> (and still not <idno>), for these. See section 2.4.3 ( http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD43 ), especially the last paragraph, for an explanation of these two elements. As in the last paragraph of section 2.4.3, either could be used by the encoder for a kind code depending on whether the kind codes used are from an open-ended system and whether they are documented in the header.

    Furthermore, now that I think about it, since <imprint> will not be an empty element, I think we should put status="application¦publication" on <imprint>, not on <biblStruct>, since it relates to "the publication or distribution of a bibliographic item".

    So my proposal is:

    a) Add @status to <imprint> by way of adding this element to the
    att.docStatus class.

    b) Allow <authority> as a child of <monogr>.

    c) No longer require <title> inside <monogr>.

    d) Allow <classCode> and <catRef> as a child of <imprint>.

    If Council approves, I suggest that:

    1. We ask Javier to provide suggested changes to the prose of section 3.11
    and examples of patent citations illustrating (a) through (d) above.

    2. One of the Council members can implement the changes to the content
    models.

     
  • Kevin Hawkins
    2012-09-30

    I just noticed that Javier's original example included

    <date type="applicationDate¦publicationDate"></date>

    as a child of <monogr>, not as a child of <imprint>. Currently this is not allowed in TEI, and I think that we should just put this inside <imprint>, which is for information related to publication and distribution. I think that even when something is still unpublished (only an application), you can and should still use <imprint> for the equivalent information.

    Here's a summary of what is being suggested:

    <biblStruct type="patent¦utilityModel¦designPatent¦plant">
      <monogr>
        <authority>
          <orgName type="national¦regional">[name of patent office goes here]</orgName>
        </authority>
        <idno type="docNumber">[document number goes here]</idno>
        <imprint status="application¦publication">
          <classCode>[kind code goes here]</classCode> <!-- Note that you could also use <catRef> here! -->
          <date type="applicationDate¦publicationDate">[date goes here]</date>
        </imprint>
      </monogr>
    </biblStruct>

     
  • Javier Pose
    2012-10-01

    1) Regarding the idea of storing the <date> as a child of <imprint>, I agree with Kevin and think it is a good idea. It is important to note that this date refers to the date of filing/publication of the patent, and is not related to the specific stage of the patent. The application date is the date on which a complete application was received, and the publication date is the date on which the patent application is published (i.e. the information becomes available to the public), normally 18 months after filing or 18 months after the priority date.
    Since <imprint> "groups information relating to the publication or distribution of a bibliographic item", it seems a good idea to store it there.

    2) Regarding the idea of using <classCode> for storing the kindCode, I also agree with Kevin.

    3) Regarding the proposal of having status="application¦publication" on <imprint>, I think it is not a good idea, because some bibliographic references to patents come WITHOUT the kind code and date, which would lead to an empty <imprint>. If we store the status in an attribute of <imprint>, we will have a problem with those references, because <imprint> would be empty. Furthermore, the status (application¦publication) is a property of the whole bibliographic reference and not only of part of it.
    Therefore, I still think that the status should be an attribute of <biblStruct>.

    So my proposal (which is the same as Kevin except for the @status), would be:

    <biblStruct type="patent¦utilityModel¦designPatent¦plant" status="application¦publication">
      <monogr>
        <authority>
          <orgName type="national¦regional">[name of patent office goes here]</orgName>
        </authority>
        <idno type="docNumber">[document number goes here]</idno>
        <imprint>
          <classCode>[kind code goes here]</classCode> <!-- Note that you could also use <catRef> here! -->
          <date type="applicationDate¦publicationDate">[date goes here]</date>
        </imprint>
      </monogr>
    </biblStruct>

     
  • Lou Burnard
    2012-10-01

    Some quick comments from Lou:
    1. I agree that adding @status on the <biblStruct> makes more sense than adding it on the <imprint> only
    2. I still think a <title> should be present, even if it's just boilerplate. Or why not use <title> in preference to <monogr> as a means of wrapping some of the other parts: <title><orgName>...</orgName><idno>...</idno> <date>...</date></title>?
    3. <classCode> and <catRef> are not exactly the same. You can only use the latter if you've got a <classDecl> somewhere to point at.
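
    A minimal sketch of the difference in point 3, assuming a kind code of A2 (one of the example values mentioned in this ticket). The @scheme value, the xml:id values, and the bracketed placeholders are hypothetical and chosen only for illustration:

    ```xml
    <!-- Option 1: <classCode> carries the code itself; @scheme names the
         classification system (this URL is only an assumed identifier). -->
    <imprint>
      <classCode scheme="http://www.delphion.com/help/kindcodes">A2</classCode>
      <date>[date goes here]</date>
    </imprint>

    <!-- Option 2: <catRef> only points at a category, so that category
         has to be declared in a <classDecl> in the TEI header. -->
    <classDecl>
      <taxonomy xml:id="kindCodes">
        <category xml:id="kindCode.A2">
          <catDesc>[description of kind code A2 goes here]</catDesc>
        </category>
      </taxonomy>
    </classDecl>

    <!-- ... and then, inside the citation: -->
    <imprint>
      <catRef target="#kindCode.A2"/>
      <date>[date goes here]</date>
    </imprint>
    ```

    With <classCode> the encoder needs nothing in the header; with <catRef> the kind codes are documented once and every citation just refers to them.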

     
  • Kevin Hawkins
    2012-10-02

    I still prefer imprint@status instead of biblStruct@status because it is the status of publication, not the status of the citation, that we are noting with this attribute. Furthermore, it's odd that you would put <date type="applicationDate"> and/or <date type="publicationDate"> in <imprint> and yet put status="application" or status="publication" on <biblStruct> rather than <imprint>. In my opinion we should not let the fact that an <imprint> element would be empty stop us if the encoding would be well structured.

    That said, I could live with leaving @status on <biblStruct>, especially in order to support other uses of this attribute besides recording application versus publication. If we put it on <biblStruct>, I think we should also do it on <bibl> and <biblFull>.

    Lou's suggestion of

    <title><orgName>...</orgName><idno>...</idno> <date>...</date></title>

    assumes that when reading a citation such as:

    U.S. Patent No. 6,885,550 (issued Apr. 26, 2005).

    one should understand the title to be "U.S. Patent No. 6,885,550 (issued Apr. 26, 2005)". That's absurd. As I mentioned in a previous comment, and as can be seen in "PatentCitationExamples.doc", patent documents have actual titles, but no one cites them that way. What's the point of having a placeholder title in the data? If a style guide for some citation format requires a title for each citation, then the person writing the stylesheet for converting <biblStruct>s to that citation format can construct the required title however it should be created. Compare this with a citation that lacks a place of publication or publisher: most of us would omit <pubPlace> or <publisher>, and the output would insert "s.l.", "s.n.", "n.d.", "[no date]", etc. in place of the place or publisher.
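
    A minimal XSLT 1.0 sketch of that idea, assuming citations encoded with the structure summarized earlier in this ticket (<authority>/<orgName>, <idno type="docNumber">, and <date> inside <imprint>) and placed in the TEI namespace. The wording of the output imitates the citation form quoted above; this is an illustration only, not an existing TEI stylesheet:

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:tei="http://www.tei-c.org/ns/1.0">
      <xsl:output method="text"/>

      <!-- Build the display form of a patent citation from its parts;
           no <title> element is needed in the data. -->
      <xsl:template match="tei:biblStruct[@type='patent']">
        <xsl:value-of select="normalize-space(tei:monogr/tei:authority/tei:orgName)"/>
        <xsl:text> Patent No. </xsl:text>
        <xsl:value-of select="normalize-space(tei:monogr/tei:idno[@type='docNumber'])"/>
        <!-- Only mention a date when the citation actually has one. -->
        <xsl:if test="tei:monogr/tei:imprint/tei:date">
          <xsl:text> (issued </xsl:text>
          <xsl:value-of select="normalize-space(tei:monogr/tei:imprint/tei:date)"/>
          <xsl:text>)</xsl:text>
        </xsl:if>
        <xsl:text>.</xsl:text>
      </xsl:template>
    </xsl:stylesheet>
    ```

    A stylesheet for a format that insists on a title could assemble one in the same way, without a placeholder <title> ever being stored in the <biblStruct>.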

     
  • Javier Pose
    2012-10-02

    Regarding Lou's suggestion about <title>, I totally agree with Kevin. Patents do in fact have their own titles, and treating the patent reference (patent authority + patent number + kind code + date) as the title (<title><orgName>...</orgName><idno>...</idno> <date>...</date></title>) would be totally confusing and would not be accepted in the patent community (!!!).

     
  • Kevin Hawkins
    2012-10-22

    The goal is to allow for this structure:

    <biblStruct type="patent¦utilityModel¦designPatent¦plant" status="application¦publication">
      <monogr>
        <authority>
          <orgName type="national¦regional">[name of patent office goes here]</orgName>
        </authority>
        <idno type="docNumber">[document number goes here]</idno>
        <imprint>
          <classCode>[kind code goes here]</classCode> <!-- Note that you could also use <catRef> here! -->
          <date type="applicationDate¦publicationDate">[date goes here]</date>
        </imprint>
      </monogr>
    </biblStruct>

    To help the EPO claim full TEI compliance sooner rather than later, I have made the following schema changes now ( http://tei.svn.sourceforge.net/viewvc/tei?view=revision&revision=10992 ):

    a) Add @status to <biblStruct>, <bibl>, and <biblFull> by way of adding these elements to the att.docStatus class.

    b) Allow <authority> as a child of <monogr> (by creating a third version of the complicated first half of the content model, which requires this followed by <idno>). I have also loosened the definition of <authority> to no longer refer just to an "electronic file".

    c) No longer require <title> inside <monogr> (by creating a third version of the complicated first half of the content model, which requires <authority> followed by <idno>)

    d) Allow <classCode> and <catRef> as optional children of <imprint> (before model.dateLike).

    I will email Javier asking him to provide suggested changes to the prose of section 3.11 and examples of patent citations illustrating (a) through (d) above, which I will incorporate into the Guidelines once I receive them. Since we will have a release soon, the schema and prose changes are unlikely to happen in the same release.
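
    For illustration, a sketch of how the Bluebook-style citation "U.S. Patent No. 6,885,550 (issued Apr. 26, 2005)" from the original request might be encoded with the structure above. The office name "USPTO", the values type="national" and status="publication", and the use of @when on <date> are assumptions made for this example, not values agreed in this ticket:

    ```xml
    <biblStruct type="patent" status="publication">
      <monogr>
        <!-- (b) <authority> as a child of <monogr>; (c) no <title> -->
        <authority>
          <orgName type="national">USPTO</orgName>
        </authority>
        <idno type="docNumber">6,885,550</idno>
        <imprint>
          <!-- (d) <classCode> is optional; this citation gives no kind code -->
          <date type="publicationDate" when="2005-04-26">Apr. 26, 2005</date>
        </imprint>
      </monogr>
    </biblStruct>
    ```

    Because the citation carries no kind code, <imprint> holds only the date; a citation that does give one would add a <classCode> (or <catRef>) before the <date>.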

     