
CollationAPI

Eric Smith Domhnall101

Draft 2015-06-16

Collation API

Invokes the CollateX engine to generate a comparison of the specified witnesses. CollateX is invoked with the Needleman-Wunsch algorithm. The details of the collation can be controlled using the parameters described here:

parameter          value    effect
comparison         plain    Witnesses are compared using just their text content.
comparison         morph    Witnesses are compared based on their morphological content, using analysis performed by the Perseus morphology engine.
comparison         orth     Witnesses are compared in an orthographically-aware manner, taking into account common spelling irregularities. The dict parameter can be specified to indicate the spelling dictionary to be used.
dict               lat      Specifies a spelling dictionary to be used for orth comparison. May be specified multiple times to use multiple spelling dictionaries.
ignoreCase         true     Upper/lower case differences are ignored.
ignoreCase         false    Upper/lower case differences are considered significant.
ignoreLineBreaks   true     The collator ignores line-breaks and hyphenation.
ignoreLineBreaks   hyphens  The collator ignores line-breaks which immediately follow a hyphen.
ignoreLineBreaks   false    The collator considers line-breaks as ordinary white-space.
ignorePunctuation  true     Punctuation differences are ignored.
ignorePunctuation  false    Punctuation differences are considered significant.
useTEITags         true     Embedded TEI editorial tags (e.g. those from §3.4 Simple Editorial Changes) are applied to the content before making the comparison.
useTEITags         false    No special treatment for embedded TEI tags.
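
As an illustration of how these parameters combine, the following is a minimal sketch that encodes them as a URL query string. It assumes the parameters travel in the query string (this page does not say so explicitly); the function name collation_query and the dictionary name "grc" are hypothetical and used only for the example.

```python
from urllib.parse import urlencode


def collation_query(comparison="plain", dicts=(), ignore_case=False,
                    ignore_line_breaks="false", ignore_punctuation=False,
                    use_tei_tags=False):
    """Encode collation options as a query string (hypothetical helper)."""
    if comparison not in ("plain", "morph", "orth"):
        raise ValueError("comparison must be plain, morph or orth")
    if ignore_line_breaks not in ("true", "hyphens", "false"):
        raise ValueError("ignoreLineBreaks must be true, hyphens or false")

    params = [("comparison", comparison)]
    # dict is described only for orth comparison and may be repeated,
    # so each dictionary becomes its own dict=... pair.
    if comparison == "orth":
        params += [("dict", d) for d in dicts]
    params += [
        ("ignoreCase", str(ignore_case).lower()),
        ("ignoreLineBreaks", ignore_line_breaks),
        ("ignorePunctuation", str(ignore_punctuation).lower()),
        ("useTEITags", str(use_tei_tags).lower()),
    ]
    return urlencode(params)


# "grc" is a made-up dictionary name; only "lat" appears on this page.
query = collation_query("orth", dicts=["lat", "grc"], ignore_case=True)
print(query)
```

A list of pairs is used rather than a plain dict so that the dict parameter can appear more than once.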

Full-edition collation

Endpoint: /collation/edID
Method: POST
Request body: none
Request parameters:
comparison=plain|morph|orth (default plain)
dict=dictionary (default lat; may be specified multiple times)
ignoreCase=true|false (default false)
ignoreLineBreaks=true|hyphens|false (default false)
ignorePunctuation=true|false (default false)
useTEITags=true|false (default false)
deferred=true|false (default false)
Response content-type: application/json
Response location: if deferred=true, /deliverable/delID
Response body: an array of arrays of mote annotations (empty if deferred=true):
[
   [
      {"content":"Lectio 1, Prologus","endOffset":18,"target":"#2","endPage":2000,"startPage":2000,"startOffset":0,"type":"tr-mote"},
      {"content":"Lectio 1, Prologus","endOffset":18,"target":"#1","endPage":1000,"startPage":1000,"startOffset":0,"type":"tr-mote"}
   ],
   [
      {"content":"[Reims","endOffset":25,"target":"#1","endPage":1000,"startPage":1000,"startOffset":19,"type":"tr-mote"}
   ],
   [
      {"content":"[Sorbonne","endOffset":28,"target":"#2","endPage":2000,"startPage":2000,"startOffset":19,"type":"tr-mote"}
   ],
   …
]
Status: 200 on success if deferred=false, 303 on success if deferred=true, 404 if edID does not exist.
Comments: This provides a full collation of all the witnesses in the specified edition. The response contains an array of arrays of mote annotations. Each subarray contains the mote annotations that the collator has determined to be in correspondence. The target field of each mote annotation identifies the witness whose content supports the annotation. If deferred=true is supplied in the request parameters, the response body is empty and the Location header gives the URI where the results of the collation will be available.
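
The status codes and response shape described above can be handled along the following lines. This is a sketch only: it works on an already-received status, header mapping and body rather than on any particular HTTP client, and the function names are hypothetical. It also assumes each witness contributes at most one mote annotation to a correspondence group.

```python
import json


def interpret_collation_response(status, headers, body):
    """Return ("result", data) or ("deferred", uri) for a collation response."""
    if status == 200:
        # deferred=false: the body holds the array of arrays of motes.
        return ("result", json.loads(body))
    if status == 303:
        # deferred=true: the body is empty; results will appear at Location.
        return ("deferred", headers["Location"])
    if status == 404:
        raise LookupError("edition not found")
    raise RuntimeError(f"unexpected status {status}")


def readings_by_witness(collation):
    """For each correspondence group, map witness target to mote content."""
    return [{mote["target"]: mote["content"] for mote in group}
            for group in collation]


# Abbreviated version of the sample response shown above.
sample = json.dumps([
    [{"content": "Lectio 1, Prologus", "target": "#2", "type": "tr-mote"},
     {"content": "Lectio 1, Prologus", "target": "#1", "type": "tr-mote"}],
    [{"content": "[Reims", "target": "#1", "type": "tr-mote"}],
    [{"content": "[Sorbonne", "target": "#2", "type": "tr-mote"}],
])

kind, data = interpret_collation_response(200, {}, sample)
groups = readings_by_witness(data)
```

Many HTTP clients follow a 303 automatically, so a caller that wants to see the Location header itself may need to turn redirect following off.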

Sub-collation

Endpoint: /collation
Method: POST
Request content-type: application/json
Request body: an array of text ranges:
[
   {"startPage":1000,"startOffset":0,"endPage":1000,"endOffset":477},
   {"startPage":2000,"startOffset":0,"endPage":2000,"endOffset":360}
]
Request parameters:
comparison=plain|morph|orth (default plain)
dict=dictionary (default lat; may be specified multiple times)
ignoreCase=true|false (default false)
ignoreLineBreaks=true|hyphens|false (default false)
ignorePunctuation=true|false (default false)
useTEITags=true|false (default false)
deferred=true|false (default false)
Response content-type: application/json
Response location: if deferred=true, /deliverable/delID
Response body: an array of arrays of mote annotations (empty if deferred=true):
[
   [
      {"content":"Lectio 1, Prologus","endOffset":18,"target":"#2","endPage":2000,"startPage":2000,"startOffset":0,"type":"tr-mote"},
      {"content":"Lectio 1, Prologus","endOffset":18,"target":"#1","endPage":1000,"startPage":1000,"startOffset":0,"type":"tr-mote"}
   ],
   [
      {"content":"[Reims","endOffset":25,"target":"#1","endPage":1000,"startPage":1000,"startOffset":19,"type":"tr-mote"}
   ],
   [
      {"content":"[Sorbonne","endOffset":28,"target":"#2","endPage":2000,"startPage":2000,"startOffset":19,"type":"tr-mote"}
   ],
   …
]
Status: 200 on success if deferred=false, 303 on success if deferred=true, 400 if the text ranges are not consistent (e.g. a range has a negative length, or refers to pages from two separate witnesses).
Comments: The request body must contain an array of JSON objects specifying the text ranges to be collated. In practice, this may be an array of annotations, but that need not be the case. The response contains an array of arrays of mote annotations. Each subarray contains the mote annotations that the collator has determined to be in correspondence. The target field of each mote annotation identifies the witness whose content supports the annotation. If deferred=true is supplied in the request parameters, the response body is empty and the Location header gives the URI where the results of the collation will be available.
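
A request body for this endpoint could be prepared as in the sketch below. The helper name sub_collation_body is hypothetical. It reduces each input object (for example an annotation) to the four range fields and rejects a same-page range whose end precedes its start. Reducing to four fields is a choice made for the example, not a requirement stated here. Whether a range spans two witnesses cannot be checked without knowing which pages belong to which witness, so that check is left to the server.

```python
import json

RANGE_KEYS = ("startPage", "startOffset", "endPage", "endOffset")


def sub_collation_body(ranges):
    """Serialise text ranges as the JSON body of a sub-collation request."""
    cleaned = []
    for item in ranges:
        # Keep only the range fields; annotations carry extra keys.
        rng = {key: item[key] for key in RANGE_KEYS}
        # A range on a single page must not end before it starts.
        if (rng["startPage"] == rng["endPage"]
                and rng["endOffset"] < rng["startOffset"]):
            raise ValueError(f"negative-length range: {rng}")
        cleaned.append(rng)
    return json.dumps(cleaned)


body = sub_collation_body([
    {"startPage": 1000, "startOffset": 0, "endPage": 1000, "endOffset": 477},
    # An annotation works too; its non-range fields are dropped.
    {"content": "Lectio 1, Prologus", "target": "#2", "type": "tr-mote",
     "startPage": 2000, "startOffset": 0, "endPage": 2000, "endOffset": 360},
])
```

Catching the negative-length case locally avoids a round trip that would end in a 400.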


