Currently, punctuation marks are always displayed with a space before and after them in the concordance and extended-context views, because they are indexed as separate tokens.
It would be possible, however, to add functionality to have punctuation marks display without the bounding spaces. There are two possible ways to do this:
(1) the way BNCweb does it - by having separate p-attributes encoding whether or not there is a space adjacent. This increases the complexity of setup and requires more disk space.
(2) an alternative which would use less disk space but would make concordance rendering take (milliseconds) longer: allow a regex to be specified, and make the insertion of an intervening space conditional on whether or not a token matches that regex. For example, \W+ could be used for English, so that any token made up solely of non-word characters (i.e. punctuation marks) would be attached to the preceding token. This would make the display correct in most but not all cases; quote marks, for example, would still be wrong, since an opening quote should attach to the following token rather than the preceding one. Separate regexes could control the space before and the space after a token. This would also allow Chinese data, for instance, to be rendered without any intervening spaces, by setting the regex to .+.
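As a rough illustration of option (2), the rendering step might look something like the Python sketch below. It is not taken from any existing codebase: the function name render_tokens and the parameters no_space_before and no_space_after are hypothetical, and the sketch assumes that each regex has to match the whole token (not just part of it) for the space to be dropped.

```python
import re

def render_tokens(tokens, no_space_before=None, no_space_after=None):
    """Join tokens into a display string.

    no_space_before: regex (hypothetical setting); a token that matches it in
                     full is attached to the preceding token.
    no_space_after:  regex (hypothetical setting); a token that matches it in
                     full has the following token attached to it.
    """
    before = re.compile(no_space_before) if no_space_before else None
    after = re.compile(no_space_after) if no_space_after else None
    out = []
    for i, tok in enumerate(tokens):
        if i > 0:
            prev = tokens[i - 1]
            attach_left = before is not None and before.fullmatch(tok)
            attach_right = after is not None and after.fullmatch(prev)
            # Insert the usual space only if neither regex suppresses it.
            if not (attach_left or attach_right):
                out.append(" ")
        out.append(tok)
    return "".join(out)

# English: punctuation-only tokens attach to the preceding token.
print(render_tokens(["Hello", ",", "world", "!"], no_space_before=r"\W+"))

# The quote-mark problem: the opening quote attaches to the wrong side.
print(render_tokens(["He", "said", '"', "yes", '"', "."], no_space_before=r"\W+"))

# Separate before/after regexes handle brackets correctly.
print(render_tokens(["see", "(", "here", ")"],
                    no_space_before=r"[).,!?]+", no_space_after=r"\("))

# Chinese: .+ matches every token, so no spaces are inserted at all.
print(render_tokens(["我", "喜欢", "语料库", "。"], no_space_before=r".+"))
```

The extra cost is at most two regex matches per token boundary, which fits the estimate that rendering would only be slowed by milliseconds.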
This is currently not a high-priority feature, as it is merely cosmetic.