Make the GATE API allow a known MIME type to be specified
when creating a document, instead of always determining the
type automatically.
This is important in many situations, but mostly when
importing from files or from strings:
in those cases the automatically determined type
can easily be wrong.
There are probably several ways this could be done,
but I think an additional
initialization parameter (e.g. "ForceMimeType") would
be the most consistent.
Then the DocumentImpl.init() function could, in lines
200ff, check whether a MIME type has been
set this way and, if so, use
"DocumentFormat.getDocumentFormat(this, forceMimeType);" to
determine the document format.
Logged In: YES
user_id=1157323
I have done pretty much exactly what you suggest, except that I've called the
parameter simply "mimeType". If specified, the mimeType parameter is used to
select a DocumentFormat and the usual heuristics are not used. If no
explicit mimeType is provided, it continues to function as before, determining
the format from the extension, web server MIME type and magic numbers.
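
To illustrate the behaviour described above, here is a minimal, self-contained
sketch of the selection logic. It is not the GATE source: the class
MimeTypeSelection, its methods, the small extension table, the order in which
the heuristics are tried, and the "text/plain" default are all assumptions made
for illustration. Only the rule that an explicit mimeType bypasses the
heuristics comes from the comment above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of the selection rule; not part of GATE. */
public class MimeTypeSelection {

    // Assumed extension table, for illustration only.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("xml", "text/xml");
        BY_EXTENSION.put("html", "text/html");
        BY_EXTENSION.put("txt", "text/plain");
    }

    /** Heuristic 1: guess the type from the file extension, if any. */
    static Optional<String> fromExtension(String fileName) {
        if (fileName == null) {
            return Optional.empty();
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    }

    /** Heuristic 3: guess the type from the first characters of the content. */
    static Optional<String> fromMagic(String content) {
        if (content == null) {
            return Optional.empty();
        }
        String head = content.trim().toLowerCase(Locale.ROOT);
        if (head.startsWith("<?xml")) {
            return Optional.of("text/xml");
        }
        if (head.startsWith("<html")) {
            return Optional.of("text/html");
        }
        return Optional.empty();
    }

    /**
     * An explicit MIME type always wins. Otherwise fall back to the
     * heuristics (the order used here is an assumption).
     */
    static String selectMimeType(String explicitMimeType, String fileName,
                                 String serverMimeType, String content) {
        if (explicitMimeType != null && !explicitMimeType.isEmpty()) {
            return explicitMimeType; // heuristics are skipped entirely
        }
        Optional<String> byExtension = fromExtension(fileName);
        if (byExtension.isPresent()) {
            return byExtension.get();
        }
        if (serverMimeType != null && !serverMimeType.isEmpty()) {
            return serverMimeType; // heuristic 2: type reported by the server
        }
        return fromMagic(content).orElse("text/plain"); // assumed default
    }
}
```

With this sketch, selectMimeType("text/plain", "page.html", null, "<html>")
yields "text/plain" even though the extension and content both suggest HTML,
which is the case the request is about: the caller knows better than the
heuristics.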