Hi,
On 11.10.2011 20:33, W P Blatchley wrote:
> Hi,
>
> On Mon, 10 Oct 2011 01:30:30 +0200, Benny Baumann <BenBE1987@...>
> wrote:
>> Hi,
>>
>> On 09.10.2011 11:44, W P Blatchley wrote:
>>> On Fri, 07 Oct 2011 13:18:14 +0200, Benny Baumann <BenBE1987@...>
>>> wrote:
>>>>> There are some hints on MediaWiki on parameters:
>>>>> http://www.mediawiki.org/wiki/Extension:SyntaxHighlight_GeSHi#Parameters
>>>> Unfortunately those parameters are not meant for the language file, but
>>>> for the plugin, to tell it how to set up the environment
>>>> with GeSHi. So GeSHi itself only sees the effect of each of these
>>>> (supported) parameters. There's no interface in GeSHi to accept
>>>> language-specific parameters (for 1.0.x). For 1.1.x there's none either,
>>>> but you can emulate something like I described above.
>>> On the subject of this, I've been experimenting with ways to pass
>>> language-specific metadata in and out of GeSHi 1.1.x.
>>>
>>> At the moment, I have implemented a system whereby the caller calls a
>>> new method setMetadata() in the main GeSHi object, before calling
>>> parseCode(). The setMetadata() method takes a _reference_ to a user
>>> variable, which could be absolutely anything, but using a keyed array
>>> makes the most sense to me.
>> I think, a hash (AKA array with key => value pairs) is the most
>> meaningful solution.
> Agreed, that's what I've done so far. But now, in the light of your
> comments below about languages intermixing, I think perhaps we need a hash
> of hashes:
>
> $metadata = array(
>     'html' => array(
>         'add_links' => true,
>         'option_2' => value,
>         'option_3' => value
>     ),
>     'css' => array(
>         'add_links' => true,
>         'option_2' => value,
>         'option_3' => value
>     )
> );
>
> since an extension of your argument below is that somebody might want (for
> their own perverted reasons ;) ) to add links from <a> tags in HTML, but
> NOT in css url()s.
Agreed.
> Perhaps there could be a 'default' key first that carried common metadata
> instructions, to simplify use if people didn't want to have to define such
> a complicated hash:
>
> $metadata = array(
>     'default' => array(
>         'add_links' => true
>     )
> );
>
> Here, since 'add_links' is defined in the 'default' hash, it would be used
> for both html and css, and any other sub-languages that got invoked as a
> result of parsing the html (doxygen, for example).
k. ACK here.
>>> This reference is passed by GeSHi to the codeparser (as a new argument
>>> to the codeparser constructor), which then hangs on to it so that the
>>> language-specific codeparser can get at it. (Since AFAIK, the codeparser
>>> has no handle on its GeSHi ancestor.)
>> The point is, given your case with algol68 stropping, that it would
>> be most useful to have the metadata available when loading the language
>> file, as this allows the language file to react flexibly to the different
>> ways stropping affects how keywords have to be matched. If you
>> intercept this early, you can avoid too many costly
>> "post-processing" calls in the code parser, as each call there is quite
>> expensive in terms of runtime performance.
> I also wanted to access the metadata from the language file for my BASICV
> implementation, so I added a getMetadata() method to the root GeSHi class.
> However, I think it would be nicer just to pass a reference to the
> metadata directly to the geshi_lang_dialect() function.
And that's where the distinction between metadata (like function names
and labels) and options (rendering styles, language file settings for
stropping, ...) comes in handy.
As long as you separate those two, you can easily "pre-merge" the options
per language and cache the result, while passing the metadata by reference
during the actual processing, i.e. passing a reference to the proper
language subarray. That way there is no need to pre-merge the
metadata, and you avoid some of the problems with settings mentioned
below.
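
A rough sketch of that split, to make it concrete. Everything here is an assumption for illustration: premerge_options() is a hypothetical helper, not an existing GeSHi function, and the option and metadata keys are made up. I also assume that language-specific options win over 'default', which we have not settled in this thread.

```php
<?php
// Hypothetical helper (not part of GeSHi): merge the 'default' options into
// every language's options once, so the result can be cached.
// Assumption: language-specific values override the defaults.
function premerge_options(array $options): array
{
    $default = isset($options['default']) ? $options['default'] : array();
    $merged  = array();
    foreach ($options as $lang => $lang_options) {
        if ($lang === 'default') {
            continue;
        }
        // With string keys, later arrays override earlier ones.
        $merged[$lang] = array_merge($default, $lang_options);
    }
    return $merged;
}

// Options: static per language, safe to pre-merge and cache.
$options = array(
    'default' => array('add_links' => true),
    'html'    => array('strict' => true),
    'css'     => array('add_links' => false),
);
$merged = premerge_options($options);

// Metadata: never merged, just handed over as a reference to the
// subarray of the language currently being processed.
$metadata = array('css' => array());
$css_metadata = &$metadata['css'];
$css_metadata['urls'] = array('style.css'); // the parser writes through the reference
unset($css_metadata);                        // drop the reference, keep the data
```

After this runs, the caller can read $metadata['css']['urls'] directly, while $merged can be kept around for every later run with the same options.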
>>> What form the metadata takes, and how it's used, would be totally
>>> independent of the core GeSHi code, and only a specific language
>>> implementation would and should know anything about it.
>> Well, not quite, as most of the languages will probably intermix with
>> each other. E.g. if you have metadata arguments for HTML to link the
>> URLs of A-tags, then you'd probably want CSS to do the same for
>> url()-references too. So if multiple languages, if not all, could agree
>> on a common convention for how that metadata is used, this would make it
>> much easier to use. Also there should be a way via the API to query
>> which metadata attributes are supported by a given language, probably
>> with a short description.
> Given the idea of a hash of hashes above, and the possibility of a
> 'default' option that overrides language-specific metadata options, it
> makes the whole thing a little complicated. Every time you wanted to check
> a metadata option, you'd have to write:
>
> $my_option = isset($metadata['default']['my_option'])
>     ? $metadata['default']['my_option']
>     : $metadata[$lang]['my_option'];
>
> or something even more nasty, if you were checking (as you would have to)
> that 'my_option' was even defined anywhere, etc.
>
> So, we'd probably want to wrap all that up in a fetchMetadata() function
> somewhere. Even maybe a metadata class that handles the API to query which
> metadata options the languages support and describing what they do, etc.
That might be the way to go, although I'm not that fond of a separate
metadata class; I doubt there is enough complexity to justify the
additional separation.
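
As a plain function, the lookup could be as small as the following sketch. fetch_metadata() and its signature are hypothetical, nothing like it exists in GeSHi today. It checks 'default' before the language, following the order of the quoted snippet; whether that is the right precedence is still open.

```php
<?php
// Hypothetical lookup helper (not part of GeSHi).
// Assumption: 'default' is consulted first, then the language-specific
// hash, as in the snippet quoted above. Returns $fallback if the option
// is not defined anywhere.
function fetch_metadata(array $metadata, string $lang, string $option, $fallback = null)
{
    foreach (array('default', $lang) as $section) {
        if (isset($metadata[$section])
            && is_array($metadata[$section])
            && array_key_exists($option, $metadata[$section])
        ) {
            return $metadata[$section][$option];
        }
    }
    return $fallback;
}

$example = array(
    'default' => array('add_links' => true),
    'css'     => array('add_links' => false, 'option_2' => 'x'),
);

$links   = fetch_metadata($example, 'css', 'add_links');        // taken from 'default'
$second  = fetch_metadata($example, 'css', 'option_2');         // taken from 'css'
$missing = fetch_metadata($example, 'html', 'option_9', 'n/a'); // defined nowhere
```

Swapping the two entries in the foreach list would flip the precedence, so that decision stays in one place.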
>>> In my implementation, I use it for example, as follows:
>>>
>>> - My BASICV detokeniser (nothing to do with GeSHi) can figure out and
>>> return the location and names of procedure and function definitions in
>>> the source.
>>>
>>> - If the detokeniser has returned that info, it can be passed as
>>> &metadata['proclist'] into GeSHi, so my BASICV codeparser can make use
>>> of the information.
>>>
>>> - The BASICV codeparser is written to generate the same information if
>>> it's not supplied in the metadata, using the source pre-process method
>>> and a bit of regexp-ing.
>>>
>>> - If someone were supplying an already detokenised source to GeSHi, and
>>> hence didn't have the proc list available, but wanted it returned, they
>>> could set &metadata['proclist'] = ''. The codeparser checks the metadata
>>> array, sees that the key 'proclist' exists but isn't populated, and
>>> populates it for the caller. Since the metadata was passed by reference,
>>> the user can pick this data up once GeSHi has finished parsing the
>>> source.
>>>
>>> It seems to work pretty well.
>> Pretty much the way I'd have implemented that one too ;-)
> Whew!
>>> One thing I was thinking about is whether the metadata could be passed
>>> directly to the GeSHi constructor rather than implementing the new
>>> setMetadata() method, as there is a comment in the current source
>>> suggesting that may have been an intended future enhancement:
>>>
>>> (from class.geshi.php)
>>>  * @param string The source code to highlight
>>>  * @param string The language to highlight the source with
>>>  * @param string The path to the GeSHi data files. <b>This is no longer used!</b> The path is detected
>>>  *               automatically by GeSHi, this paramter is only included for backward compatibility. If
>>>  *               you want to set the path to the GeSHi data directories yourself, you should define the
>>>  *               GESHI_ROOT constant before including class.geshi.php.
>>> * @since 1.0.0
>>> */
>>> public function __construct ($source, $language_name, $path = '')
>>>
>>> [snip]
>>>
>>> // @todo [blocking 1.1.5] Make third parameter an option array thing (maybe)
>>>
>>> Any comments very welcome!
>> I'd differentiate here between the Code Parser Metadata (see above) and
>> the GeSHi options AKA Code Output settings. IMHO, the option array
>> mentioned in the comments is more like supplying presets to GeSHi than
>> influencing the Code Parser's workings. So this option array would
>> define things like hooks, preset paths for language files, ... But
>> not the Metadata intended for the Code Parser or the Code Renderer. That
>> should go in via a separate API IMHO.
> I'm not sure about splitting them, personally. From a user perspective, it
> might not be immediately obvious which metadata options are implemented by
> code in the language file, and which by code in the code parser. If you
> had, say, two different hashes for people to set options in, $option_array
> and $metadata or the like, it might be confusing as to where you should
> expect to find what.
Good point regarding the interface for this, but I think good
documentation can outweigh that concern. A starting point for a policy
might be: Things affecting the way the language behaves (stropping, link
generation, ...) go to the options argument; things related to the code
to be highlighted (annotations, function names, ...) go to the metadata.
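
To illustrate the metadata side, here is a minimal sketch of the 'proclist' round trip you described. fill_proclist() is a hypothetical stand-in for the relevant step in a code parser, and the regexp for DEF PROC / DEF FN definitions is only my guess at what your pre-processing does, not your actual implementation.

```php
<?php
// Hypothetical stand-in for the code parser step (not actual GeSHi code).
// If the caller passed a 'proclist' key that is still empty, populate it;
// if the key is absent or already filled in, leave the metadata alone.
function fill_proclist(string $source, array &$metadata): void
{
    if (!array_key_exists('proclist', $metadata) || !empty($metadata['proclist'])) {
        return;
    }
    // Assumed pattern: "DEF PROCname" or "DEF FNname".
    preg_match_all('/\bDEF\s*((?:PROC|FN)\w+)/', $source, $matches);
    $metadata['proclist'] = $matches[1];
}

$source = "10 DEF PROCdraw(x)\n20 ENDPROC\n30 DEF FNarea(r)=PI*r*r\n";

// The caller asks for the list by supplying an empty 'proclist' key.
$basicv_metadata = array('proclist' => '');
fill_proclist($source, $basicv_metadata);

// A caller who did not ask for it gets nothing added.
$untouched = array();
fill_proclist($source, $untouched);
```

Since the array is passed by reference, $basicv_metadata['proclist'] holds the names once the call returns, without any extra getter on the GeSHi side.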
>>> WPB
>> Regards,
>> BenBE.
> Obviously there's quite a bit to discuss here! Let me know what you
> think...
>
> Cheers,
> WPB
I agree. And maybe the other folks on the list have an opinion too?
You're welcome to join in!
Regards,
BenBE.