
#610 making WW code reusable outside of MediaWiki

working markup
open
None
5
2014-11-12
2014-09-29
Lee Worden
No

While I'm working on the working markup project, I need to make fixes to parts of WW that fail when called in a non-MediaWiki environment, so I can reuse them.

Discussion

  • Lee Worden

    Lee Worden - 2014-09-29

    [original ticket description:
    While I'm working on the working markup project, I'm losing my way in the WW code.
    I give it a page with just a single project-file tag, and hope it will try to find some text and put it in place of the tag. It seems to be looking for an archived project file of that name! Why?
    ]

    This doesn't really merit a bug ticket, but this is a convenient place to take notes and make sense out of it. I apologize for the extra email notifications.

    So WWInterface::expand_tokens( $text, null, $markup_filename ) is what I call. It is calling WWStorage::find_file_content( $project_filename, $project_description, $markup_filename, false ). That final 'false' says the project file is not a source file. This means look for an archived project file, because find_file_content is not meaningful for a regular project file - it looks for file contents in the wiki, not in a working directory.

    Why is expand_tokens() calling this?

    • pass 1 of expand_tokens:
      • for each token
        • if ( it's not a source-file tag defining a source file's contents )
          • if ( the project name is known, and the pagename is known )
            • then call find_file_content()
            • if that finds content, record it at $uniq_tokens[$token]['file-content']
            • if that found content and the project doesn't have it as an archived file, produce a warning.
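
    In rough PHP, that pass-1 logic looks something like this (a sketch from reading the code, not the code itself; the key names in $tag_data and the helpers has_archived_file() and record_warning() are placeholders):

        // Sketch of pass 1 of expand_tokens(), per the list above.
        // $uniq_tokens maps each parser token to its tag data; key names are approximate.
        foreach ( $uniq_tokens as $token => &$tag_data ) {
          if ( $tag_data['is-source-file-definition'] ) {
            continue;  // a source-file tag supplies its own contents
          }
          if ( isset( $tag_data['project'] ) && isset( $tag_data['pagename'] ) ) {
            $content = WWStorage::find_file_content(
              $tag_data['filename'], $project_description, $markup_filename, false );
            if ( $content !== null ) {
              $tag_data['file-content'] = $content;
              if ( ! $project_description->has_archived_file( $tag_data['filename'] ) ) {
                WWInterface::record_warning( 'content found for a file the project has not archived' );
              }
            }
          }
        }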

    OK, I see why it would be doing that (in the WW context). It looks like expand_tokens() may be doing a bunch of checks and maybe other things that don't make sense in a simpler markup context. I'll have a look at what they are and how to disentangle them.

     

    Last edit: Lee Worden 2014-10-01
  • Lee Worden

    Lee Worden - 2014-09-29

    In this case with the (non) archived project file, it shouldn't need to do a find_file_content, because the tag content would have been supplied with the rest of the tag data by the parser and I could just record it with the token data. But let's have a look at what else is going on in there.

     
  • Lee Worden

    Lee Worden - 2014-09-30

    It looks like when calling it without a wiki I can pretty much just omit the entire pass 1.

    The "right" way to do that is to use a WWInterface object, and in the markup case, use a subclass to provide an object that does certain methods differently. How hard would it be to do that? WW uses static class methods for WWInterface and WWStorage, because before now I haven't had a reason to instantiate an instance. So I would need to replace all calls to WWInterface::whatever() with $wwinterface->whatever() and make sure there is a $wwinterface variable in scope.

    I can make that object global, or continue moving toward providing a "context" object that is passed around, to increase modularity and reusability. I think passing it around would be better overall, but it would take too much work, so I should probably make it global for now - but at least encapsulate it in a global $wwContext that points to a WWInterface, a WWStorage, and whatever else.
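
    Concretely, I'm picturing something like this (a sketch; the WWContext class, its field names, and the WWMarkupInterface subclass are just what I have in mind, not existing code):

        // Hypothetical context holder; class and field names are illustrative.
        class WWContext {
          public $wwInterface;   // a WWInterface (or a non-MW subclass of it)
          public $wwStorage;     // a WWStorage
        }

        global $wwContext;
        $wwContext = new WWContext();
        $wwContext->wwInterface = new WWInterface();
        $wwContext->wwStorage = new WWStorage();

        // so that calls like
        //     WWInterface::expand_tokens( $text, null, $markup_filename );
        // become
        //     $wwContext->wwInterface->expand_tokens( $text, null, $markup_filename );
        // and a markup-only environment can install a different subclass:
        //     $wwContext->wwInterface = new WWMarkupInterface();  // hypothetical subclass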

     
  • Lee Worden

    Lee Worden - 2014-09-30

    make it global for now - but at least encapsulate it in a global $wwContext that points to a WWInterface, a WWStorage, and whatever else.

    [#611]

     

    Related

    Bugs: #611

  • Lee Worden

    Lee Worden - 2014-10-01

    The expand_tokens issue is resolved - renaming this ticket for ongoing work on making relevant parts of WW compatible with non-MediaWiki uses. (PE should already be compatible.)

     
  • Lee Worden

    Lee Worden - 2014-10-01
    • summary: parsing and puzzlement --> making WW code reusable outside of MediaWiki
     
  • Lee Worden

    Lee Worden - 2014-10-01
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,3 +1 @@
    -While I'm working on the [working](http://lalashan.mcmaster.ca/theobio/projects/index.php/Working_Markup) [markup](https://github.com/worden-lee/working-markdown) project, I'm losing my way in the WW code.
    -
    -I give it a page with just a single project-file tag, and hope it will try to find some text and put it in place of the tag.  It seems to be looking for an archived project file of that name!  Why?
    +While I'm working on the [working](http://lalashan.mcmaster.ca/theobio/projects/index.php/Working_Markup) [markup](https://github.com/worden-lee/working-markdown) project, I need to make fixes to parts of WW that fail when called in a non-MediaWiki environment, so I can reuse them.
    
     
  • Lee Worden

    Lee Worden - 2014-10-01

    WW uses MediaWiki's hooks framework here and there, to let one piece of code extend the behavior of another. For example, the Background code intervenes in project file rendering by making tags with make="background" render as links. And the Preview code intervenes in that by making them not function as links during preview.

    The hooks framework doesn't work when the code isn't called from MW. Replace it with a different, universally usable framework? Probably not too hard. Or bypass those calls, figuring the non-MW uses don't need those features, at least for now?

     
    • Lee Worden

      Lee Worden - 2014-10-01

      What does WW use hooks for?

      • WW-BeforeGetProjectFile: Insert list of background jobs in special page
      • WW-BeforeManageProject: Insert list of background jobs in special page
      • WW-GetProjectFile-LocateFile: apparently not used
      • WW-AllowMakeInSession: make operations are allowed, except in existing background sessions.
      • WW-GetProjectFile-AssumeResourcesDirectory: when in a background session, a filename without a project name means in the session root directory, rather than the resources directory
      • WW-GetProjectFile-Headers: add "preview/background session key is" message to special page
      • WW-ManageProject-Headers: same
      • WW-GetProjectFile-Altlinks: background code adds 'make in background' to links
      • WW-ListDirectorySetup: report the name of the directory being browsed differently if it's in a preview or background session
      • WW-GetProjectFileQuery: when creating a GPF link, add preview or background session info as needed
      • WW-MakeManageProjectQuery: when creating an MP link, add preview or background session info as needed
      • WW-PERequest: when constructing a request to PE, add preview or background session info as needed
      • WW-MakeTarget: disallow make operations during the parse that happens while saving a page after preview
      • WW-OKToSyncFromExternalRepos: 'sync' in a git or svn project causes it to update from the remote repository, unless in a preview session.
      • WW-RenderProjectFile: check for make="background" and produce a background-make link if so. Also don't do project file operations during a save to avoid redundant makes.
      • WW-DynamicProjectFilesPlaceholderMessage: when previewing, display a different message in the placeholder, since reloading isn't the right thing.
      • WW-HiddenActionInputs: add background or preview session info to forms that perform WW actions.
      • WW-Api-Call-Arguments: add background or preview session info to links that invoke the MW/WW API.
      • WW-OKToInsertBackgroundJobsList: it's ok except in preview pages
      • WW-BackgroundMakeOK: it's ok to spin off background jobs, except when in a preview or background session
      • WW-AddToMakeForm: add 'background make' button to the form on MP
      • WW-CachePageFromDB: allow retrieving pages from the database, except if we're currently previewing that page
      • WW-OKToSyncSourceFiles: don't sync files from stored pages during preview, because those are the wrong versions of the files.
      • WW-OKToArchiveFiles: don't archive changes to project files make during preview
      • WW-SequesterArchivedProjectFiles: not currently in use
      • WW-ProactivelySyncIfNeeded: do a special sync operation during preview, to make sure we sync the submitted version of the source files while we have access to it

      ... ok! so all those are for background and preview stuff.

      I implemented preview and background jobs support basically as separate extensions that extend the WW extension. That hooks framework is how extensions are implemented, so I used it. It would probably be simple to reimplement it, by providing a function wwRunHooks that does what wfRunHooks does. Right now I think I'll write a wwRunHooks that does nothing when wfRunHooks is unavailable, and let reimplementation be an option for the future.
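
      As a minimal sketch (assuming the only fallback behavior we need for now is "no hooks registered"):

          // Run a hook through MW's mechanism if it's available; otherwise
          // behave as if every handler ran and none aborted the operation.
          function wwRunHooks( $event, $args = array() ) {
            if ( function_exists( 'wfRunHooks' ) ) {
              return wfRunHooks( $event, $args );
            }
            return true;  // no MW, so no extensions are hooked in
          }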

       
  • Lee Worden

    Lee Worden - 2014-10-01

    write a wwRunHooks that does nothing when wfRunHooks is unavailable, and let reimplementation be an option for the future.

    Done.

    Next up is calls to MW's RequestContext::getMain(), to get info about what's being requested from the wiki. Where's it being used?

    • ProjectEngineConnection looks for a 'logkey' argument in the calling URL or API parameters, and passes it to PE if found
    • ww_setup() checks it for the 'disable-make' and 'ww-static-files' arguments, and uses it to add a noscript-redirect to the HTTP headers, so that non-JavaScript browsers get static file display.
    • ProjectDescription uses it to set the 'MW_PAGENAME' environment variable for make jobs
    • WWInterface uses it to add ResourceLoader modules to the output page
    • WWApiGetProjectFile accesses it and passes it as $context argument to ProjectEngineConnection::make_target().

    Indeed this object is being passed around among various WW functions. The $context argument is used

    • In ProjectEngineConnection's call_project_engine functions and ProjectDescription::fill_pe_request(), including for MW_PAGENAME
    • In WWPreview::OKToSyncSourceFiles_hook(), to compare the title of the page being previewed to the title of the page in question

    So the RequestContext is needed for

    • getting the name of the current page
    • checking for special flags on the URL ('disable-make', etc.)
    • adding ResourceLoader modules to the output.

    The latter two are fine and easily dealt with, since they're WW features that can be ignored outside of MW. As for the name of the page, I guess I should put it into the $wwContext object and stop passing the RequestContext object around.

    Is there a case where we recursively call things using a different RequestContext? Say, when parsing wikitext within wikitext? If so, we'd have a problem where passing the context as an argument works but using the global $wwContext doesn't... looks like yes, since there's code to make it work in ww_setup()...

    • There is a DerivativeRequest in WWApiListResourcesDirectory, used to make a subsidiary call to WWApiListDirectory.
    • Also in WWAction::pass_to_api(), which is the shim code used whenever there's a "?ww-action=" in the URL, to reimplement the CGI-style WW actions as calls to the newer WW API actions.

    There are also uses of $wgRequest and other MW globals here and there (which are supposed to be replaced by use of RequestContext), so those'll be an issue as well.

    I think I may need to split WWInterface into two things - WWMWInterface (or just WWInterface) and WWProcessing, basically - so I can use the core WW stuff and not the MW stuff. Ideally make sure RequestContext is only used in that one file, and info extracted from it is passed to everything else, so all the rest is reusable.

    So for that I'd want to go through the whole thing and inventory what goes where, and look for potential complications. It's only 3600 lines of code :)... :(...

    Along with that, I should probably inventory all the places where $wg* globals are used, since they won't be there. Ideally all that info should be black-boxed into the WWInterface class.

     
  • Lee Worden

    Lee Worden - 2014-10-01

    I think I may need to split WWInterface into two things - WWMWInterface (or just WWInterface) and WWProcessing, basically - so I can use the core WW stuff and not the MW stuff.

    Ideally all that info should be black-boxed into the WWInterface class.

    More conservatively, maybe, move MW-specific things into WWInterface methods as needed, so I can override them in a non-MW subclass. And evolve more gradually toward having all that info black-boxed in WWInterface.
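
    For example (a sketch; everything here other than WWInterface and the MediaWiki calls is a hypothetical name, and this only shows the shape of the split, not the real class):

        class WWInterface {
          // MW-specific detail isolated in one overridable method.
          public function current_page_name() {
            return RequestContext::getMain()->getTitle()->getPrefixedText();
          }
        }

        // Hypothetical subclass for use outside MediaWiki.
        class WWStandaloneInterface extends WWInterface {
          public $page_name = '';   // set by whatever drives the parse
          public function current_page_name() {
            return $this->page_name;
          }
        }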

     
  • Lee Worden

    Lee Worden - 2014-10-04

    Along with that, I should probably inventory all the places where $wg* globals are used, since they won't be there. Ideally all that info should be black-boxed into the WWInterface class.

    Yep, and wf*() functions, starting with wfMsg().

    What a pain! But the result will be a clean piece of processing code that doesn't assume anything about the presence of a wiki.

     
    • Lee Worden

      Lee Worden - 2014-10-04

      I won't clog up this ticket with an endless catalog of wf and wg stuff. But I do want to take a look to see what I'm dealing with...

       
      • Lee Worden

        Lee Worden - 2014-10-04

        I won't clog up this ticket with an endless catalog of wf and wg stuff. But I do want to take a look to see what I'm dealing with...

        Well, there's lots. Missing wf*() functions will cause a crash, so they'll probably get fixed if I take an ad-hoc approach. $wg* globals will just appear as blank values, so I'll need to take some care that they don't cause subtle bugs.

        But right now, I'm just going to look at what's stopping my test code from running. Currently it's wfMsg(), so I'm going to write a wwMessage() function that does something really rudimentary when the wf*() are missing.
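
        Something minimal like this, say (a sketch; when there's no wiki the fallback just hands back the message key and parameters):

            function wwMessage( $key /* , $param1, $param2, ... */ ) {
              if ( function_exists( 'wfMsg' ) ) {
                // delegate to MediaWiki's message lookup
                return call_user_func_array( 'wfMsg', func_get_args() );
              }
              // no wiki, no message catalog: something really rudimentary
              $params = array_slice( func_get_args(), 1 );
              return $params ? $key . ': ' . implode( ', ', $params ) : $key;
            }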

         

        Last edit: Lee Worden 2014-10-04
  • Lee Worden

    Lee Worden - 2014-10-14

    I'm getting to more interesting things: how to tell ProjectEngine to operate in an existing working directory in my home directory, rather than using files stashed in its cache.

    But first, I have this unresolved issue with ProjectDescription::env_for_make_jobs using global MW info to get a value for $(MW_PAGENAME) in certain cases. [#529] is where I worked through that. I made it work by passing a RequestContext object as an argument through various function calls.

    But outside of MW there is no such object. Instead we need to have the WWInterface object know what the value of MW_PAGENAME should be, and we'll ask it, without passing that object around.

    This is trivial, except in the weird cases where it's not. IIRC, the weird stuff is (a) when using the API to make a file that is conceptually within a page, but we're not parsing the page, (b) when parsing .wikitext files that may or may not actually be contained by a page.

    These cases are actually not especially hard. I was worried about recursive execution, where I would need the pagename to be different during an inner call and then revert to its old value. I can't think of any cases where that would happen. I can handle normal parsing, API parsing, and .wikitext parsing all by just setting the page name in the WWInterface object and leaving it.

    So I will do that, and then I'll remove the $context argument from functions that no longer need it.
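
    In other words, something like this (a sketch; the setter and getter names are placeholders):

        // wherever a parse or API call starts (assuming a Title object $title):
        $wwContext->wwInterface->set_page_name( $title->getPrefixedText() );

        // later, in ProjectDescription::env_for_make_jobs():
        $env['MW_PAGENAME'] = $wwContext->wwInterface->get_page_name();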

     

    Related

    Bugs: #529

  • Lee Worden

    Lee Worden - 2014-10-14

    MW_PAGENAME stuff is dealt with. On to how to make ProjectEngine process files in place, rather than off in its own cache directories.

    I think it's like this:

    • Set a flag in the PE request. If we put 'process-in-place' => true, and the project URI is 'file://.' or 'file:///home/wonder/src/files-for-markup/', PE should do it in that location.
    • Use a configuration variable in PE for security. It should require $peAllowProcessingInPlace = true, and refuse to accept the above otherwise. This variable needs to be false by default, and false in any PE installation that runs on a server accepting requests from the internet, for obvious reasons. (See the sketch after this list.)
    • Don't send source file contents to sync in the PE request. The source files are already in the directory.
    • We may want to provide a list of the source files' names, however; some things, like the make rules that create .tex.d files, use that list to tell which files to process. That's for later.
    • We probably want to support syncing source file contents from tags embedded in the markup file, literate-programming style. That's also for later; for now, require the source files to be actual files placed in the working directory by the human.
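
    The first two points might look like this (a sketch; the request keys other than 'process-in-place' are made-up stand-ins for whatever PE's request format actually uses):

        // client side (WW or a command-line driver): the request to PE
        $request = array(
          'project-uri'      => 'file:///home/wonder/src/files-for-markup/',
          'process-in-place' => true,
          'target'           => 'article.html',   // whatever we're asking it to make
        );

        // server side (PE): refuse unless the installation explicitly opts in
        global $peAllowProcessingInPlace;   // must default to false
        if ( ! empty( $request['process-in-place'] ) && ! $peAllowProcessingInPlace ) {
          throw new Exception( 'processing in place is not enabled in this ProjectEngine' );
        }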
     
    • Lee Worden

      Lee Worden - 2014-10-14

      OK, well with the protocol as it is, the 'process-in-place' flag is redundant since it's implied by the use of a 'file:' url. But I'm going to require it anyway, to preserve the option of other uses of file: in the future.

       
  • Lee Worden

    Lee Worden - 2014-10-14

    Not my biggest priority, but I'm noticing it's not doing the 'short name' the way I would like for these file: project URIs. This is the human-readable name it passes to PE along with the unfriendly URI name for each project, used to construct error messages, etc. I went to fix it and the code looks kind of messed up. Should fix.

     
  • Lee Worden

    Lee Worden - 2014-10-15

    PE is being kind of stubborn about assuming files go in its cache directory. When I ask it to do files in /usr/local/src/working-markdown, it complains that it can't create files in /var/cache/ProjectEngine/persistent//usr/local/src/working-markdown....

    I guess that assumption is built into the code in several places. I already have a class hierarchy for kinds of repositories (PERepositoryInterface as base, with PEWorkingWikiRepository, PEGitRepository, PEResourcesDirectoryRepository, etc. derived from it), and I've added PEInPlaceDirectoryRepository to the family, so now I need to get all the assumptions about where to cache things encapsulated into PERepositoryInterface so I can override them in PEInPlaceDirectoryRepository.
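
    What I'm aiming for is roughly this (a sketch; the method and member names and the $peCacheDirectory variable are placeholders, not PE's actual API):

        class PERepositoryInterface {
          // default: work in a directory under PE's cache
          public function working_directory() {
            global $peCacheDirectory;   // e.g. /var/cache/ProjectEngine/persistent
            return $peCacheDirectory . '/' . $this->uri;
          }
        }

        class PEInPlaceDirectoryRepository extends PERepositoryInterface {
          // in-place repositories work directly in the directory named by
          // the file: URI, never in the cache
          public function working_directory() {
            return preg_replace( '|^file://|', '', $this->uri );
          }
        }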

     
    • Lee Worden

      Lee Worden - 2014-10-15

      PEResourcesDirectoryRepository already provides a directory that isn't in the cache. But it bypasses a lot of code by not allowing PE to sync files and make targets in its directory. PEInPlaceDirectoryRepository needs those things to work correctly.

       
  • Lee Worden

    Lee Worden - 2014-10-15

    Also, a worthwhile step towards solving this problem is teaching PE to output diagnostic messages to stdout when I'm running it from the command line. It already knows how to output lots of diagnostics to the comet client, so I should be able to just add a little encapsulation and have them go to multiple places.

     
    • Lee Worden

      Lee Worden - 2014-10-15

      I'd like to have a little class hierarchy, where you either have an output-diagnostics object or you don't, and if you do, you give your messages to it and it outputs them as appropriate.

      What I have now:

      • You call log_sse_message( $text, $request ), where $request is the request data provided by the client. It looks in $request to determine whether the client wants comet updates or not, and if so, it uses the key given to locate the logfile to append the updates into, for spooling by the separate comet server process. If no key is there, it ignores the update.

      So the question is: how does PE know when to output messages to stdout? I can do it either by setting a global $pe variable, or by setting something in the request data. I guess the request data is more flexible. It means more request data, but since this will only be used when the request data isn't actually transmitted over a wire, I don't think that's a concern.

      So I can just extend that $request logic. The SSE (Comet) case is triggered by the presence of $request['sse-log-key']. Semantically it wouldn't be nice to put a special value in there to signify logging to stdout. I could either rename it to a general-purpose flag for how to log output, or introduce a separate 'log-to-stdout' flag. I think if I tried to implement a general-purpose flag, I'd end up needing it to be a (kind of output, output location) pair, which is basically the same as having different flags for different kinds of output, only more verbose. So I think I'll go with 'log-to-stdout'. Come to think of it, this also has the advantage of allowing a client to request both SSE and stdout logging at the same time - and though I can't see why anyone would, there's no reason not to allow it.

      So no class hierarchy, just testing for both flags in log_sse_message().
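
      So the logic comes out something like this (a sketch; only the two flag names and the function's signature are from the notes above, and the logfile helper is a stand-in):

          function log_sse_message( $text, $request ) {
            // Comet/SSE case: append to the logfile named by the client's key,
            // for the separate comet server process to spool out
            if ( isset( $request['sse-log-key'] ) ) {
              $logfile = pe_sse_logfile_for_key( $request['sse-log-key'] );  // stand-in helper
              file_put_contents( $logfile, $text . "\n", FILE_APPEND );
            }
            // command-line case: the same messages, straight to stdout
            if ( ! empty( $request['log-to-stdout'] ) ) {
              print( $text . "\n" );
            }
          }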

       
      • Lee Worden

        Lee Worden - 2014-10-15

        introduce a separate 'log-to-stdout' flag.

        Easy. Done in PE.

        Not so easy: how to get the WW code to insert this flag in every request to PE, when they're made from various points deep in the function call hierarchy.

        I've already faced the corresponding issue of how to get the 'log-to-sse' flag into all the requests, when processing a nest of function calls originating from an API call that wants SSE output. I dealt with that in a kind of ad-hoc way, by putting a test for the SSE key in the original HTTP request data into the ProjectEngineConnection class, which really shouldn't be jumping levels of abstraction like that. So I'd like to improve that as well.

        I could make ProjectEngineConnection an object, with state, allowing me to give it a logging behavior as part of its state and use that for all requests it generates.

        Or I can make the logging behavior be part of the WWInterface object's state. This is simpler to code, so I'll do that for now.

        Actually, rather than logging behavior per se, I'll have ProjectEngineConnection ask the WWInterface object to contribute to the request data before sending it, so it can add logging flags or modify other things. IIRC I already have PEC calling a hook function to let the preview and background code add their flags, so I'll move that functionality into this call as well.
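
        Roughly like this (a sketch of the additions to WWInterface; the method name modify_pe_request() and the $log_to_stdout member are placeholders):

            // Sketch only - really these would be new members on the real class.
            class WWInterface {
              public $log_to_stdout = false;

              // Called by ProjectEngineConnection just before a request goes
              // out, so this object can add logging flags or anything else.
              public function modify_pe_request( &$request ) {
                if ( $this->log_to_stdout ) {
                  $request['log-to-stdout'] = true;
                }
                // preview/background session info could be folded in here too,
                // instead of PEC calling a separate hook for it
              }
            }

            // in ProjectEngineConnection, before sending:
            //     $wwContext->wwInterface->modify_pe_request( $request );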

         
  • Lee Worden

    Lee Worden - 2014-10-15

    I'll have ProjectEngineConnection ask the WWInterface object to contribute to the request data before sending it, so it can add logging flags or modify other things.

    OK, that was easy too. Thinking about things before I do them is great!

    Now to use it to find out where the path is getting set wrong.

     