Understanding Content & Contentpages folders and .new .old .dump files

2023-02-03
2023-02-05
  • Gitoffthelawn

    Gitoffthelawn - 2023-02-03

    When WCM was new, it would create a Content subfolder for its data. That folder contained .new and .old files. Everything was self-explanatory.

    Now, even with the WCM change database disabled, WCM creates both a Content subfolder and a Contentpages subfolder for downloaded data. What is the purpose of the 2 different folders?

    Within either of those folders, .dump files may now also be present. What is the purpose of those .dump files, and why are they needed in addition to the .new and .old files?

    TIA.

     
  • Morten MacFly

    Morten MacFly - 2023-02-04

    There are several reasons that might cause a content check to fail: encoding errors, start/stop tag errors, regex errors, XML XPath errors, JSON errors. In such cases, a dump file is generated so that no information is lost, and these files are named accordingly. The same is true if you explicitly enabled creating a dump file every time the CRC changes. I've now implemented the ability to enable/disable the creation of these files individually. You'll find the corresponding settings in the next release. That should make it more transparent (and the default setting will be off).
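To illustrate the two triggers described above (a failed content check, and a CRC change when that option is enabled), here is a minimal Python sketch. It is not WCM's code: the `DumpSettings` class, its field names, and `should_dump` are hypothetical, and the use of `zlib.crc32` is an assumption about what "CRC" means here.

```python
import zlib
from dataclasses import dataclass


@dataclass
class DumpSettings:
    # Hypothetical switches; both default to off, as announced for the next release.
    on_check_error: bool = False
    on_crc_change: bool = False


def should_dump(settings: DumpSettings, check_failed: bool,
                old_content: bytes, new_content: bytes) -> bool:
    """Decide whether a dump file would be written for this check."""
    # Trigger 1: the content check failed (encoding, tag, regex, XPath, JSON error).
    if check_failed and settings.on_check_error:
        return True
    # Trigger 2: the checksum of the downloaded content differs from the last one.
    crc_changed = zlib.crc32(old_content) != zlib.crc32(new_content)
    return crc_changed and settings.on_crc_change
```

With both switches off, no dump is requested regardless of what happened during the check.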

     
    • Gitoffthelawn

      Gitoffthelawn - 2023-02-04

      Thanks. I see some that were created due to encoding errors. To WCM, what defines an encoding error?

       
      • Morten MacFly

        Morten MacFly - 2023-02-05

        An encoding error occurs when you try to read content from a web page that is non-ASCII and WCM is unable to detect the encoding properly. This should happen only rarely. But WCM is only as good as the encoding detector it uses, which is a mixture of Google's Compact Encoding Detection (CED) and wxWidgets methods (primarily as a fallback). That's also why the dump file is written: it should contain the downloaded content "as-is" so you can find out what's going wrong. It could also be an issue with a misconfigured server, e.g. encoding errors will definitely happen if binary content is not marked as such by the server. In that case the content cannot be downloaded correctly because the server already provides it in the wrong format.
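The "try to detect the encoding, fall back, and dump the raw bytes if nothing works" idea can be sketched in Python as follows. This is only an illustration under assumptions: WCM itself uses CED and wxWidgets, not Python codecs, and the candidate list, the function name `decode_or_dump`, and the dump path are all hypothetical.

```python
from pathlib import Path
from typing import Optional

# Hypothetical candidate list; a real detector such as CED inspects the bytes instead.
CANDIDATE_ENCODINGS = ("ascii", "utf-8")


def decode_or_dump(raw: bytes, dump_path: Path,
                   declared: Optional[str] = None) -> Optional[str]:
    """Return decoded text, or write the raw bytes as-is and return None."""
    # Try the encoding declared by the server first (if any), then the fallbacks.
    candidates = ([declared] if declared else []) + list(CANDIDATE_ENCODINGS)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # Nothing worked (e.g. binary content served as text): keep the bytes untouched.
    dump_path.write_bytes(raw)
    return None
```

Writing the bytes unmodified is the point of the dump: it lets you inspect exactly what the server sent, independent of any decoding attempt.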

         
  • Morten MacFly

    Morten MacFly - 2023-02-04

    BTW: There should only be one folder, and it should be named "pages". I don't know where "Content" is coming from. This looks to me like a setting you made. Please check the paths you set up in the configuration and the command line parameters.

     
    • Gitoffthelawn

      Gitoffthelawn - 2023-02-04

      Ah, this is apparently a bug in WCM. I'll take a deeper look and file a bug report when I have more details.

       
      • Morten MacFly

        Morten MacFly - 2023-02-05

        I am not entirely convinced yet that it is really a bug, but let's see how the ticket you've created goes...

         
