WebChangeMonitor / Forum / General Discussion and Help: Understanding Content & Contentpages folders and .new .old .dump files

Gitoffthelawn - 2023-02-03

When WCM was new, it would create a Content subfolder for its data. That folder contained .new and .old files. Everything was self-explanatory.

Now, even with the WCM change database disabled, WCM creates both a Content subfolder and a Contentpages subfolder for downloaded data. What is the purpose of the 2 different folders?

Within either of those folders, .dump files may now also be present. What is the purpose of those .dump files, and why are they needed in addition to the .new and .old files?

TIA.

Morten MacFly - 2023-02-04

There are several reasons a content check might fail: encoding errors, start/stop tag errors, regex errors, XML XPath errors, JSON errors. In such cases, a dump file is generated so that no information is lost, and these files are named accordingly. The same is true if you explicitly enabled creating a dump file every time the CRC changes. I've now implemented the ability to enable/disable the creation of these files individually. You'll find the corresponding settings in the next release, which should make this more transparent (and the default setting will be off).
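
To make the "dump every time the CRC changes" behaviour concrete, here is a minimal Python sketch. It is not WCM's actual code: the function name `store_if_changed`, the `dump_on_change` flag, and the exact way `.new` is rotated to `.old` are assumptions made for illustration only.

```python
import zlib
from pathlib import Path


def store_if_changed(folder: Path, name: str, content: bytes,
                     dump_on_change: bool = False) -> bool:
    """Store `content` as <name>.new; return True if it differs from the last snapshot.

    Hypothetical helper: the file naming and rotation are assumed, not taken from WCM.
    """
    new_file = folder / f"{name}.new"
    old_file = folder / f"{name}.old"

    if new_file.exists():
        # Compare CRC-32 checksums of the stored snapshot and the fresh download.
        if zlib.crc32(new_file.read_bytes()) == zlib.crc32(content):
            return False  # unchanged: leave all files alone
        new_file.replace(old_file)  # previous snapshot becomes .old

    new_file.write_bytes(content)
    if dump_on_change:
        # Assumed behaviour: keep an extra raw copy whenever a change is seen.
        (folder / f"{name}.dump").write_bytes(content)
    return True
```

In this sketch the first download also counts as a change, because there is no earlier snapshot to compare against.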

- Gitoffthelawn - 2023-02-04
  
  Thanks. I see some that were created due to encoding errors. To WCM, what defines an encoding error?
  
  - Morten MacFly - 2023-02-05
    
    An encoding error occurs if you try to read content from a web page that is non-ASCII and WCM is unable to detect the encoding properly. This should actually happen only rarely. But WCM is only as good as the encoding detector it uses, which is a mixture of Google's Compact Encoding Detection (CED) and wxWidgets methods (primarily as a fallback). That's also why the dump file is written: it should contain the downloaded content "as-is" to help find out what's going wrong. It could also be an issue with a misconfigured server, e.g. encoding errors will definitely happen if binary content is not marked as such by the server. In that case the content cannot be downloaded correctly because the server already provides it in the wrong format.
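
The detect-then-fall-back-then-dump flow described above can be sketched roughly as follows. This is an illustration only: it uses plain Python strict decoding as a stand-in for CED and the wxWidgets methods, and the name `decode_or_dump` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def decode_or_dump(raw: bytes, declared: Optional[str],
                   dump_path: Path) -> Optional[str]:
    """Try candidate encodings strictly; if all fail, save the raw bytes as-is.

    Hypothetical helper: a stand-in for a real encoding detector.
    """
    candidates = [declared] if declared else []
    candidates.append("utf-8")  # assumed fallback when nothing else works
    for encoding in candidates:
        try:
            return raw.decode(encoding)  # strict mode raises on invalid bytes
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding: try the next candidate
    # Nothing decoded cleanly: keep the untouched download for inspection.
    dump_path.write_bytes(raw)
    return None
```

Binary content served with a text encoding label, such as the first bytes of a PNG file labelled as ASCII, fails every candidate here and ends up in the dump file unchanged, which mirrors the misconfigured-server case.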
    
Morten MacFly - 2023-02-04

BTW: There should only be one folder, and it should be named "pages". I don't know where "Content" is coming from; this looks to me like a setting you made. Please check the paths you set up in the configuration and the command line parameters.

- Gitoffthelawn - 2023-02-04
  
  Ah, this is apparently a bug in WCM. I'll take a deeper look and file a bug report when I have more details.
  
  - Morten MacFly - 2023-02-05
    
    I am not entirely convinced yet that it is really a bug, but let's see how the ticket you've created goes...
    