
#78 CQPweb: no alert on absent or malformed text + text_id attributes in input data

TODO-3.5
open
CQPweb (22)
7
2023-07-21
2023-07-21
ram
No

Hi, I am opening this ticket because CQPweb hangs in a specific scenario.

  • Input. A standard query in a corpus with 12 p-attributes (in a corpus with no extra p-attributes everything works as expected).
  • Output when there is no match. No results, as expected.
  • Output when there is a match. Platform hangs and consumes all the CPU; I have to restart the web server in order to stop this behavior.

I attach the VRT file (the test corpus only has one VRT) and the data corpus directory as a ZIP file.

1 Attachments

Discussion

  • ram

    ram - 2023-07-21

    I forgot to put the versions:

    • CQPweb: 3.2.43
    • CWB: 3.5.0
     
  • Stephanie Evert

    Stephanie Evert - 2023-07-21

    (a) Does this happen for a specific query, or for every query that returns some matches? In the latter case there's probably something in the corpus that confuses CQPweb.

    (b) Are there also problems if you run the query directly in CQP?

    (c) You can't possibly have installed this corpus in CQPweb because it's lacking the mandatory <text id="..."> elements!

     

    Last edit: Stephanie Evert 2023-07-21
    • ram

      ram - 2023-07-21

      Let's see:

      1. Every time it returns a match
      2. No, it works if I use CQP
      3. I did :S
       
      • Stephanie Evert

        Stephanie Evert - 2023-07-21

        Re. 3: If you installed this particular corpus in CQPweb, you must have ignored all the error messages that it shot at you. It should have outright refused to install the corpus, but perhaps it went far enough to get its database into an inconsistent state that causes the lock-up.

         
        • ram

          ram - 2023-07-21

          I didn't get any errors. I attach the process I am doing for the corpus installation. Thanks!

           
        • ram

          ram - 2023-07-21

          I can confirm that the problem is caused by the missing <text> tag. I ran a test that includes the tag and it no longer hangs. But now I have these questions:

          1. What does [UNREADABLE] mean in the query result?
          2. Just to be sure, is the id attribute for the text tag recommended or mandatory?

          I attach the VRT test file and the screenshot of the result.

           
          • ram

            ram - 2023-07-21

            OK, now I can confirm that [UNREADABLE] appeared because the text tag requires the id. With the id attribute everything works as expected.

            Thanks @schtepf for your patience. The only weird thing left is that I don't receive an error message or warning when I don't add the text tag.

             
          • Andrew Hardie

            Andrew Hardie - 2023-07-21

            [UNREADABLE] means that some word could not be read from the data returned by CQP. In this case, it happened because the absence of text_id mucked up the processes that break up that data for formatting.

            You'll note that on your installation form screenshot, there is a notice at the top of the XML table saying that <text> and its id="..." are "... compulsory".

            So they are added to the corpus definition even if you don't specify them on the form.

            Unfortunately cwb-encode doesn't issue errors for declared tags that don't appear. So CQPweb doesn't know there is an error. I'd better add a check for correct text tags in input files. But this will be in 3.3 not 3.2 as I don't add new features there.
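
A pre-flight check of the kind described above could be run on the VRT file before installation. The following is only a minimal Python sketch, not CQPweb's own code. The function name `check_vrt_text_tags` is hypothetical, and the rule that an id may contain only ASCII letters, digits and underscores is an assumption that should be checked against the CQPweb manual.

```python
import re

# Hypothetical helper, not part of CQPweb or CWB.
# Assumption: a valid text id uses only ASCII letters, digits and underscore.
TEXT_OPEN = re.compile(r'^<text(\s[^>]*)?>$')
ID_ATTR = re.compile(r'\sid="([^"]*)"')
VALID_ID = re.compile(r'^[A-Za-z0-9_]+$')


def check_vrt_text_tags(lines):
    """Return a list of (line_number, message) problems found in VRT lines."""
    problems = []
    seen_ids = set()
    in_text = False
    outside_reported = False  # report only the first stray token per gap
    text_count = 0
    last_line = 0

    for last_line, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines

        opened = TEXT_OPEN.match(line)
        if opened:
            if in_text:
                problems.append((last_line, "<text> opened before previous </text>"))
            in_text = True
            outside_reported = False
            text_count += 1
            id_match = ID_ATTR.search(opened.group(1) or "")
            if id_match is None:
                problems.append((last_line, "<text> has no id attribute"))
            else:
                text_id = id_match.group(1)
                if not VALID_ID.match(text_id):
                    problems.append((last_line, f"malformed text id: {text_id!r}"))
                elif text_id in seen_ids:
                    problems.append((last_line, f"duplicate text id: {text_id!r}"))
                seen_ids.add(text_id)
        elif line == "</text>":
            if not in_text:
                problems.append((last_line, "</text> without matching <text>"))
            in_text = False
        elif line.startswith("<") and line.endswith(">"):
            continue  # some other structural tag, e.g. <s> or </s>
        elif not in_text and not outside_reported:
            problems.append((last_line, "token outside any <text> region"))
            outside_reported = True

    if in_text:
        problems.append((last_line, "last <text> is never closed"))
    if text_count == 0:
        problems.append((0, "no <text> elements found"))
    return problems
```

It could be called as `check_vrt_text_tags(open("corpus.vrt", encoding="utf-8"))`, where `corpus.vrt` is a placeholder file name. An empty list means that none of the problems this sketch looks for were found. It says nothing about the other conditions that cwb-encode or CQPweb may impose.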

             
  • Andrew Hardie

    Andrew Hardie - 2023-07-21
    • summary: CQPweb hangs in an specific scenario --> CQPweb: no alert on absent or malformed text + text_id attributes in input data
    • assigned_to: Andrew Hardie
     
