#11 Attributes without " or ' not parsed correctly

open
HTML parser (6)
5
2009-08-12
2009-08-12
Anonymous
No

The emails I am trying to parse often have attribute values without the surrounding " or ' (I think these are mainly Microsoft-generated emails). These attributes don't seem to parse correctly. Is there an easy way to fix this?

Discussion

  • Milian Wolff

    Milian Wolff - 2009-08-13

    Could you please provide an example HTML snippet so I could take a look? If it's an easy fix I might be able to fix it.

    Though I want to be straight with you: I don't use this stuff myself anymore and hence development pretty much stopped...

     
  • Nobody/Anonymous

    Thanks. Any help you could provide would be greatly appreciated. Here is an example. The span tag is removed correctly, but the paragraph tag stays. This happens on your demo page also.

    <p class=MsoNormal><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif";
    color:#1F497D'>inside text</span></p>

     
  • Nobody/Anonymous

    Here is the source code that I added. I am not really great at this stuff, but I tried to figure something out. The second elseif is what I added:

    elseif (in_array($this->html[$pos].$this->html[$pos+1], array('="', "='"))) {
        # get quoted attribute value
        $pos++;
        $await = $this->html[$pos]; # single or double quote
        $pos++;
        $value = '';
        while (isset($this->html[$pos]) && $this->html[$pos] != $await) {
            $value .= $this->html[$pos];
            $pos++;
        }
        $attributes[$currAttrib] = $value;
        $currAttrib = '';
    } elseif ($this->html[$pos] == '=') {
        # get unquoted attribute value: read up to the next space or '>'
        $await = array(" ", ">");
        $pos++;
        $value = '';
        while (isset($this->html[$pos]) && !in_array($this->html[$pos], $await)) {
            $value .= $this->html[$pos];
            $pos++;
        }
        # do not consume the closing '>' (guarded in case the input ends here)
        if (isset($this->html[$pos]) && $this->html[$pos] === ">") {
            $pos--;
        }
        $attributes[$currAttrib] = $value;
        $currAttrib = '';
    }
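
For readers who want to try the idea outside the parser class, the following is a minimal standalone sketch of reading quoted and unquoted attribute values from a single opening tag. It is not the library's code: `parseTagAttributes` is a hypothetical name, and the sketch assumes the input is one complete opening tag. Unlike the patch above, it treats any whitespace (not just a space) as the end of an unquoted value.

```php
<?php
// Hypothetical helper for illustration only; not part of the parser discussed above.
// Reads the attributes of one opening tag, e.g. '<p class=MsoNormal>'.
function parseTagAttributes(string $tag): array
{
    $attributes = [];
    $len = strlen($tag);
    $pos = 0;

    // Skip '<' and the tag name.
    while ($pos < $len && !ctype_space($tag[$pos]) && $tag[$pos] !== '>') {
        $pos++;
    }

    while ($pos < $len) {
        // Skip whitespace and any self-closing slash between attributes.
        while ($pos < $len && (ctype_space($tag[$pos]) || $tag[$pos] === '/')) {
            $pos++;
        }
        if ($pos >= $len || $tag[$pos] === '>') {
            break;
        }

        // Attribute name: runs up to '=', whitespace, '/' or '>'.
        $name = '';
        while ($pos < $len && !ctype_space($tag[$pos])
            && !in_array($tag[$pos], ['=', '>', '/'], true)) {
            $name .= $tag[$pos];
            $pos++;
        }

        $value = '';
        if ($pos < $len && $tag[$pos] === '=') {
            $pos++; // skip '='
            if ($pos < $len && ($tag[$pos] === '"' || $tag[$pos] === "'")) {
                // Quoted value: read up to the matching quote.
                $quote = $tag[$pos];
                $pos++;
                while ($pos < $len && $tag[$pos] !== $quote) {
                    $value .= $tag[$pos];
                    $pos++;
                }
                $pos++; // skip the closing quote
            } else {
                // Unquoted value: read up to whitespace or '>'.
                while ($pos < $len && !ctype_space($tag[$pos]) && $tag[$pos] !== '>') {
                    $value .= $tag[$pos];
                    $pos++;
                }
            }
        }

        if ($name !== '') {
            $attributes[strtolower($name)] = $value;
        }
    }

    return $attributes;
}
```

With the example from this thread, `parseTagAttributes('<p class=MsoNormal>')` returns `['class' => 'MsoNormal']`, and the quoted `style` attribute on the span is read the same way as before.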

     
