#3201 uncommon story words oddness

Slash 2.5/3.0
Rob Malda
Admin (123)
Jamie McCarthy

We posted a dupe today, and I was wondering why, when
I edit it, the URL does not make today's story a
similar-stories match for the story on Nov. 8:

I put debug prints into refreshUncommonStoryWords():

    my $word_hr = { };
    print STDERR "uncommon_words number of stories: "
        . scalar(@$arr) . "\n";
    for my $ar (@$arr) {
    ...
        keys %$word_hr;
    print STDERR "uncommon_words, full length: ...
    ...
    my $uncommon_words = substr(join(" ",
        @uncommon_words), 0, $maxlen);
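The last statement in the debug code joins the uncommon words with single spaces and cuts the result at `$maxlen` characters. Below is a minimal sketch of just that step. It assumes `$maxlen` comes from `uncommonstorywords_maxlen` (65000 on Slashdot, per this report). The helper `truncate_words`, the filler words, and the placeholder URL are hypothetical, and the real function's word selection is not reproduced. The sketch shows that a word starting near offset 26,000 survives the cut, and that `substr` can leave a partial word at the boundary.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the truncation step in the debug code:
# join the words with single spaces, then keep at most $maxlen characters.
sub truncate_words {
    my ($maxlen, @words) = @_;
    return substr(join(" ", @words), 0, $maxlen);
}

# Put the target word at offset 26,001: each filler word plus its
# joining space is 9 characters, and 2,889 * 9 = 26,001.
my @words  = ("filler99") x 2_889;
my $target = "http://members.example/";    # placeholder, not the real URL
push @words, $target, ("tail") x 10_000;   # pushes the total past 65,000

my $kept = truncate_words(65_000, @words);  # assumed uncommonstorywords_maxlen
my %kept = map { $_ => 1 } split / /, $kept;

printf "target starts at offset %d\n", index($kept, $target);
printf "target %s\n", exists $kept{$target} ? "kept" : "dropped";
```

For instance, `truncate_words(10, qw(alpha bravo charlie))` returns `"alpha brav"`, so a word that straddles the limit would be stored only as a fragment.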

You can find the results at

In that file, the URL that appears in both stories does
appear, at character position 26,000 or so:

    [jamie@slashdot-nfs-1 logs]$ perl -lne 'print pos() while /(.{30}({30})/g' ucw

The var uncommonstorywords_maxlen for Slashdot is set
to 65000, so that URL should appear as one of the
uncommon words. However, when I started this whole
debugging process, it did not:

    mysql> select count(*) from uncommonstorywords where word like 'http://members%';
    +----------+
    | count(*) |
    +----------+
    |        0 |
    +----------+
    1 row in set (0.00 sec)

Now (argh! debugging is hard) it does:

    mysql> select count(*) from uncommonstorywords where word like '';
    +----------+
    | count(*) |
    +----------+
    |        1 |
    +----------+
    1 row in set (0.00 sec)

So my guess is that the URL wasn't being picked up from
the Nov. 8 story but is now being picked up from today's
story. That shouldn't happen, because the
similarstorydays var is set to 120 and the Nov. 8 story
falls within that window. And findWords grants all URLs
a high weight, so it's almost certain that the URL
would have been in the list ever since Nov. 8 (I guess
you could check the backup DBs' data to confirm that).
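The expectation above depends on a date check. If `similarstorydays` bounds how far back `refreshUncommonStoryWords()` looks, the Nov. 8 story must be fewer than 120 days old to be scanned. Below is a minimal sketch of that check using the core `Time::Local` module. Several details are hypothetical: the helper `days_between`, the year, and the dupe's date, because the report gives neither the year nor today's date. Treating the window as strictly under 120 days is also an assumption.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: whole days between two calendar dates, computed in
# UTC so daylight-saving shifts cannot skew the result.
sub days_between {
    my ($y1, $m1, $d1, $y2, $m2, $d2) = @_;
    my $t1 = timegm(0, 0, 0, $d1, $m1 - 1, $y1);
    my $t2 = timegm(0, 0, 0, $d2, $m2 - 1, $y2);
    return int(($t2 - $t1) / 86_400);
}

my $similarstorydays = 120;    # value quoted in the report

# Illustrative dates only: a Nov. 8 story and a dupe posted some weeks later.
my $age    = days_between(2003, 11, 8, 2003, 12, 20);
my $inside = $age < $similarstorydays;    # assumed: strict upper bound
printf "Nov. 8 story is %s the window (%d days old)\n",
    $inside ? "inside" : "outside", $age;
```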

And just to make things worse, with the URL in the
table (confirmed above), hitting the Edit link on
today's story still doesn't show the Nov. 8 story in
the resulting webpage as a similar story. I guess
that's a second bug.


  • Tim Vroom

    • priority: 5 --> 9
  • Tim Vroom

    • assigned_to: tvroom --> cmdrtaco
    • status: open --> open-fixed
  • Rob Malda

    • status: open-fixed --> closed-fixed