#3201 uncommon story words oddness

Milestone: Slash 2.5/3.0
Status: closed-fixed
Owner: Rob Malda
Labels: Admin (123)
Priority: 9
Updated: 2004-12-21
Created: 2004-11-28
Creator: Jamie McCarthy
Private: No

We posted a dupe today and I was wondering why, when
editing it, the URL http://members.home.nl/gis/ does
not make today's story a similar-stories match for the
story on Nov. 8:

https://slashdot.org/admin.pl?op=edit&sid=04/11/28/1434228

https://games.slashdot.org/admin.pl?op=edit&sid=04/11/08/1438215

I put debug prints into refreshUncommonStoryWords():

my $word_hr = { };
print STDERR "uncommon_words number of stories: " . scalar(@$arr) . "\n";
for my $ar (@$arr) {

keys %$word_hr;
print STDERR "uncommon_words, full length: '@uncommon_words'\n";
my $uncommon_words = substr(join(" ", @uncommon_words), 0, $maxlen);

You can find the results at
/usr/local/slash/site/banjo.slashdot.org/logs/ucw.

In that file, the URL that appears in both stories does
appear, at character position 26,000 or so:

[jamie@slashdot-nfs-1 logs]$ perl -lne 'print pos() while /(.{30}(members.home.nl).{30})/g' ucw
26192

and the var uncommonstorywords_maxlen for Slashdot is
set to 65000 so that URL should appear as one of the
uncommon words. However, when I started this whole
debugging process, it did not:

mysql> select count(*) from uncommonstorywords where word like 'http://members%';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
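
The reasoning above (a URL at roughly character 26,000 should survive a cut at 65,000) can be checked with a small sketch. This is not code from Slash: survives_truncation is a hypothetical helper, and it only assumes what the debug excerpt shows, namely that the words are joined with single spaces and cut with substr at $maxlen.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Slash. Returns 1 if $word would still be
# present, whole, after substr(join(" ", @$words_ar), 0, $maxlen); else 0.
sub survives_truncation {
    my ($words_ar, $word, $maxlen) = @_;
    my $offset = 0;                        # start of current word in the joined string
    for my $w (@$words_ar) {
        my $end = $offset + length($w);    # one past the word's last character
        return 0 if $end > $maxlen;        # this word, and all later ones, get cut
        return 1 if $w eq $word;
        $offset = $end + 1;                # skip the joining space
    }
    return 0;                              # word was never in the list
}

# Made-up word list for illustration; only the URL comes from the report.
my @words = ("alpha", "http://members.home.nl/gis/", "beta");
for my $maxlen (65000, 20) {
    my $kept = survives_truncation(\@words, "http://members.home.nl/gis/", $maxlen);
    print "maxlen=$maxlen: ", ($kept ? "kept" : "cut"), "\n";
}
```

If a check like this says "kept" for the real word list while the table lacks the row, the truncation step is probably not where the word is lost.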

Now (argh! debugging is hard) it does:

mysql> select count(*) from uncommonstorywords where word like 'http://members.home.nl/gis/';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

So my guess is that the URL wasn't being picked up from
the Nov. 8 story but it is now from today's story.
That shouldn't happen, because the similarstorydays var
is set to 120, so the Nov. 8 story is still well inside
the window. And findWords grants all URLs a high weight,
so it's almost certain that it would have been in the
list ever since Nov. 8 (I guess you could check the
backup DBs' data to confirm that).
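
The window argument is simple date arithmetic, sketched here with the core Time::Piece module. days_between is a hypothetical helper, not a Slash function, and treating similarstorydays as an inclusive day count is an assumption; only the two dates and the value 120 come from the report.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper, not part of Slash: whole days between two
# YYYY-MM-DD dates (both parsed as UTC, so no DST surprises).
sub days_between {
    my ($from, $to) = @_;
    my $t1 = Time::Piece->strptime($from, "%Y-%m-%d");
    my $t2 = Time::Piece->strptime($to,   "%Y-%m-%d");
    return int(($t2 - $t1)->days);     # difference is a Time::Seconds object
}

my $similarstorydays = 120;            # value quoted in the report
my $age = days_between("2004-11-08", "2004-11-28");
print "age=$age days, in window: ",
    ($age <= $similarstorydays ? "yes" : "no"), "\n";
```

The older story is 20 days old, far short of the 120-day limit, so age alone should not have excluded it.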

And just to make things worse, with the URL in the
table (confirmed above), hitting the Edit link on
today's story still doesn't show the Nov. 8 story in
the resulting webpage as a similar story. I guess
that's a second bug.

Discussion

  • Tim Vroom
    2004-12-16

    • priority: 5 --> 9
     
  • Tim Vroom
    2004-12-21

    • assigned_to: tvroom --> cmdrtaco
    • status: open --> open-fixed
     
  • Rob Malda
    2004-12-21

    • status: open-fixed --> closed-fixed