

#23 init_ouis is slow and stupid

Milestone: 0.95
Status: closed-fixed
Owner: Bill Fenner
Labels: Front-End (23)
Priority: 5
Updated: 2006-04-09
Created: 2004-12-15
Creator: Justin A
Private: No

it has:

    $ouis = sql_column('node', ['distinct(oui)', 'true']);
    ...
    # New Way - This is a lot faster.
    my @uc_ouis = map(uc, keys %$ouis);
    my $size = 40;

    # Lookup the OUIs $size at a time.
    while (my @chunk = splice(@uc_ouis, 0, $size)) {
        my $comps =
            sql_rows('oui', ['oui', 'company'], {'oui' => [\@chunk]});
    ...

On my network there were some 60,000 distinct OUIs (granted, half were
from one broken node), so that code ends up running 1,500 queries.
Why does it not just do:

    select distinct(node.oui), company from node
    left join oui on node.oui = oui.oui

In the meantime, deleting the garbage nodes from a few ports and
changing $size to 1000 brought the load time of the page down from
about 30 seconds to a second or two. I'd send a patch, but Perl and
sql_rows hurt my head :-)
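[Editorial sketch, not Netdisco code: the single left-join query suggested above, run against a throwaway SQLite database in Python (the project itself is Perl; Python's sqlite3 just keeps the demo self-contained). All table contents are made up. Note that OUIs missing from the oui table come back with a NULL company.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE node (mac TEXT, oui TEXT);
    CREATE TABLE oui  (oui TEXT PRIMARY KEY, company TEXT);
    INSERT INTO node VALUES ('00:00:0c:11:22:33', '00000c'),
                            ('00:00:0c:44:55:66', '00000c'),
                            ('de:ad:be:ef:00:01', 'deadbe');  -- unknown vendor
    INSERT INTO oui  VALUES ('00000c', 'Cisco Systems');
""")

# One round trip instead of one query per 40-OUI chunk.
rows = db.execute(
    "SELECT DISTINCT node.oui, company "
    "FROM node LEFT JOIN oui ON node.oui = oui.oui"
).fetchall()
print(sorted(rows))  # [('00000c', 'Cisco Systems'), ('deadbe', None)]
```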

Discussion

  • Max Baker
    2004-12-16

    Did you try your patch before you cleaned out the nodes, to see if
    it improved performance?

    In the future, I would consider omitting "is slow and stupid" as
    well as "but perl and sql_rows hurts my head" if you would like me
    to actually respond. Insulting an author who works for free on an
    open source project is not likely to make them want to spend any
    time on you.

     
  • Justin A
    2004-12-16

    Sorry... I have a weird sense of humor sometimes :-)

    Now that you mention it, I'm not sure whether deleting 30,000
    nodes or changing 40 -> 1000 made the most difference. Regardless,
    I think the better way to handle this situation would be to fix
    the original cause of the problem.

    A new command, --expire-fake-macs or such, which could just delete
    all of those nodes, might be best.

    There are a few ways I see that this can be handled...
    I can run

        SELECT n.switch, n.port, COUNT(n.mac)
        FROM node n LEFT JOIN node_ip i ON n.mac = i.mac
        WHERE port LIKE 'FastEthernet0/%' AND i.ip IS NULL
        GROUP BY n.switch, n.port HAVING COUNT(n.mac) > 40
        ORDER BY count DESC

    and get a list of which switches and ports have too many nodes
    with no IP (on end-user ports).

    The hardcoding of 'FastEthernet0/%' is probably a no-no for
    everyone else; checking for remote_* in device_port would be
    better if you knew CDP was working.
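    [Editorial sketch of how that suspect-port query behaves: Python
    with sqlite3 and made-up rows, not Netdisco code. The HAVING
    threshold is lowered from 40 to 1 so the toy data trips it.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE node    (mac TEXT, switch TEXT, port TEXT);
    CREATE TABLE node_ip (mac TEXT, ip TEXT);
    INSERT INTO node VALUES
        ('aa:aa', 'sw1', 'FastEthernet0/1'),
        ('bb:bb', 'sw1', 'FastEthernet0/1'),
        ('cc:cc', 'sw1', 'FastEthernet0/1'),
        ('dd:dd', 'sw1', 'FastEthernet0/2');
    INSERT INTO node_ip VALUES ('aa:aa', '10.0.0.1');  -- aa:aa has an IP
""")

# Ports with more than 1 IP-less MAC; aa:aa joins an IP, so it drops out.
suspect = db.execute("""
    SELECT n.switch, n.port, COUNT(n.mac) AS cnt
    FROM node n LEFT JOIN node_ip i ON n.mac = i.mac
    WHERE n.port LIKE 'FastEthernet0/%' AND i.ip IS NULL
    GROUP BY n.switch, n.port HAVING COUNT(n.mac) > 1
    ORDER BY cnt DESC
""").fetchall()
print(suspect)  # [('sw1', 'FastEthernet0/1', 2)]
```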

    Now, once the switch/ports are found, I see two ways to handle
    them:

    1) Delete every node on the port and re-macsuck the device, which
    would find any real devices again. This would be the simplest to
    code, but would remove some historical data.

    2) Delete everything but nodes that have had an IP, something
    like:

    "
    DELETE FROM node WHERE switch=? AND port=? AND mac NOT IN
    (SELECT n.mac FROM node n, node_ip i WHERE n.mac=i.mac AND
    n.switch=? AND n.port=?)
    "

    That's what I'm doing now... what do you think?
    Attached is the quick program I wrote to do the cleanup the other
    day; I could port it to Perl if you think it is worthwhile.
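    [Editorial sketch of option 2, not the attached script: the NOT IN
    delete run against a throwaway SQLite database with made-up rows
    (Python's sqlite3 keeps the demo self-contained). Only the MAC
    that never had an IP on the targeted port is removed.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE node    (mac TEXT, switch TEXT, port TEXT);
    CREATE TABLE node_ip (mac TEXT, ip TEXT);
    INSERT INTO node VALUES
        ('aa:aa', 'sw1', 'FastEthernet0/1'),   -- real host, has an IP
        ('bb:bb', 'sw1', 'FastEthernet0/1'),   -- garbage, never had an IP
        ('cc:cc', 'sw1', 'FastEthernet0/2');   -- different port, untouched
    INSERT INTO node_ip VALUES ('aa:aa', '10.0.0.1');
""")

# Delete everything on the port except MACs that have ever had an IP.
db.execute("""
    DELETE FROM node WHERE switch = ? AND port = ? AND mac NOT IN
        (SELECT n.mac FROM node n, node_ip i
         WHERE n.mac = i.mac AND n.switch = ? AND n.port = ?)
""", ('sw1', 'FastEthernet0/1', 'sw1', 'FastEthernet0/1'))

remaining = sorted(m for (m,) in db.execute("SELECT mac FROM node"))
print(remaining)  # ['aa:aa', 'cc:cc']
```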

     
  • Justin A
    2004-12-16

    proof of concept node cleaner

     
  • Bill Fenner
    2006-04-03

    Just for kicks, I tried a couple of queries:
    - left join oui on upper(node.oui) = oui.oui
    - create an ouil table, with ouil.oui = lower(oui.oui), and
      left join ouil on node.oui = ouil.oui

    Both joins added about 100ms to the page load time. (Completely
    unscientific, except that going from 350ms -> 450ms consistently does
    seem to say "that one's slower").

    This is somewhat unexpected to me, since I usually count on the database
    being better at this kind of thing than user code, but what can I say? (And
    yes, I did an ANALYZE of both oui and ouil just before running the tests, and
    they both have a primary index on their oui column)

    (distinct(oui) returned 193 rows - so this is attempting to be a test of the
    "normal case", not the 30,000 junk rows case - I'd bet that the join would be
    better in that case but I wouldn't want to penalize the normal case for that)

    With a little SQL EXPLAIN help, I found that the join was
    happening before the distinct(), even if node.oui has an index, so
    I used a subselect:

        SELECT oui, company FROM ouil
        WHERE oui IN (SELECT DISTINCT oui FROM node)

    and that started returning in ~300ms. I haven't done any benchmarks with a
    garbage-filled node table though. (Note also that this was from "ouil", an oui
    table with a lowercased oui string.)

    I also noticed that if you're doing a vendor search, it calls
    init_ouis multiple times. This would obviously be bad if init_ouis
    were slow...
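    [Editorial sketch of the subselect's behavior: Python with sqlite3
    and made-up rows, illustration only. Unlike the left join, the
    subselect only returns OUIs present in the (lowercased) ouil
    table, so a node OUI with no ouil entry silently drops out rather
    than appearing with a NULL company.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE node (mac TEXT, oui TEXT);
    CREATE TABLE ouil (oui TEXT PRIMARY KEY, company TEXT);
    INSERT INTO node VALUES ('00:00:0c:11:22:33', '00000c'),
                            ('00:00:0c:44:55:66', '00000c'),
                            ('de:ad:be:ef:00:01', 'deadbe');  -- no ouil entry
    INSERT INTO ouil VALUES ('00000c', 'Cisco Systems'),
                            ('00e018', 'Asustek');            -- not on the network
""")

# One query; DISTINCT runs over the small set of node OUIs first.
rows = db.execute("""
    SELECT oui, company FROM ouil
    WHERE oui IN (SELECT DISTINCT oui FROM node)
""").fetchall()
print(rows)  # [('00000c', 'Cisco Systems')]
```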

     
  • Bill Fenner
    2006-04-03

    Looks like the subselect is a winner. I added 40,000 bogus nodes
    to my node table, with random MAC addresses -> random OUIs. The
    existing code took 4.5 seconds; the subselect took 0.7 seconds.
    So, since it makes things negligibly better in the normal case and
    significantly better in the pathological case, I'm inclined to
    commit the subselect code.

     
  • Max Baker
    2006-04-03

    My vote always goes to the Pathological.

    J.R. "Bob" Dobbs for president!

    I can try the two against the ucsc production server --
    what's the SQL?

    thanks.
    -m

     
  • Bill Fenner
    2006-04-03

    • milestone: --> 0.95
    • assigned_to: nobody --> fenner
     
  • Bill Fenner
    2006-04-09

    • status: open --> closed-fixed
     
  • Bill Fenner
    2006-04-09

    I committed my change. Pass the frop!