File Date Author Commit
 SA 2011-08-12 Mark Dixon Mark Dixon [4ec0b8] initial release of slibi code
 bin 2012-01-04 Mark Dixon Mark Dixon [c9b57c] Made slibi_portstate argument checking a bit be...
 lib 2011-11-10 Mark Dixon Mark Dixon [ace5c1] More work with portstate and fabric checking
 AUTHORS 2011-08-12 Mark Dixon Mark Dixon [4ec0b8] initial release of slibi code
 LICENSE 2011-07-22 Mark Dixon Mark Dixon [f41fec] Intial import
 README 2012-02-08 Mark Dixon Mark Dixon [71863f] Updated README to reflect recent developments o...
 SA.pm 2011-08-12 Mark Dixon Mark Dixon [4ec0b8] initial release of slibi code
 ibMon.pm 2011-08-12 Mark Dixon Mark Dixon [4ec0b8] initial release of slibi code
 ibcheck 2011-08-12 Mark Dixon Mark Dixon [4ec0b8] initial release of slibi code
 ibcollect 2011-08-12 Mark Dixon Mark Dixon [4ec0b8] initial release of slibi code
 ibport_status_TESTDATA 2011-11-10 Mark Dixon Mark Dixon [ace5c1] More work with portstate and fabric checking
 ibreport 2011-08-12 Mark Dixon Mark Dixon [4ec0b8] initial release of slibi code
 ibreport_today 2011-08-12 Mark Dixon Mark Dixon [4ec0b8] initial release of slibi code
 schema.mysql 2011-08-12 Mark Dixon Mark Dixon [4ec0b8] initial release of slibi code
 slibi 2011-11-10 Mark Dixon Mark Dixon [ace5c1] More work with portstate and fabric checking

Read Me

Simple Logging InfiniBand Infrastructure (SLIBI)
================================================

Copyright (C) 2010-2011 University of Leeds

This is an early release of a logging and monitoring tool for InfiniBand
networks using the Open Fabrics Enterprise Distribution (OFED) stack,
principally directed at High Performance Computing (HPC) environments.

Its purpose is to allow continual monitoring of error rates on ports,
together with the collection of performance data such as throughput. This
allows the following to be done during normal cluster operation:

  * Identification of a hung switch (because typical IB networks have many
  equivalent paths, this can go unnoticed on a cluster).

  * Identification of faulty components: cards, cables, switch ports, etc.

  * Tracking of any ports deliberately disabled because they are logging
  errors, or are unused. InfiniBand has a horrible habit of enabling
  disabled ports after a switch power cycle.

  * Association of names with InfiniBand entities, allowing easy
  identification of what each port is connected to.

  * It is hoped that later analysis of throughput data may help in
  capacity planning. Or at least pretty pictures.

This version is in production use on a cluster running Red Hat Enterprise
Linux 5, Mellanox QDR host cards and switches. The InfiniBand fabric has
449 hosts, 2464 switch ports and is running the minhop routing algorithm.


This software is under active development. In particular, this release has
an early version of the "slibi" command line tool, which will eventually
supersede the separate ibcheck, ibcollect and ibreport programs.


License
~~~~~~~

This software is released under the GNU General Public License, version 3.
Please see the LICENSE file for details.


Prerequisites
~~~~~~~~~~~~~

* MySQL database
* Perl5
* Common Perl5 modules, e.g. DBI, Getopt::Long, Data::Dumper, IO::File
* Perl5 modules for "slibi" command line interface: Term::ReadLine, Class::Struct, Safe
* cron

Internally, we make use of a simple Perl module called "SA". A
stub version has been provided with this release, providing minimal
functionality to allow the software to work.

WARNING: when run, this software will reset the error/performance counters
on your InfiniBand ports. Unless you are using other software which
makes use of this information (e.g. collectl, the sar tool replacement),
this shouldn't concern you.


Installation
~~~~~~~~~~~~

* Prepare a database

Create a new database within MySQL. Also create two accounts: one with
read/write access to it, the other with read access only.

 CREATE DATABASE infiniband;
 GRANT ALL ON infiniband.* TO infiniband IDENTIFIED BY 'SOME_PASSWORD';
 GRANT SELECT ON infiniband.* TO infiniband_read IDENTIFIED BY 'SOME_PASSWORD';

(replace each SOME_PASSWORD string appropriately. You may also want to
review which hosts you allow MySQL logins from: the statements above allow
remote logins by default)

Edit ibcheck, ibcollect and ibreport, replacing the strings SOME_DATABASE,
SOME_HOST, SOME_USER and SOME_PASSWORD appropriately.

Create the necessary tables using the schema.mysql file:

 mysql -h SOME_HOST -u SOME_USER -p SOME_DATABASE < schema.mysql


* Create cron jobs

These need to run under the root account, to allow the various InfiniBand
commands to work. Examples:

  # Collect InfiniBand fabric data
  22 * * * * root cd /data/infiniband/bin && perl ./ibcollect
  #
  # Report on today's InfiniBand fabric data
  44 23 * * * root cd /data/infiniband/bin && ./ibreport_today 2>&1 | mail -s "ib errors summary" SOME_EMAIL_ADDRESS
  #
  # Report on up/down links, missing hosts or switches
  44 0 * * * root cd /data/infiniband/bin && ./ibcheck 2>&1 | mail -s "ib node summary" SOME_EMAIL_ADDRESS

  (edit times and SOME_EMAIL_ADDRESS appropriately)

* Update switch information

  Hostnames are automatically populated each time ibcollect is run. This
  information ultimately comes from the name reported by the IB stack on
  each host.

  Switch names are not automatically populated, apart from when they
  are first discovered. You may wish to update the "name" field in table
  "nodes" to something more memorable.

  The "enclosures" table helps keep track of the different switch types
  and will be used to map logical ports to physical ports. This will aid
  the identification of physical cables for less usual IB switch types
  (e.g. those connectors containing 3 ports). You may wish to set the
  "enclosure_id" field in table "nodes" appropriately.


FAQ
~~~

Q: I've run ./ibreport_today and it doesn't print anything!

A1: Well done - it hasn't found any problems :)

A2: Take a look at the contents of table "timedata". You should have a
   tuple for each port multiplied by the number of times you've run
   ibcollect. You need to run ibcollect at least twice before slibi can
   calculate if ports have exceeded error rates.
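
The need for at least two runs can be illustrated with a small sketch.
Because ibcollect resets the counters on every run, a sample holds the
errors seen since the previous run, and a rate only exists once there is a
previous timestamp to measure against. The function name and the per-hour
unit are assumptions made for this example, not the calculation slibi
itself performs.

```python
from datetime import datetime


def error_rate_per_hour(prev_time, curr_time, curr_count):
    """Errors per hour between two collection runs.

    curr_count is the counter value read at curr_time; the counter is
    assumed to have been reset to zero at prev_time.
    """
    elapsed = (curr_time - prev_time).total_seconds()
    if elapsed <= 0:
        raise ValueError("current sample must be later than previous sample")
    return curr_count * 3600.0 / elapsed


# Two hypothetical runs one hour apart, with 12 errors logged in between.
prev = datetime(2012, 2, 8, 10, 22)
curr = datetime(2012, 2, 8, 11, 22)
rate = error_rate_per_hour(prev, curr, 12)
```

With a single run there is no prev_time, so no rate can be produced.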


Q: What error rates are really bad?

A: There's a lot of debate about this. SymbolErrors seem to be the most
   minor (packets that don't make sense), then RcvErrors (data errors),
   then LinkRecovers are the most serious (port flapping, resulting
   in a topology change and recalculation). The top of ibreport has some
   example definitions of "bad" error rates, but we're not experts.
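
As a rough sketch of the ordering described above, the snippet below
sorts a port's non-zero error counts so that the most serious type comes
first. The numeric ranks and the function name are assumptions for
illustration only; they are not the thresholds defined at the top of
ibreport.

```python
# Relative seriousness as described in this FAQ entry (higher = worse).
SEVERITY = {"SymbolErrors": 1, "RcvErrors": 2, "LinkRecovers": 3}


def most_serious_first(counts):
    """Return (name, count) pairs with non-zero counts, worst type first."""
    nonzero = [(name, n) for name, n in counts.items() if n > 0]
    return sorted(nonzero, key=lambda item: SEVERITY.get(item[0], 0), reverse=True)


# Hypothetical counts for one port.
ranked = most_serious_first({"SymbolErrors": 40, "RcvErrors": 0, "LinkRecovers": 2})
```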


Q: Why are some error counts suspiciously close to a power of 2?

A: InfiniBand counters don't wrap. Each field has a fixed size, so a value
   sitting at a power of 2 (strictly, one less, e.g. 65535 for a 16-bit
   field) has probably hit its maximum and stopped counting. This is why
   ibcollect resets all counters each time it is run.
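
A minimal sketch of how a stuck counter could be spotted: a field of a
given width cannot hold more than 2**bits - 1, so a reading at that
ceiling is suspect. The function name is hypothetical, and the width of
each real counter has to be looked up separately; the 16-bit width used
below is only an example.

```python
def is_saturated(value, bits):
    """True if value has reached the largest number a bits-wide field holds."""
    return value >= (1 << bits) - 1


# A 16-bit field tops out at 65535.
stuck = is_saturated(65535, 16)
healthy = is_saturated(1024, 16)
```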


Q: Why don't you use a Round Robin Database (RRD) to store this information,
   instead of an RDBMS like MySQL?

A1: We've got lots of disk space

A2: RRD systems such as RRDtool/Ganglia/Cacti are great at storing aggregate
   information about time series while occupying a static amount of space.
   However, the nature of routing in InfiniBand means that you tend to be
   interested in specific error events, not looking at averages.

A3: Our cluster has almost 3000 switch and host ports. For each port, we're
   collecting 16 items of information, together with how the ports are
   connected to each other. SQL seemed the obvious choice to handle this.
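
To give a feel for the data volume, here is the arithmetic using the
figures quoted in this README. It assumes one InfiniBand port per host and
one ibcollect run per hour (as in the example cron entry); both are
assumptions, so treat the totals as an estimate.

```python
HOSTS = 449            # from the cluster description above
SWITCH_PORTS = 2464    # from the cluster description above
ITEMS_PER_PORT = 16    # items collected per port
RUNS_PER_DAY = 24      # assumption: ibcollect runs hourly

ports = HOSTS + SWITCH_PORTS                  # assumes one port per host
tuples_per_day = ports * RUNS_PER_DAY         # rows added to "timedata"
values_per_day = tuples_per_day * ITEMS_PER_PORT

print(ports, tuples_per_day, values_per_day)  # 2913 69912 1118592
```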


Q: What OFED commands does it use?

A: ibnetdiscover (for topology information), perfquery (for access to
   counters), ibportstate (for access to port status).
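
For illustration, the sketch below parses counter lines in the
"Name:.....value" layout that perfquery prints. The sample text is made up
for this example and the exact output may differ between OFED versions;
the function name is hypothetical and this is not the parsing code used by
ibcollect.

```python
import re

# A counter name, a colon, a run of dot padding, then the value.
LINE_RE = re.compile(r"^(\w+):\.*(\S+)$")


def parse_counters(text):
    """Return {counter_name: int_value} for every line that looks like a counter."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # comment or header line
        name, raw = match.groups()
        try:
            counters[name] = int(raw, 0)  # accepts decimal and 0x... values
        except ValueError:
            continue  # value is not a number; skip it
    return counters


# Illustrative sample only, not captured from a real fabric.
SAMPLE = """\
# Port counters: Lid 1 port 1
PortSelect:......................1
SymbolErrorCounter:..............3
LinkErrorRecoveryCounter:........0
PortRcvErrors:...................17
"""

parsed = parse_counters(SAMPLE)
```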


Q: Why did you write this?

A1: We had InfiniBand problems, seriously affecting our Lustre filesystem.
   We also had real users we couldn't kick off. We needed something to
   keep track of error rates over time under normal use.

A2: We found the documentation of the normal OFED tools impenetrable. The
   diagnostic tools seem best suited for debugging a cluster running a
   known test workload, and not one in production. We wanted something to
   keep an eye on things with an unknown workload.


TODO
~~~~

* Document how to use it :)

* [IN PROGRESS] Remove the root access requirement: use sudo to access
privileged operations.

* [IN PROGRESS] Create a separate configuration file, allowing the
specification of database details in one place. Also use to allow
configuration of tool - e.g.  only collect error data, only collect
performance data, etc.

* [IN PROGRESS] Unify the different commands under a single CLI.

* Allow modification of the database through the CLI, instead of
seat-of-the-pants modification of the SQL database.

* Check how portable this software is, and try to get it working on other
clusters and other vendors' InfiniBand equipment.

* GraphViz visualisation of the topology (or subset).

* Extension to cope with identification of physical cables carrying
multiple links (e.g. like our existing Sun 3-way cables). Already partly
implemented, but currently non-functional.

* Keep track of node names at different points in time, instead of just
keeping the last one seen.

* Analysis of traffic to look for bottlenecks.
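
The configuration file item above might look something like the sketch
below. Everything here is hypothetical: the section names, option names
and the load_settings function are invented for illustration and do not
describe a file format that slibi currently reads.

```python
import configparser

# Hypothetical layout: database details in one place, plus collection switches.
EXAMPLE = """
[database]
host = SOME_HOST
name = SOME_DATABASE
user = SOME_USER
password = SOME_PASSWORD

[collect]
errors = yes
performance = no
"""


def load_settings(text):
    """Parse INI-style text into a plain dictionary of settings."""
    # interpolation=None so a '%' in a password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "db": dict(parser["database"]),
        "collect_errors": parser.getboolean("collect", "errors", fallback=True),
        "collect_performance": parser.getboolean("collect", "performance", fallback=True),
    }


settings = load_settings(EXAMPLE)
```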


Example database queries
~~~~~~~~~~~~~~~~~~~~~~~~

* Get the port_id for a GUID/port combination:

SELECT
    ports.id as port_id,
    nodes.name,
    nodes.guid,
    ports.port,
    ports.status
  FROM
    nodes,ports
  WHERE
      nodes.id = ports.node_id
    AND
      nodes.guid = '0x5080020000b3b5dd';

* Get a description for a port_id:

SELECT
    nodes.name,guid,port,ports.status,ports.id
  FROM
    nodes,ports
  WHERE
      nodes.id = ports.node_id
    AND
      ports.id = 1635;

* The above works for a host. If nodes.name starts with 'I4', it's a switch. Use this instead:

SELECT
    enclosures.name,nodes.name,guid,port,ports.status,ports.id
  FROM
    enclosures,nodes,ports
  WHERE
      enclosures.id = nodes.enclosure_id
    AND
      nodes.id = ports.node_id
    AND
      ports.id = 1635;

* Get the GUIDs for a node:

SELECT
    nodes.name,
    nodes.guid,
    ports.port,
    ports.id,ports.status
  FROM
    nodes,ports
  WHERE
      nodes.id = ports.node_id
    AND
      nodes.name like 'c1s3b3n%';

* Last port_id seen connected to a port_id:

SELECT
    ports.id as port_id,
    nodes.name,
    nodes.guid,
    ports.port,
    ports.status
  FROM
    nodes,ports
  WHERE
      nodes.id = ports.node_id
    AND
      ports.id = (
        SELECT
            conn_port_id
          FROM
            timedata
          WHERE
              port_id = 1245
            AND
              conn_port_id IS NOT NULL
          ORDER BY TIMESERIES_ID DESC LIMIT 1
      );
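
To show how the subquery above picks the most recent non-NULL connection,
here is a small sketch that runs an equivalent query from a script with
the port_id passed as a parameter. It uses an in-memory SQLite database as
a stand-in for MySQL. The tables contain only the columns that appear in
the queries in this README; the sample rows, the 'ok' status value and the
last_connected function are assumptions for illustration.

```python
import sqlite3

QUERY = """
SELECT ports.id AS port_id, nodes.name, nodes.guid, ports.port, ports.status
  FROM nodes JOIN ports ON nodes.id = ports.node_id
 WHERE ports.id = (
         SELECT conn_port_id
           FROM timedata
          WHERE port_id = ? AND conn_port_id IS NOT NULL
          ORDER BY timeseries_id DESC LIMIT 1)
"""


def last_connected(conn, port_id):
    """Return the row describing the last port seen connected to port_id, or None."""
    return conn.execute(QUERY, (port_id,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, guid TEXT);
CREATE TABLE ports (id INTEGER PRIMARY KEY, node_id INTEGER, port INTEGER, status TEXT);
CREATE TABLE timedata (timeseries_id INTEGER, port_id INTEGER, conn_port_id INTEGER);
INSERT INTO nodes VALUES (1, 'host-a', '0x01'), (2, 'host-b', '0x02'), (3, 'switch-x', '0x03');
INSERT INTO ports VALUES (10, 1, 1, 'ok'), (20, 2, 1, 'ok'), (30, 3, 5, 'ok');
-- Switch port 30 was cabled to port 10, then to port 20, then seen with no link.
INSERT INTO timedata VALUES (1, 30, 10), (2, 30, 20), (3, 30, NULL);
""")

row = last_connected(conn, 30)
```

The newest timedata row for port 30 has no connection recorded, so the
query falls back to the latest row that does, which points at port 20.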

* List ports with a status of faulty:

SELECT
    enclosures.name,
    nodes.name,
    nodes.guid,
    ports.port,
    ports.status
  FROM
    enclosures,nodes,ports
  WHERE
      enclosures.id = nodes.enclosure_id
    AND
      nodes.id = ports.node_id
    AND
      ports.status = 'faulty'
  ORDER BY enclosures.name;
