Wish List for xCAT 2
This page represents the roadmap for xCAT as far as we know it at this point in time. The xCAT development plan is flexible and is often reprioritized based on user requirements. This is good because it means xCAT focuses on what users want, but it also means we can't guarantee that the roadmap won't change.
We want to hear your requirements and input on the roadmap. New features can be requested by opening a Tracker feature or by posting them to the xCAT mailing list. There is also a wish list specifically for the xCAT web interface.
Candidates for xCAT 2.7.x
- FB support on AIX 7.1D and RHEL6.2. (Xiao Peng, Jie Hua)
- FB IB QDR support on AIX 7.1D and RHEL6.2 (Jie Hua)
- p7 HV16 IB QDR on AIX 7.1 & RHEL6.2 (Jie Hua)
- p7 IH support with RHEL6.2 (with custom kernel) for internal and beta use (Ting Ting)
- DB2 support on RHEL 6.2 (Ting Ting)
- SLES 11 SP2 support (Yang Song)
- kdump support for SLES diskless nodes (Xu Qing)
- HA MN support on Linux (Guang Cheng) - is this for x or p775??
- AIX statelite enhancements. (set up resolv_conf resources on backup service nodes) (AIX) (Norm)
- Rolling update for RHEL 6.2 system x cluster (Bai Yuan)
- DFM library cleanup of extended error code processing: the current error handling duplicates some error codes within a single response, which produces misleading error code messages. (Bill)
- Support wildcards in the litefile table's image attribute (litefile.image), so that one entry can match multiple image names. A customer has several images for GPFS storage nodes whose names all start with ST, and wants to use something like "ST*" in the litefile table for these images. (Hua Zhong)
- Postscripts logic cleanup, bug https://sourceforge.net/tracker/?func=detail&aid=3487614&group_id=208749&atid=1006945 (Guang Cheng)
  - Move the creation of the mypostscript file from /tmp/mypostscript to /xcatpost/mypostscript
  - Extract the common code for generating the mypostscript file and share it across all scripts under /opt/xcat/share/xcat/install/scripts
  - Could diskful installation and diskless boot use the same logic to create the postscript file?
  - Add timestamps on the running of the postscripts, plus start and stop headers, in /var/log/xcat/xcat.log
  - Write xcat.log (and the output returned to the caller) in a stream-like mode
  - Change all logger calls in xCAT to the format "logger -p local4.info -txCAT <msg>", so that all xCAT syslog entries go under local4. Usage is currently inconsistent: MsgUtils uses local4, while logger calls use "user". This matters because it lets us filter all xCAT entries into one logfile.
- p7 IH Firmware assisted dump for linux (Hua Zhong)
- Scaling and performance improvements for xCAT during DARPA demos.
- AIX 7.1.F, AIX 6.1.S support (Han Jing)
- HA NFS for service nodes on Linux, using CNFS or NFSv4 replication (Guang Cheng)
- Nagios monitoring plugin (Ling/CDL) - done??
- Support multiple domains, initial design: https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Managing_Multiple_Sites_and_Clusters (Norm)
- HPC integration updates for new HPC product versions & packaging. Also, maybe support for some open source HPC tools(Hua Zhong) - start transitioning to something closer to kits??
- Support AIX/Linux mixed clusters (Cao Li)
- Encrypted passwords in xCAT tables. Do not show the real passwords to users. Bug 3364300 https://sourceforge.net/tracker/?func=detail&aid=3364300&group_id=208749&atid=1006945 (On password storage: encryption is possible via a passphrase; anything else would be mere obfuscation. MD5 is a hash algorithm, not an encryption algorithm, so programs that need the password itself cannot derive it from an MD5 hash. Programs that authenticate clients don't need to remember passwords, just hashes of them, but the client still needs to know the password.) (Sun Jing)
- Power 775 firmware concurrent update (Jie Hua)
- Add an optional version # to the client/xcatd XML protocol. (Get this in early, so we can exploit it later.)
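The litefile.image wildcard item in the list above could be prototyped roughly as follows. This is only a minimal Python sketch to illustrate the matching rule, not xCAT code: the function name `entries_for_image`, the dictionary row layout, the sample file paths, and the choice of shell-style globbing are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

def entries_for_image(litefile_rows, image_name):
    """Return the rows whose 'image' pattern matches image_name.

    Each row is a dict with an 'image' key that may hold a shell-style
    wildcard such as "ST*" (hypothetical layout, for illustration only).
    """
    return [row for row in litefile_rows
            if fnmatchcase(image_name, row["image"])]

# Hypothetical sample rows: one wildcard entry and one exact entry.
rows = [
    {"image": "ST*", "file": "/etc/example.conf"},
    {"image": "compute-rhel6", "file": "/etc/other.conf"},
]

# A storage image whose name starts with ST picks up the wildcard entry.
matched = entries_for_image(rows, "ST-gpfs-01")
```

Case-sensitive matching is used here so that "ST*" does not accidentally match an image named "standard"; whether that is the desired behavior is an open design question.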
Candidates for xCAT 2.8
Priority 1 items
- Support for latest versions of RHEL, CentOS, Fedora, SLES, AIX and Windows (Yang Song, Norm, Xiao Peng)
- Support for latest hardware (testing driven, Ting Ting)
- PCM/xCAT integration
- HPC integration updates for new HPC product versions & packaging (in kit format, or close to kit format). (Hua Zhong)
- Network boot for IPv6 (delivery target: xCAT 2.8 for AIX and xCAT 2.8.x for Linux) (Guang Cheng) - trying to move AIX IPv6 to 2013
- Support SDMC (initial support, more work will be done in 2.8.x) (Yin Le)
Priority 2 items
- Support statelite image groups: add a new column, osimage.groups. litefile.image and litetree.image can then refer to osimage.groups, so that multiple statelite images share the same entry in the litefile and litetree tables. (Hua Zhong)
- Use %::XCATSITEVALS: xcatd populates all the site attributes into %::XCATSITEVALS, and %::XCATSITEVALS is updated for each xcatd connection. The xCAT code should use %::XCATSITEVALS instead of opening the site table to read site attributes. Several things need to be done:
  - Populate %::XCATSITEVALS when XCATBYPASS=1 as well; this should be done in xCAT::Client::submit_request
  - Update Utils->get_site_attribute to try %::XCATSITEVALS first; if the attribute is not populated in %::XCATSITEVALS, then open the site table to read the attribute
  - Search the xCAT code for all the places that open the site table to read attributes, and change them to call Utils->get_site_attribute instead.
- Merge remoteshell and aixremoteshell postscripts so that the single entry in postscripts xcatdefaults is correct for both (Done 2.8)
- DFM FSP CIM interface investigation: As FSPs become assimilated into Director there may be more CIM enhancements made in the FSP support which we would be able to make use of in our xCAT management. This is to track some investigation of the FSP CIM interface enhancements. (John)
- Clean up all/most of the places where we check /etc/redhat-release or /etc/suse-release, or check for Debian/Ubuntu. Proposal: collect most distribution-related logic into one Perl module. One of its functions should return various attributes of the system (e.g. the path of the dhcp conf file). (Yang Song)
- Code performance analysis and improvements with scalability configuration(was: getnodetype performance improvements): go through the code to discover as many performance problems as possible, for example, are we doing too many things in a "foreach $node" loop? https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Programming_Tips#Performance_considerations gives some hints. (Lissa?)
- Code refinement for some xCAT plugins. Some of the xCAT plugins are getting hard to maintain, because: 1) the code logic is very complex; 2) the subroutines in the plugins are getting too long; 3) there are too many global variables. Examples are DBobjUtils.pm, DBobjectdefs.pm, aixinstall.pm and updatenode.pm. We need to reorganize these files, at least splitting the subroutines into smaller ones and reducing the number of global variables. (Guang Cheng, Norm, Yang Song)
- Implement a nodecheck cmd: (Xiao Peng)
  - Make it consistent with the health check framework
  - If nodehm.mgt is not set, complain; otherwise perform per-method appropriate checks:
    - first, a config check: make sure the required table values are there
    - second, actually interrogate the service processor and report anything horribly wrong
  - noderes.netboot: similar check
    - a deeper check actually goes to the tftp space and live dhcp data
  - host resolution
    - the challenge is whether a node is *supposed* to resolve or not (dynamic nodes never turned on will not resolve, but we can probably take cues from the dhcp config).
- Do something to show what the code does if an attribute value is not filled in (e.g. it looks at a site attribute and then uses a hardcoded value). This is very difficult to implement for all attributes.
- xcat discovery option to randomly/sequentially assign node names if switch info is not defined. Also may want to watch syslog for dhcprequests? (Jarrod, Bruce)
- TEAL integration with the view monitoring framework (Yang Song)
- Complete Debian/Ubuntu support so that xCAT works better in an OpenStack environment. What has already been done by Arif and the LTC VPL group on system x is: ()
- Add site option to tell xcatd to capture the output of each xcat cmd in a separate file under /var/log/xcat/commands. The file name should have the cmd name in it and something else to make it unique (pid or time of day). This way admins could refer back to any cmd they've run. The dir could be pruned the way /tmp is.(Xiao Peng)
- Support system p live lpar migration in rmigrate cmd. (Director already supports this in VMControl. HMC/phype provides the support.) This is a requirement for system p cloud management and for the IBM events infrastructure. I think this is only needed for HMC environments (i.e. not DFM and not bladecenter). (Er Tao)
- Clean up Util.pm:
  - Move functions that access the db into a new TableUtils.pm
  - Move network functions into NetworkUtils.pm
  - Should we have a separate ServiceNodeUtils.pm?
  - Move other functions out that aren't basic utils
  - Document all functions (use svn blame to see who added what)
  - Remove: Globals:, Error:, Comments: (almost no one fills them in)
- Support new "reboot" scripts for stateful (fulldisk install) nodes. These scripts should be run after postbootscripts during initial install, and then run as part of every reboot of the node.
- DB attribute defaults framework:(Lissa)
  - Add another optional key, called "defaults", to the Schema.pm tabspec hash for each table. It can list a hard-coded default that the code will use if the value is not filled in, or a comment that explains to the user what kind of default the code will use (e.g. another attribute).
  - Change all code to get the default out of Schema.pm instead of hardcoding a value
  - Modify tabdump -d and the man page build to show those defaults to the user
- Add support for automatically configuring dhcp on xCAT MN and SNs. (Norm)
- Add flocks to plugins to prevent commands that can't run concurrently from running at the same time
- Using Utils->runxcmd or InstUtils->xcmd in xCAT plugins instead of calling xdsh command directly.(Bug 2974652 https://sourceforge.net/tracker/?func=detail&aid=2974652&group_id=208749&atid=1006945)
- Table.pm: add another Where-clause routine that formats the Where clause appropriately for the database. Limited input. Support getting all attributes in getNode* routines if no attribute list is supplied. (Lissa)
- Document at the top of each postscript, usage information and what it does. In the Postscripts/prescripts doc, indicate they should read the postscript file for this information. At least try and get our default ones and servicenode,xcatserver,xcatclient done.
- DFM IPv6 support (Jie Hua, Bill)
- Add verbose output for xCAT commands. Based on discussion with the xCAT team, we think the following commands need a verbose option: mkhwconn, nodeset, getmacs -D, *vm commands, makedhcp, makedns, rbootseq, rspconfig. We can work on these commands in xCAT 2.8, and on other commands in a future release. (Jie Hua)
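The lookup order proposed for Utils->get_site_attribute in the %::XCATSITEVALS item above (try the per-connection cache first, fall back to the site table only when the attribute is missing) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the xCAT Perl implementation: `SITE_CACHE` stands in for %::XCATSITEVALS, `SITE_TABLE` and `read_site_table` stand in for opening the site table, and the attribute values are made up.

```python
# Hypothetical stand-in for %::XCATSITEVALS (populated per xcatd connection).
SITE_CACHE = {"domain": "cluster.example"}

# Hypothetical stand-in for the contents of the site table.
SITE_TABLE = {"domain": "cluster.example", "master": "10.0.0.1"}

# Records every fallback read so the access pattern can be inspected.
TABLE_READS = []

def read_site_table(attribute):
    """Placeholder for opening the site table and reading one attribute."""
    TABLE_READS.append(attribute)
    return SITE_TABLE.get(attribute)

def get_site_attribute(attribute):
    """Return a site attribute, preferring the cache over the table."""
    if attribute in SITE_CACHE:
        return SITE_CACHE[attribute]
    # Not populated in the cache: fall back to reading the site table.
    return read_site_table(attribute)
```

With this shape, callers never open the site table themselves, which is what makes the third sub-item (replacing direct table reads with get_site_attribute calls) a mechanical change.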
Priority 3 items
- Have statelite booting nodes download rc.statelite (probably via ftp), instead of having it bundled in the boot image to enable backward compatibility more easily. (Hua Zhong)
- Rolling Updates -- support "retry failed nodes" option with updateall update method(Linda)
- When postage builds the list of postscripts and postbootscripts to run on the nodes, remove any duplicates so they will not run twice. Customers often have not realized that their postscripts table, together with the newer postscripts and postbootscripts attributes in the osimage table, results in postscripts being added to the list multiple times. Maybe we could also put out a warning? (Hua Zhong)
- Support option to create /install on the node's local disk of service node (using site.installloc, already have postscript make_sn_fs for AIX) (Lissa)
- Support long hostname as the xCAT node name, bug https://sourceforge.net/tracker/?func=detail&aid=3323391&group_id=208749&atid=1006945
- Validate that bootp broadcast works with several non-authoritative dhcp/bootp servers, only one of which is configured for that node (both linux and aix) (Hua Zhong)
- Add audit logging of commands like chtab that do not go through the daemon and XCATBYPASS mode ( Lissa)
- HPC Integration -- provide samples for "dev" nodes (full compilers, editors, debuggers, etc.)(Hua Zhong)
- New boot kernel using dracut and CentOS 6. Explore the possibility of using dracut for initrd creation on all Linux distributions. This would make the genimage code logic consistent across Linux distributions; we have seen a number of problems with the xCAT customized ramdisk. For now, dracut is shipped with RHEL6, and we are seeing some discussion of porting dracut to other distributions. (Hua Zhong)
- xCAT commands should clean up processes on SN and CN when ctrl-c(bug 2805644 https://sourceforge.net/tracker/?func=detail&aid=2805644&group_id=208749&atid=1006945). Not only updatenode command has this problem, maybe we should come up with a general solution for all xCAT commands.
- Finish up support for all documented options in the policy table, or restrict the options. (Lissa) - is Chris working on this?
- Make statelite.image a 2nd key in that table. This enables having multiple statelite images for 1 node or group of nodes. Will also require def cmd changes to display/set it correctly. (Hua Zhong)
- AIX osimage replication: support imgexport/imgimport for AIX (or just doc manual process to create an osimage on a different MN or SN. ?)
- System p energy mgmt update, including exploitation of public CIM interface on HV (not supported on IH) (Xiao Peng)
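The duplicate-removal item for postscripts in the list above amounts to an order-preserving merge that also reports what was dropped, so a warning can be issued. The following is a minimal Python sketch of that idea only; the function name `merge_postscripts` is hypothetical, the script names are sample values, and the real logic would live in the Perl code that builds the list.

```python
def merge_postscripts(*script_lists):
    """Merge several postscript lists, keeping the first occurrence of each.

    Returns (merged, duplicates): 'merged' preserves the original order,
    and 'duplicates' lists every name that was skipped, so the caller
    can warn the admin about it.
    """
    seen = set()
    merged = []
    duplicates = []
    for scripts in script_lists:
        for name in scripts:
            if name in seen:
                duplicates.append(name)
            else:
                seen.add(name)
                merged.append(name)
    return merged, duplicates

# Sample input: the same script listed in two places.
merged, duplicates = merge_postscripts(
    ["syslog", "remoteshell"],      # e.g. from the postscripts table
    ["remoteshell", "setupntp"],    # e.g. from the osimage definition
)
for name in duplicates:
    print(f"warning: postscript '{name}' listed more than once; running it once")
```

Keeping the first occurrence preserves the ordering admins already rely on; whether the first or the last occurrence should win is a design choice the item leaves open.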
Candidates for xCAT 2.9 or Later
- Modify boot order via rbootseq and IMM and UEFI
- LDAP support: configure LDAP, manage LDAP users
- Boot over Infiniband
- Support rolling updates with LSF
- Have prescripts use the new semaphore to provide a global throttle for each prescript
- statelite: con type should include image .default entries.(bug 3176516 https://sourceforge.net/tracker/?func=detail&aid=3176516&group_id=208749&atid=1006945)
- support user-provided diskless image update script on AIX, similar to postinstall scripts run by genimage for Linux
- Support noderes.servicenode being set to sn01,mn01
- Web GUI testing (Cao Li)
- xCAT enhancements for the HPC Cloud Suite: OVF image support(Ling, CDL)
- Manage IB switches? (create vlans, etc) (Jie Hua)
- Application performance monitoring (Torque, LL) (from Egan)
- refine lsvm, chvm, rspconfig, etc
- Support role definitions. The specific scenario is that users of the xcat web portal interface are regular users, not admins, and need to be able to do a few xcat commands (with specific flags), but shouldn't be allowed to do them all. For these users, we could put many lines in the policy table, but it would be much cleaner to have a separate table called roles in which a role could be named and all the cmds/flags that are allowed for that role. Then the policy table could support that role name in the command column.
- Add xCAT support for the new AIX/NIM NAS appliance feature. (The NAS feature will have the capability of hosting file-type resources (such as mksysb, savevg, resolv_conf, bosinst_data, script) and can be used for install purposes without the need to alter any .info files on the spot server.) Required xCAT support TBD. At minimum there would be some documentation updates.(Norm)
- Merge OpenSLP 1.2.1 patches into OpenSLP 2.0
- monitor framework honor noderange(Bug 2952099 https://sourceforge.net/tracker/?func=detail&aid=2952099&group_id=208749&atid=1006945)
- Add additional noderes attributes for where to get the boot kernel and initrd, separate from the tftpserver attribute. (Jarrod)
- Provide an easy way to config/start the recommended/default monitoring
- Change xcatmon to use nmap instead of fping, build nmap for AIX, & take fping out of the Requires.(Xu Qing)
- Automatically set up logrotate of console files on sn & mn (Ling)
- Change rmnimimage to remove the xCAT osimage definition by default.(Norm)
- Add NIM maint_boot support to nimnodeset command. (Norm)
- Add xCAT on AIX support for postscripts (pre-boot customization scripts)(Norm)
- Support the follow on to the HMC.(HMC is planning to implement some improvements to support multiple targets/objects in one command invocation for some of the HMC commands, xCAT should be able to leverage the changes to improve the xCAT efficiency for the HMC managed nodes control. I am not sure what time frame these HMC improvements will be available in)(Yin Le)
- Move node-specific info from /install to /tftpboot (Jarrod)
- Do a survey of existing monitoring GUI frontends to determine the best (most common) one to provide a bridge between ELA data and it
- Make all MN config info be in the db or generated from info in db. (So that a db backup/restore will put all config info back on the MN.) Or provide an xcatbackup/restore cmd to capture all files (e.g. /include/custom) and the db.
- Ability to run replaycons from management node instead of sn (Ling)
- Have notification architecture handle hierarchy (Ling or CDL)
- Fix Table.pm code on how it handles where-clauses for tables that have regular expressions in them?
- Add overall cluster health summary (how many down, up, etc.) into DB
- Modify client/svr communication to allow client cmds to prompt user (needed for copycds, xdsh -K, and genimage)
- Investigate using MonAMI (http://monami.sourceforge.net/) to abstract different monitoring tools
- Support multiple levels of dependencies and hierarchy in dep table (CDL)
- Add WOL (perl -MNet::Wake -e "Net::Wake::by_udp(undef,'$MAC')")
- Add help (list of cmds) & version cmds to the client/svr protocol for CRI and others
- Implement cluster user mgmt for Argonne: Add a local (cluster-only) user, Deactivate a local user, Activate an LDAP user (grant existing LDAP user access to cluster), Deactivate an LDAP user.
- Include AIX perl IPv6 routines
- IPMI is IPv4 only as of 2.0. Until that specification is revised, service processors implementing IPMI must be managed through IPv4.
- IPv6 support in the distributions is there, but in RHEL5 and SLES10, the support could be characterized as interim. RHEL6 and SLES11 would migrate to ISC DHCPD 4, which has built-in IPV6, rather than use a separate daemon for each.
- Remove some of the IPv6 restrictions (e.g. hierarchical)
- Rolling update plugin support to allow 3rd party schedulers (other than LoadLeveler) to schedule nodes for updates
- Data management process, to limit the number of concurrent SQL connections per xcatd instance (improves bug 1875930). This could also host cross-process caches to reduce the frequency of requests as well as concurrency.
- Virtualization/container assistance/framework (libvirt managed ones and VMWare?) (feature 1905355)
- Replace port 3001 on installing nodes/stage with ssh (or somehow otherwise authenticate using private/public key, encryption not a must?). bmcsetup is encrypted in 2.x and in 2.1 forward is authenticated through privileged port usage.
- setupxcat: if it fails to detect sufficient site configuration to act as it usually does, enter an interactive mode that prompts the user with common defaults
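For the WOL item in the list above, the Net::Wake one-liner sends a Wake-on-LAN "magic packet": six 0xFF bytes followed by the target MAC address repeated 16 times, sent over UDP. The Python sketch below shows the same packet built by hand, purely as an illustration. The function names, the broadcast address, and the default port 9 are assumptions for this sketch, not xCAT settings.

```python
import socket

def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the MAC address repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16

def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `send_magic_packet("00:11:22:33:44:55")` would broadcast the packet on the local network; the target NIC must have Wake-on-LAN enabled for it to have any effect.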