Release Notes for Beta 2.0 uDAPL/kDAPL Release
October 28, 2003

DAPL BETA 2.0 RELEASE NOTES

The DAPL team is pleased to announce the addition of kDAPL to the dapl repository. This code has been contributed by Fujitsu Prime Software Technologies, Limited; the DAPL project is very appreciative of this excellent work!

The implementation is believed to be complete, but has not been thoroughly tested. At least one large application uses it, as well as a number of test programs in use by the dapl team, providing some confidence that the core implementation is working. Several routines have explicitly NOT been tested; please see the comments next to the provider jump vector in ./common/dapl_provider.c.

kDAPL has a large source code overlap with uDAPL, so they are tightly integrated into the same tree. A number of files changed location as we separated kDAPL, uDAPL, and common code into their appropriate directories. The files have not been renamed, but may be in different directories than you expect if you have been working with previous versions of DAPL.

Some of the files in ./kDAPL contain conditional code under #if defined(__KDAPL__); this code is currently similar to the uDAPL version, so the conditionals exist to make it easier to merge bug fixes to either tree. There are a handful of conditional statements in the common and adapter code as well. For the most part we have avoided conditionals, as they make the code unreadable, but they are justified in a few cases.

This code has been tested on the JNI InfiniBand HCA used by the DAPL team, which is based on the IB API verbs. This API supports the same verbs in user and kernel space; the difference is that the kernel version of the verbs is accessed through a jump vector obtained from the provider driver. A sample of this vector is found in ./ibapi/verbslist.h.

THE verbslist.h FILE IN THE DISTRIBUTION IS FAKE; IT IS ONLY PROVIDED AS AN EXAMPLE AND TO ALLOW THE CODE TO COMPILE.
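For readers unfamiliar with the term, a jump vector is simply a table of function pointers that the consumer calls through instead of linking against the functions directly. The sketch below is a minimal illustration of that idea only. Every type and function name in it is hypothetical; none of them are taken from verbslist.h, from the IB API, or from any vendor driver, and the lookup step is a stand-in for whatever mechanism the provider driver actually offers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verb signature, for illustration only. */
typedef int (*example_verb_fn)(int handle);

/* The jump vector: a table of function pointers filled in by the provider. */
struct example_verbs_vector {
    example_verb_fn open_hca;
    example_verb_fn close_hca;
};

/* Stand-ins for entry points that a provider driver would implement. */
static int example_open_hca(int handle)  { return handle + 1; }
static int example_close_hca(int handle) { return handle - 1; }

/* Stand-in for the lookup step. A real kernel consumer would obtain the
 * table from the provider driver rather than from a local static. */
static const struct example_verbs_vector *example_get_verbs(void)
{
    static const struct example_verbs_vector vector = {
        example_open_hca,
        example_close_hca,
    };
    return &vector;
}

/* Consumer code reaches every verb indirectly through the vector. */
int example_open_then_close(int handle)
{
    const struct example_verbs_vector *verbs = example_get_verbs();

    assert(verbs != NULL);
    if (verbs->open_hca == NULL || verbs->close_hca == NULL)
        return -1;  /* incomplete vector: refuse to call through it */
    return verbs->close_hca(verbs->open_hca(handle));
}
```

The NULL checks reflect the warning above: a vector that is only a sample may be missing entries, so a consumer should not assume every slot is populated.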
Your HCA provider vendor must provide you with the proper verbs structure and must implement the Linux inter_module interfaces to obtain it. Note that this sample file may be incomplete for a product.

At this time, the equivalent VAPI work has not occurred. If you can contribute the necessary pieces to support kDAPL on that interface, it would be greatly appreciated.

The top level build does not compile the kDAPL trees; they must be built manually. In order to use kdapl:

    > cd dat/kdat
    > make
    > cd Target
    > insmod dat_registry.o
    > cd ../../../dapl/kdapl
    > make
    > cd Target
    > insmod dapl.o

You can now do an lsmod and see your drivers running. To state the obvious, if your provider driver is not running then you may run into trouble.

The kdapl driver also supports a limited set of debugging options, using the same format as provided by the uDAPL environment variable DAPL_DBG_TYPE. To get debug messages printed on the console, use

    > insmod dapl.o DbgLvl=0xff

... or whatever your favorite set of debug bits is. kdapl runs as a driver, so printing debug messages to the console has a serious impact on system performance.

As always, the DAPL community appreciates the contribution of bug fixes and code that will complete the implementation. There may yet be spec violations and/or unimplemented options in this code; it continues as a work in progress.

SPECIAL NOTE ON BETA 1.0

Perhaps it is hubris on the part of the DAPL team, but now that we are nearly DAT Spec 1.1 compliant and the code base is being used by many companies, we have decided to rev our release signature to Beta releases and discontinue Alpha releases. This is a stake in the ground both for spec compliance and the recognition that things are generally stable. If you have DAPL applications, they are going to require changes in order to be compliant with this code base, so do not pick it up lightly: you have been warned!
In changing to the new level of the spec, some of the documents found in the ./doc directory may be out of date. We have made an effort to keep them current, but it is time for a wholesale review by the development team to see if they still reflect the implementation.

RELEASE NOTES

We would like to officially announce the availability of a public source implementation of uDAPL. This implementation has been developed by Network Appliance with significant contributions from JNI, IBM, Mellanox, and a number of companies who wish to remain anonymous at this time.

The uDAPL source code is available on Source Forge under the new DAPL foundry; see

    http://sourceforge.net/projects/dapl

Both kDAPL and uDAPL exist in this source tree, although only a small amount of work has been done on kDAPL to date.

Source Forge is the world's largest Open Source development website and provides a number of services to developers and the Open Source community.

The DAPL source code is being provided free to all interested parties. The license provisions for this project are designed to encourage commercial development and products based upon this source code. There are no intellectual property claims attached to this code. See the project web page above for hyperlinks pointing to the license.

NEW SINCE Beta 1.10

* kDAPL is now part of the DAPL distribution. See the release notes above. The kDAPL 1.1 spec is now contained in the doc/ subdirectory.

* Several files have been moved around as part of the kDAPL checkin. Some files that were previously in udapl/ are now in common/, and some that were in common/ are now in udapl/. The goal was to make sure files are properly located and make sense for the build.

* Source code formatting changes for consistency.

* Bug fixes
  - dapl_evd_create() was comparing the wrong bit combinations, allowing bogus EVDs to be created.
  - Removed code that swallowed zero length I/O requests, which are allowed by the spec and are useful to applications.
  - Locking in dapli_get_sp_ep was asymmetric; fixed it so the routine will take and release the lock. Cosmetic change.
  - dapl_get_consumer_context() will now verify the pointer argument 'context' is not NULL.

NEW SINCE Beta 1.09

* We now take advantage of cm_disconnect(abrupt), which has the semantics to disconnect the connection but will not generate a callback. DAPL does generate a disconnect event for the app, so some care is needed. This also introduces some interesting race conditions across a number of disconnect scenarios, e.g. a remote disconnect results in an asynchronous local disconnect callback; disconnect(graceful) followed by disconnect(abrupt) may still get a callback, etc. The code now properly deals with these race conditions.

* Increased the number of RDMA_READ credits (ibapi only) to allow better performance for overlapped I/O.

* Bug fixes
  - Typos and formatting changes to clean up and help make the code base more consistent. Added a couple of new debug statements.
  - CR records now removed from the PSP/RSP when doing a close(abrupt).
  - Added casts for 64 bit values to prevent inadvertent sign propagation.
  - Do a better job of cleaning up EP fields if a cr_accept fails, which prevents false information from being presented by the next ep_query.
  - Fixed ia_close_abrupt to check for empty queues before trying to dequeue, which was causing an assert.
  - Changed the ACK_TIMEOUT value to be within the range specified by the IBTA spec. We noticed various DTO errors when going through a switch with high traffic, and traced them down to an insufficient ACK timeout value. ibapi only.

NEW SINCE Beta 1.08

* Support for DAT RPM is now complete.

* Minor documentation updates

* Fixed dat_error.h to have a DAT_NO_SUBTYPE as the first (zeroth) element of the subtype enum. If there is no subtype, dat_strerr should not report something bogus.

* dapltest fixes:
  - Fixed FFT memory test
  - fft_queryinfo now uses correct set of DAT enums for query ops
  - Cleanup of fft initializers
  - Thread and EP synchronization has been revamped and should be correct. Two major problems here:
    1) All EPs used the same connection EVD, making it very likely that a specific EP would lose a race and harvest the wrong event.
    2) Threads were not synchronized on start/stop, so they could easily miss connections or get rejected when they were confused.

* Modest number of simple cleanups

* Bug fixes
  - Several SMP synchronization fixes:
    a) RSP/PSP locking and synchronization
    b) ep_free and CM thread synchronization
    c) disconnect_clean called with locks held
    d) EP locking corrected for disconnects
    e) connection timer setup/teardown
  - Added support to dapl_ia_close() to force a QP into the ERROR state in order to force posted ops to flush; used by disconnect ABRUPT at present.
  - Fixed missing data structure when calling the provider to modify the QP.
  - Fixed bug where we copied the IPv6 address twice when querying the QP.
  - Fixed ep_disconnect to be a no-op if the EP is disconnected.
  - EVDs are now initialized to enabled state per the spec.
  - Corrected notion that UNSIGNALLED means the same thing in IB and DAPL, which is not the case.

NEW SINCE Beta 1.07

* Red Hat Package Management (RPM) specification file and makefile updates to allow manual RPM creation of dat library. Automated support for creating a DAT RPM will come in a future release.

* Bug fixes
  - cr_query will return correct status and the remote IP address of the connecting node.
  - 'threshold' value in evd_wait is tested against the queue length and rejected if too large.
  - Fixed synchronization between RSP/PSP free and the CM callback thread.
  - Fixed synchronization between EP free and the CM callback thread.
  - Spec compliance for ep_disconnect() to be a no-op if the EP is DISCONNECTED.

NEW SINCE Beta 1.06

* All DAPL ALPHA release notes removed from this file.

* For IBM Access providers, ib_types.h has been updated such that max_block_bytes is now 64 bits (instead of 32) to allow large memories. THIS CHANGE IMPLIES ia_query IS INCOMPATIBLE WITH OLD VERSIONS.

* Local and remote address formats reworked to present as AF_INET (IPv4) addresses if possible.

* dapltest now displays local HCA IP address as part of config info (use -d to see it).

* Removed compile time references to PSC, who is no longer in business.

* dapltest changed to use service_id instead of pid when printing debug messages. Now both sides of the wire print the same value for connected threads. Also added simple test for private_data on connections; errors are non fatal, but will cause a warning message to be printed.

* Changed the dat headers to allow MAJOR, MINOR, and THREADSAFE #defines to be in config files, per DAT Collaborative revision. Reference implementation default is now DAT_THREADSAFE=FALSE.

* Cleaned up various debug messages that occasionally spew out.

* Deleted unused fields in dapl structures, and bogus references to them in code.

* VAPI files updated and building cleanly again.

* Framework for SOLICITED_WAIT support now in place. Still not fully implemented.

* Conformance test updates:
  - depends_on results field more consistent and correct
  - Send and recv DTO tests added
  - Updated RDMA tests
  - Enhanced recv tests
  - Connection management tests added
  - CNO tests added

* Bug fixes
  - Provider allocated EP cleaned up on cr_reject
  - dapl_ia_open fixed to prevent memory leaks on error paths
  - Using a reject reason of CONSUMER_REJECT (defined by IBTA), we can get a CM callback type of IB_CME_DESTINATION_REJECT_PRIVATE_DATA, even if there is no private data; all other reject types result in a callback of IB_CME_DESTINATION_REJECT. This allows us to generate the correct events on the connection EVD and to distinguish app level rejects from system level rejects or no listener present.
  - dapl_hca structure provides storage for local EP IP addresses.
  - In dapl_ep_create.c, cleaned up request_evd error checking to return correct error values.
  - Fixed bug when providing an EP for a CR; it is now linked onto the IA.
  - Various small changes to keep compilers happy.
  - Fixed race condition between a CM thread and an app thread when releasing an SP object.

NEW SINCE Beta 1.05

* dat_evd_set_unwaitable and dat_evd_clear_unwaitable implemented.

* ib_enum_hca_if() moved below the IB abstraction layer.

* Added rpath to LD_FLAGS in Makefile to find vendor libraries

* Dynamic library loading support added for Windows

* Conformance test updates:
  - CNO tests added
  - Function Records now have depends_on support
  - More connection related tests
  - RDMA tests

* Bug fixes
  - Only request the remote IP address from the switch (ATS naming) when the app requests it. Don't be aggressive.
  - Removed duplicate dapl_evd_wait prototype in dapl.h
  - Fixed uninitialized handle bug in dapltest performance tests
  - cr_accept and cr_reject properly back out changes if the provider fails.
  - dapltest fft test fixes: will no longer SEGV on non existent device names. Cleaned up output.
  - dapltest pthreads are created in DETACHED state to avoid resource problems.
  - Added locking for SP structure in critical places.
  - Register fixes for correctness, and to report the number available when the user asks for 0 entries (instead of SEGV).
  - Use more portable member of in6 structure
  - Cleanups for 64 bit compilers

NEW SINCE Beta 1.04

* More cleanup of debug and logging messages. More debug and trace messages added.

* Now support 1.1 semantics for PSP and RSPs fully, per CTN 60.

* DAT registry is now fully 1.1 compliant.

* evd_stream_merging_supported enabled and used.

* DAT headers updated for DAT errata 99 and 100.

* ep_free still not sorted out totally; if you fail to disconnect or fail to wait for a disconnect event to arrive, the underlying QP will not be freed and you cannot dispose of the associated PZ.

* Conformance test updates:
  - Connection tests
  - RDMA tests
  - Various bug fixes

* Bug fixes
  - ep_free will allow UNCONNECTED EPs to be freed.
  - Cleanup of some comments
  - cr_accept finishes verifying parameters before making assignments
  - DAT locking fixed to not lock around malloc/free/library calls.
  - evd_enable will verify a valid CQ is present before asking the provider for callbacks (not all EVDs have CQs).
  - Fixed up support for obtaining remote IP address on a connection, and cleaned up associated files.

NEW SINCE Beta 1.03

* strerror() implemented

* Initial pass at verifying the extended attributes of ia_openv(). More changes next drop as we get this sorted out. This may mandate that you change your current dat.conf file: it should be MAJOR 1, MINOR 1, and nonthreadsafe. dat.h specifies 'threadsafe', so you need to adjust your makefile to override this.

* dat.conf file example updated for 1.1 spec

* Now obtain firmware information from the hca_attrib structure for ibapi.

* The reference implementation is not thread_safe, so we updated the makefiles to reflect this.

* New 1.1 Provider Attribute optimal_alignment field now accurately set.

* DTO/RMR completion status returns now comply with 1.1 spec.

* DAT_NOT_IMPLEMENTED changed from 0xffff0000 to 0x0fff0000 per 1.1 errata.

* dat_ep_free() now conforms with the 1.1 spec; will do a disconnect if the EP is connected.

* Conformance test updates:
  - Now supports client/server tests
  - Supports multi-node tests
  - EVD tests
  - Updated IA tests
  - EP tests
  - Improved Makefile

* Bug fixes
  - Verify connection EVD on EP create
  - DAT_RETURN_SUBTYPE duplicate DAT_INVALID_STATE_EVD_IN_USE changed
  - Bug where DISCONNECT_LINK_DOWN was different from DISCONNECT
  - EVD now accurately reflects the number of CQ entries allocated by the underlying provider
  - Locking fixes for SMP

NEW SINCE Beta 1.02

* Reworked the debug print/log routine and support. This change is largely cosmetic, but affects almost every file.

* Updated error return from dapls_evd_post_software_event() to comply with DAPL 1.1

* Updated ep_free to disconnect if the EP is connected, per the 1.1 spec.

* Conformance test updates:
  - Fixes for OS dependent layer
  - Better IA tests
  - Sample/examples directory added
  - depends_on support added

* Bug fixes
  - LMR_PARAM now properly initialized in lmr_create
  - Code clean up: removed extra assert, better compliance with dapl internal naming, better formatting in a few source files, removed C++ style comments.
  - Fixed incorrect comparison in dapl_lmr_bind.c
  - dat library now preserves the order of registrations
  - ep_disconnect now uses the correct DAT_CLOSE_FLAGS
  - Fixed code under IBHOSTS_NAMING to work with new SOCKADDR definitions.
  - Removed extra disconnect_clean() call in callback code
  - dapls_ep_state_subtype() now returns DAT_RETURN_SUBTYPE
  - Corrected simple typos in dat include files
  - ibapi code now properly obtains the IP address from the correct port_static_info structures according to the port number (instead of using the first one for all ports)
  - Moved lock in dat/common/dat_dr.c to not include malloc; this is insufficient, need to go through locking in the dictionary routines for next drop.
  - dat_cr_accept now transitions the EP state to DAT_EP_STATE_COMPLETION_PENDING, in compliance with the 1.1 spec.

NEW SINCE Beta 1.01

* Updated dat_ia_close to not unload the provider library when provider's close function fails.

* Added compile time option to support real-time/embedded systems that don't have/use a file system, making the DAT static registry problematic (DAT_NO_STATIC_REGISTRY).

* Conformance test updates:
  - README file with usage instructions
  - Better support for dynamic value tables, removed specific types that limited chains development
  - Simple test case for ia_open/query/close
  - Fixes to be 1.1 compliant
  - Reworked much of the code to support an OS dependent layer, for portability.

* Bug fixes
  - rmr_context now correctly returned from lmr_create() (really!).
  - Fixed bugs on error paths of dat library
  - Fixed casts in various error programs to use an unsigned rather than signed value, avoiding sign bit propagation.
  - Cleaned up problems with exit codes from ep_free.
  - Cleaned up error codes in various dapl files
  - Use more portable DAT_SOCK_ADDR dtypes in ibapi connection code.
  - dapl_ep_disconnect now complies with 1.1 spec, several changes made. GRACEFUL flag now supported in ibapi code.
  - RDMA read limits now supported in EP_ATTRIBUTES on EP creation
  - Check async_evd_qlen for bogus values in dapl_ia_open

NEW SINCE Beta 1.0

* DAPL conformance test has been checked in. Uses the chain technology described in a note to the DAT Collaborative mail list. More tests and documentation will be coming. NOTE: This is a work in progress; there will be several changes in the upcoming weeks.

* Updates to document: dapl_vendor_specific_changes.txt. New doc: dapl_ibm_api_variations.txt

* Memory tests and echo tests updated to 1.1

* Assertion for 'over decremented' reference counts

* Numerous cleanups, including but not limited to removing unnecessary casts; formatting changes; replacing true/false with more portable DAT_TRUE/DAT_FALSE; removed dead ifdefs; removed unnecessary locks; makefile cleanups.

* DAPL version numbers, as reported in ia_attributes, are now in sync with mnemonics in dat.h.

* Ignore 0 length IOVs (1.1 compliance)

* DAPL_ATS now obtains the IP address (DAT_IA_ADDRESS) from the provider. Note that the IA_ADDRESS returned by ep_query is a sockaddr_in6 of AF_INET6 family.

* Memory test now has ability to request invalid memory regions for negative testing.

* Enhanced error reporting (more subtypes added to returns)

* dapl_vendor.h has better documentation and removed confusing compile time switch.

* Bug fixes
  - Don't decrement the EVD refcount in psp_create error path, as it will happen again in psp_free.
    rsp_create had the same problem and fix.
  - evd_create will return errors if the queue length is bogus or too large for the underlying provider to support.
  - dapltest working better with 1.1 API thanks to some judicious dat_ep_reset() calls.
  - EP states now correspond exactly to the DAT spec.
  - Added missing locks
  - rsp_create failure will now clean up the EP.
  - rsp_create will now return error codes if it fails.
  - Correctly handle close failures
  - Various *_free routines now return an error code and do not clean up on provider failure, allowing the app to recover or take corrective action.
  - ep_create will verify ep_attributes

* rmr_context now correctly returned from lmr_create().

OBTAIN THE CODE

To obtain the tree for your local machine you can check it out of the source repository using CVS tools. CVS is common on Unix systems and available as freeware on Windows machines. The commands to anonymously obtain the source code from Source Forge (with no password) are:

    cvs -d:pserver:anonymous@cvs.dapl.sourceforge.net:/cvsroot/dapl login
    cvs -z3 -d:pserver:anonymous@cvs.dapl.sourceforge.net:/cvsroot/dapl co .

When prompted for a password, simply press the Enter key.

Source Forge also contains explicit directions on how to become a developer, as well as how to use different CVS commands. You may also browse the source code using the URL:

    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/dapl/

SYSTEM REQUIREMENTS

This project has been implemented on Red Hat Linux 7.3 and SuSE SLES 8. The structure of the code is designed to allow other operating systems to easily be adapted, but no work has been done in that direction.

The DAPL team has used Mellanox Tavor based InfiniBand HCAs for development, and continues with this platform. Our HCAs use the IB verbs API submitted by IBM. Mellanox has contributed an adapter layer using their VAPI verbs API. Either platform is available to any group considering DAPL work.
The structure of the uDAPL source allows other provider API sets to be easily integrated.

The development team uses any one of four topologies: a single HCA with a wrap plug (loopback) device; two HCAs in a single machine; a single HCA in each of two machines; and most commonly, a switch. The DAPL Plugfest revealed that switches and HCAs available from most vendors will interoperate with little trouble, given the most recent releases of software. The dapl reference team makes no recommendation on HCA or switch vendors.

Explicit machine configurations are available upon request.

IN THE TREE

The DAPL tree contains source code for the uDAPL and kDAPL implementations, and also includes tests and documentation.

Included documentation has the base level API of the providers: the IBM Access API and the Mellanox Verbs API. Also included are a growing number of DAPL design documents which lead the reader through specific DAPL subsystems. More design documents are in progress and will appear in the tree in the near future.

A small number of test applications and a unit test framework are also included. dapltest is the primary testing application used by the DAPL team; it is capable of simulating a variety of loads and exercises a large number of interfaces. Full documentation is included for each of the tests.

Recently, the dapl conformance test has been added to the source repository. The test provides coverage of the most common interfaces, doing both positive and negative testing. Vendors providing a DAPL implementation are strongly encouraged to run this set of tests.

MAKEFILE NOTES

There are a number of #ifdefs in the code that were necessary during early development. They are disappearing as we have time to take advantage of features and work available from newer releases of provider software. You may notice an #ifdef <something>_BUSTED, which indicates a particular feature was not working at the time the code was written and the DAPL team developed a work-around.
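As a rough picture of what such a work-around switch looks like, the sketch below selects between a normal path and a hand-rolled replacement at compile time. It is an illustration of the pattern only: EXAMPLE_FEATURE_BUSTED, the structure, and the function names are all hypothetical and do not appear in the DAPL tree.

```c
#include <assert.h>

/* Hypothetical state and object types, for illustration only. */
enum example_qp_state { EXAMPLE_QP_RESET, EXAMPLE_QP_CONNECTED };

struct example_qp {
    enum example_qp_state state;
    int used_workaround;    /* records which path was compiled in */
};

#ifdef EXAMPLE_FEATURE_BUSTED
/* Work-around path: the provider feature is unavailable, so the state
 * change is performed by hand. */
static void example_connect(struct example_qp *qp)
{
    qp->state = EXAMPLE_QP_CONNECTED;
    qp->used_workaround = 1;
}
#else
/* Normal path: the provider feature would be called here (stubbed out). */
static void example_connect(struct example_qp *qp)
{
    qp->state = EXAMPLE_QP_CONNECTED;
    qp->used_workaround = 0;
}
#endif

/* Callers are identical either way; only the selected body differs. */
int example_connect_and_report(void)
{
    struct example_qp qp = { EXAMPLE_QP_RESET, 0 };

    example_connect(&qp);
    assert(qp.state == EXAMPLE_QP_CONNECTED);
    return qp.used_workaround;
}
```

Building with -DEXAMPLE_FEATURE_BUSTED on the compiler command line would select the work-around body; without it, the normal path is compiled.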
These #ifdefs are not documented, as the intent is to remove them as soon as possible.

Of particular relevance are the following #defines:

- CM_BUSTED
  The DAPL team has been an early adopter of InfiniBand and has had to improvise missing functionality while the vendors lag our development. InfiniBand uses a Connection Manager (CM) to establish a connection between nodes. This #define essentially 'fakes' a connection by moving a QP into the appropriate state. Most of the IB vendors have a working CM now and this is no longer the default, but the code remains as some development groups are working to catch up.

- NO_NAME_SERVICE
  Naming is a thorny issue in InfiniBand: a hostname or an interface name must be translated to a GID that can be used to establish a connection with a remote machine. The reference implementation provides a simple name service under this #define. The goal is to use IPoIB when it becomes available. NO_NAME_SERVICE will probably remain in the code long term in order to enable various implementations. A description of how this works is found in the end_point_design document in the doc/ directory.

CONTRIBUTIONS

As is common to Source Forge projects, there are a small number of developers directly associated with the source tree and having privileges to change the tree. Requested updates, changes, bug fixes, enhancements, or contributions should be sent to Steve Sears at sjs@netapp.com for review. We welcome your contributions and expect the quality of the project will improve thanks to your help.

The core DAPL team is:

    Steve Sears
    Randy Smith

... with contributions from a number of excellent engineers in various companies contributing to the open source effort.

ONGOING WORK

Not all of the DAPL spec is implemented at this time.
Some functionality is missing from CNOs and events; shared memory will probably not be implemented by the reference implementation (there is a write up on this in the doc/ area); and there are still various cases where work remains to be done. And of course, not all of the implemented functionality has been tested yet. The DAPL team continues to develop and test the tree with the intent of completing the specification and delivering a robust and useful implementation.

The DAPL Team