OpenFAM Reference Implementation

Error handling

Unlike C (including C11), C++ provides native support for exceptions. The C++ API therefore follows the C++ convention of reporting errors to the caller through exceptions, rather than the usual C convention of returning an integer status (0 for success, a negative value on failure). Every API returns its expected output on success (or void if there is no output) and throws an exception on error. Instead of checking return values as in C, the caller should handle errors with try-catch blocks.

The OpenFAM 2.0 implementation defines a Fam_Exception class, derived from the C++ standard exception class (std::exception). It also defines a list of error numbers, each of which identifies a specific failure condition. When an error occurs, the Fam_Exception object received by the caller carries a specific error number and an appropriate error message string. The application can retrieve this information with the member functions fam_error() and fam_error_msg() (or what()), and then take any necessary action. The currently defined exception class and error numbers are listed in Table 1 and Table 2.
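The relationship between Fam_Exception and the standard exception hierarchy can be sketched with a minimal stand-in class. This is not the OpenFAM header. The enum values, the constructor, and the way the message is stored are assumptions made so that the sketch compiles on its own. The point it shows is this: because the class derives from std::exception, a handler written against std::exception still receives the message through what(), while a handler for the concrete type can also read the error number through fam_error().

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>

// Hypothetical stand-in for a few OpenFAM error numbers; the real enum
// and its numeric values are defined by OpenFAM, not by this sketch.
enum Fam_Error { FAM_ERR_UNKNOWN, FAM_ERR_INVALID, FAM_ERR_TIMEOUT };

// Hypothetical stand-in for Fam_Exception: derives from std::exception
// and carries an error number plus a message string.
class Fam_Exception : public std::exception {
  public:
    Fam_Exception(Fam_Error err, std::string msg)
        : err_(err), msg_(std::move(msg)) {}

    // Error number identifying the specific failure.
    Fam_Error fam_error() const noexcept { return err_; }
    // Human-readable message; what() returns the same text.
    const char *fam_error_msg() const noexcept { return msg_.c_str(); }
    const char *what() const noexcept override { return msg_.c_str(); }

  private:
    Fam_Error err_;
    std::string msg_;
};
```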

Table 1: List of OpenFAM exceptions

| Exception class | Description |
|---|---|
| Fam_Exception | Thrown for all error conditions. It contains the specific error number. |

Table 2: List of OpenFAM error numbers

| Error number | Description |
|---|---|
| FAM_ERR_UNKNOWN | Unexpected or unknown error. |
| FAM_ERR_NOPERM | Caller does not have access rights for the requested operation. |
| FAM_ERR_TIMEOUT | A blocking API reached its retry/timeout limit. |
| FAM_ERR_INVALID | API called with invalid options or arguments. |
| FAM_ERR_LIBFABRIC | Libfabric API failure. |
| FAM_ERR_SHM | Shared memory allocator error. |
| FAM_ERR_NOT_CREATED | Creation of a data item or region in FAM failed. |
| FAM_ERR_NOTFOUND | Data item or region not found in FAM. |
| FAM_ERR_ALREADYEXIST | Data item or region already exists in FAM. |
| FAM_ERR_ALLOCATOR | Allocator-specific error. |
| FAM_ERR_RPC | Error from the gRPC layer. |
| FAM_ERR_PMI | Runtime (PMI) error. |
| FAM_ERR_OUTOFRANGE | Data access out of range. |
| FAM_ERR_NULLPTR | Null pointer access error. |
| FAM_ERR_UNIMPL | Call to an unimplemented function/API. |
| FAM_ERR_RESOURCE | Resource not available. |
| FAM_ERR_INVALIDOP | Invalid operation. |
| FAM_ERR_RPC_CLIENT_NOTFOUND | RPC service not available. |
| FAM_ERR_MEMSERV_LIST_EMPTY | Memory service not initialized. |
| FAM_ERR_METADATA | Metadata service error. |
| FAM_ERR_MEMORY | Memory service error. |
| FAM_ERR_NAME_TOO_LONG | Region or data item name too long. |
| FAM_ERR_ATL_QUEUE_FULL | Atomic large transfer (ATL) API queue full. |
| FAM_ERR_ATL_QUEUE_INSERT | Atomic large transfer (ATL) API queue insert error. |
| FAM_ERR_ATL_NOT_ENABLED | Atomic large transfer (ATL) APIs not enabled. |
| FAM_ERR_ATL | Atomic large transfer (ATL) API error. |
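Every failure arrives as the same exception type, so an application typically branches on the error number. The sketch below shows one possible policy. It classifies a few of the error numbers above as transient (worth retrying later) and treats the rest as permanent. The policy and the helper name is_transient() are hypothetical and not part of OpenFAM. The enum is a stand-in containing a subset of the values from Table 2.

```cpp
#include <cassert>

// Stand-in subset of the OpenFAM error numbers (numeric values are illustrative).
enum Fam_Error {
    FAM_ERR_UNKNOWN,
    FAM_ERR_NOPERM,
    FAM_ERR_TIMEOUT,
    FAM_ERR_INVALID,
    FAM_ERR_NOTFOUND,
    FAM_ERR_RESOURCE,
    FAM_ERR_ATL_QUEUE_FULL
};

// Hypothetical policy: errors that might not recur if the operation is
// retried later count as transient; argument, permission and lookup
// errors will fail again and are treated as permanent.
bool is_transient(Fam_Error err) {
    switch (err) {
    case FAM_ERR_TIMEOUT:        // retry/timeout limit reached
    case FAM_ERR_RESOURCE:       // resource may become available (assumption)
    case FAM_ERR_ATL_QUEUE_FULL: // queue may drain (assumption)
        return true;
    default:                     // e.g. FAM_ERR_INVALID, FAM_ERR_NOPERM
        return false;
    }
}
```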

Note that the library provides both blocking and non-blocking calls for most data path operations. On error, a blocking call throws an exception immediately, before returning to the caller. For example, a call to fam_put_blocking() either completes successfully or throws a Fam_Exception object containing one of the following error numbers: FAM_ERR_INVALID, FAM_ERR_OUTOFRANGE, FAM_ERR_NOTFOUND, or FAM_ERR_TIMEOUT. In general, the application should handle these exceptions with a normal try-catch block:

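    try {
        // my_fam is an initialized fam object; local, descriptor, offset and
        // nbytes describe the source buffer and the destination range
        my_fam->fam_put_blocking(local, descriptor, offset, nbytes);
    } catch (Fam_Exception &e) {
        // Exception handling code: inspect e.fam_error() and e.fam_error_msg()
        std::cerr << "fam_put_blocking: " << e.fam_error_msg() << std::endl;
    }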
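A blocking call reports FAM_ERR_TIMEOUT through an exception rather than a return code, so retrying after a timeout is just a loop around the try-catch block. The sketch below is a hypothetical helper, not an OpenFAM API. retry_on_timeout() takes any callable, which in a real program would be a lambda wrapping fam_put_blocking(). It retries only on FAM_ERR_TIMEOUT and rethrows every other error. The Fam_Exception and Fam_Error types are minimal stand-ins so that the example compiles without OpenFAM.

```cpp
#include <cassert>
#include <exception>

// Minimal stand-ins for the OpenFAM types (assumed shapes, not the real header).
enum Fam_Error { FAM_ERR_INVALID, FAM_ERR_TIMEOUT };

class Fam_Exception : public std::exception {
  public:
    explicit Fam_Exception(Fam_Error err) : err_(err) {}
    Fam_Error fam_error() const noexcept { return err_; }
    const char *what() const noexcept override { return "Fam_Exception"; }

  private:
    Fam_Error err_;
};

// Hypothetical helper: invoke a blocking operation, making at most
// max_attempts attempts in total while it fails with FAM_ERR_TIMEOUT.
// Any other error, or a timeout on the last attempt, propagates to the caller.
template <typename BlockingOp>
void retry_on_timeout(BlockingOp op, int max_attempts) {
    assert(max_attempts > 0);
    for (int attempt = 1;; ++attempt) {
        try {
            op();
            return; // completed successfully
        } catch (Fam_Exception &e) {
            if (e.fam_error() != FAM_ERR_TIMEOUT || attempt >= max_attempts)
                throw; // rethrow the original exception object
        }
    }
}
```

With the placeholder names from the example above, a call might look like `retry_on_timeout([&] { my_fam->fam_put_blocking(local, descriptor, offset, nbytes); }, 3);`. Rethrowing with a bare `throw;` keeps the original Fam_Exception object, including its error number and message, for the caller.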
    

Non-blocking calls, however, are queued within the library, so their errors may not be reported immediately. Depending on the underlying error, some failures cause the non-blocking call itself to throw a Fam_Exception with a specific error number, while others may only be thrown by the next fam_quiet() call. The code may therefore look like:

    try {
        my_fam->fam_put_nonblocking(local, descriptor, offset, nbytes);
    } catch (Fam_Exception &e) {
        // Errors detected when the request is issued, for example:
        // FAM_ERR_INVALID, FAM_ERR_NOTFOUND, FAM_ERR_OUTOFRANGE, FAM_ERR_NOPERM
    }
    // ... Continue with the rest of the code ...
    try {
        my_fam->fam_quiet();
    } catch (Fam_Exception &e) {
        // This exception may actually result from an earlier
        // fam_put_nonblocking() operation, for example with error number
        // FAM_ERR_TIMEOUT, FAM_ERR_OUTOFRANGE or FAM_ERR_NOPERM
    }
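The split between immediate and deferred errors can be modeled with a toy request queue. This is an illustrative simulation, not OpenFAM's implementation. MockFam, its range check, and the remote_times_out flag are all hypothetical. Inside the model, put_nonblocking() checks its arguments when called and throws at once if the range is invalid. Otherwise it only enqueues the request. quiet() then drains the queue and throws if a simulated failure occurred while the queued puts were completing. That is why the fam_quiet() call above needs its own handler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>

// Minimal stand-ins for the OpenFAM types (assumed shapes, not the real header).
enum Fam_Error { FAM_ERR_OUTOFRANGE, FAM_ERR_TIMEOUT };

class Fam_Exception : public std::exception {
  public:
    explicit Fam_Exception(Fam_Error err) : err_(err) {}
    Fam_Error fam_error() const noexcept { return err_; }
    const char *what() const noexcept override { return "Fam_Exception"; }

  private:
    Fam_Error err_;
};

// Hypothetical toy model of a library that queues non-blocking puts.
// remote_times_out simulates a failure that only shows up on completion.
class MockFam {
  public:
    MockFam(std::uint64_t item_size, bool remote_times_out)
        : item_size_(item_size), remote_times_out_(remote_times_out) {}

    // Errors visible from the arguments are thrown at issue time.
    void put_nonblocking(std::uint64_t offset, std::uint64_t nbytes) {
        if (offset + nbytes > item_size_)
            throw Fam_Exception(FAM_ERR_OUTOFRANGE);
        ++pending_; // queued; the transfer has not happened yet
    }

    // Waits for all queued puts. A failure that occurs while they complete
    // is reported here, after put_nonblocking() has already returned.
    void quiet() {
        std::size_t n = pending_;
        pending_ = 0; // the queue is drained even if we throw below
        if (n > 0 && remote_times_out_)
            throw Fam_Exception(FAM_ERR_TIMEOUT);
    }

    std::size_t pending() const noexcept { return pending_; }

  private:
    std::uint64_t item_size_;
    bool remote_times_out_;
    std::size_t pending_ = 0;
};
```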
    

Note that, as with any C++ exception, an uncaught Fam_Exception causes std::terminate() to be called, which by default aborts the application.
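Fam_Exception derives from std::exception, so a last-resort handler for std::exception at the top of the program also catches any OpenFAM error that was not handled closer to its source. This lets the application report the error and exit with a failure status instead of terminating. The helper below, run_guarded(), is a hypothetical name used only for illustration. It runs a callable and converts any exception that escapes it into a nonzero exit status.

```cpp
#include <cassert>
#include <exception>
#include <iostream>

// Hypothetical helper: run the application body and turn any exception that
// escapes it into an error message and a nonzero exit status, instead of
// letting it reach std::terminate().
template <typename Body>
int run_guarded(Body body) {
    try {
        body();
        return 0;
    } catch (const std::exception &e) {
        // Also catches Fam_Exception, which derives from std::exception.
        std::cerr << "fatal: " << e.what() << '\n';
    } catch (...) {
        std::cerr << "fatal: unknown exception\n";
    }
    return 1;
}
```

An application would typically use it as `int main() { return run_guarded([] { /* application code */ }); }`, keeping the more specific try-catch blocks shown above around individual calls where the error number matters.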