OpenFAM Reference Implementation

Error handling

Unlike C, C++ has native support for exceptions, so the C++ API reports errors to its caller by throwing exceptions rather than following the C convention of returning an integer status (0 for success, a negative value on failure). Every API returns its expected output on success (or void if it produces no output) and throws an exception on error. The caller should therefore wrap calls in try-catch blocks instead of checking return values.

The implementation defines a set of exception classes, derived from the C++ standard exception class, that categorize failures by type (runtime, allocator, data path, and so on). It also defines a list of error numbers that identify the specific error. The exception object received by the caller carries an error number and a corresponding error message string, which the application can retrieve with the member functions fam_error() and fam_error_msg() (or what()) and then act on. The currently defined exception classes and error numbers are listed in Table 1 and Table 2.

Table 1: List of OpenFAM exceptions

| Fam Exception Class | Description |
| --- | --- |
| Fam_Exception | This is the base exception class for all FAM exceptions. |
| Fam_InvalidOption_Exception | For invalid options or arguments for FAM APIs. |
| Fam_Timeout_Exception | Retry or Timeout limit for blocking APIs. |
| Fam_Datapath_Exception | Indicate data path errors. |
| Fam_Allocator_Exception | Indicate allocator errors. |
| Fam_Pmi_Exception | Indicate runtime initialization errors. |
| Fam_Unimplemented_Exception | Calling unimplemented OpenFAM APIs. |

Table 2: List of OpenFAM error numbers

| Fam Error | Description |
| --- | --- |
| FAM_ERR_UNKNOWN | Unexpected or Unknown errors. |
| FAM_ERR_NOPERM | Caller does not have access rights for the desired operation. |
| FAM_ERR_TIMEOUT | Blocking APIs reached retry/timeout limit. |
| FAM_ERR_INVALID | APIs called with invalid options/arguments. |
| FAM_ERR_LIBFABRIC | Libfabric API failure. |
| FAM_ERR_SHM | Shared memory allocator error. |
| FAM_ERR_NOTFOUND | Data item or region not found in FAM. |
| FAM_ERR_ALREADYEXIST | Data item or region already exists in FAM. |
| FAM_ERR_ALLOCATOR | Allocator specific error. |
| FAM_ERR_GRPC | Error from grpc layer. |
| FAM_ERR_PMI | Runtime error. |
| FAM_ERR_OUTOFRANGE | Data access out of range. |
| FAM_ERR_UNIMPL | Calling unimplemented functions/APIs. |
| FAM_ERR_RESOURCE | Resource not available. |
| FAM_ERR_INVALIDOP | Invalid operations. |
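
As a rough illustration of how the exception classes and error numbers can be used together, the sketch below catches by class first and then branches on the error number. It does not include the OpenFAM headers: the types in the `sketch` namespace are local stand-ins that only borrow names from Table 1 and Table 2, the enum values are compiler-assigned rather than the library's real values, and `failing_call()` and `handle()` are hypothetical helpers. Which class carries which error number is also an assumption made for the example.

```cpp
#include <exception>
#include <string>

namespace sketch {

// Stand-ins for three of the error numbers in Table 2. The numeric values
// are compiler-assigned here and are not the library's real values.
enum Fam_Error { FAM_ERR_UNKNOWN, FAM_ERR_TIMEOUT, FAM_ERR_NOTFOUND };

// Stand-in for the base class in Table 1: carries an error number and a
// message, and exposes them through fam_error(), fam_error_msg() and what().
class Fam_Exception : public std::exception {
  public:
    Fam_Exception(Fam_Error err, const std::string &msg)
        : err_(err), msg_(msg) {}
    int fam_error() const { return err_; }
    const char *fam_error_msg() const { return msg_.c_str(); }
    const char *what() const noexcept override { return msg_.c_str(); }

  private:
    Fam_Error err_;
    std::string msg_;
};

// Stand-in for one derived class from Table 1.
class Fam_Timeout_Exception : public Fam_Exception {
  public:
    using Fam_Exception::Fam_Exception;
};

// Hypothetical failing operation, used only to have something to catch.
void failing_call(Fam_Error err) {
    if (err == FAM_ERR_TIMEOUT)
        throw Fam_Timeout_Exception(err, "retry limit reached");
    throw Fam_Exception(err, "operation failed");
}

// Catch the most derived class first, then fall back to the base class
// and branch on the error number.
std::string handle(Fam_Error err) {
    try {
        failing_call(err);
    } catch (Fam_Timeout_Exception &e) {
        return std::string("timeout: ") + e.fam_error_msg();
    } catch (Fam_Exception &e) {
        switch (e.fam_error()) {
        case FAM_ERR_NOTFOUND:
            return std::string("not found: ") + e.what();
        default:
            return std::string("error: ") + e.what();
        }
    }
    return "ok";
}

} // namespace sketch
```

Handlers are tried in order, so the handler for the derived class has to come before the one for Fam_Exception; otherwise the base-class handler would match every FAM exception first.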

The library provides both blocking and non-blocking calls for most data path operations. Blocking calls throw exceptions immediately on error. For example, a call to fam_put_blocking() either completes successfully or throws one of the following exceptions: Fam_InvalidOption_Exception, Fam_Allocator_Exception, Fam_Datapath_Exception, or Fam_Timeout_Exception. In general, the application should wrap the call in a try-catch block to handle exceptions:

    try {
        fam_put_blocking();
    } catch (Fam_Exception &e) {
        // Exception handling code
    }

Non-blocking calls, however, are queued within the library and may not report errors immediately. Depending on the underlying error, some exceptions are thrown by the call itself, while others are thrown only during the next fam_quiet() call. Thus the code may look like:

    try {
        fam_put_nonblocking();
    } catch (Fam_Exception &e) {
        // handle (for example)
        // Fam_InvalidOption_Exception, Fam_Allocator_Exception, Fam_Datapath_Exception
    }
    // ... Continue rest of the code ...
    try {
        fam_quiet();
    } catch (Fam_Exception &e) {
        // These exceptions may actually result
        // from a previous fam_put_nonblocking() operation
        // Fam_Timeout_Exception, Fam_Datapath_Exception
    }
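
To make the deferred behaviour more concrete, here is a toy model of a queue that rejects bad arguments at call time but reports transfer failures only when the queue is drained. It is not how OpenFAM is implemented: `Toy_Fam`, its `put_nonblocking()` and `quiet()` members, the `link_fails` flag, and `where_it_fails()` are all hypothetical, and the exception types are local stand-ins that reuse names from Table 1.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace deferred_sketch {

// Local stand-ins for three exception classes named in Table 1.
struct Fam_Exception : std::runtime_error {
    explicit Fam_Exception(const std::string &msg) : std::runtime_error(msg) {}
};
struct Fam_InvalidOption_Exception : Fam_Exception {
    explicit Fam_InvalidOption_Exception(const std::string &msg)
        : Fam_Exception(msg) {}
};
struct Fam_Datapath_Exception : Fam_Exception {
    explicit Fam_Datapath_Exception(const std::string &msg)
        : Fam_Exception(msg) {}
};

// Toy model of a library that queues non-blocking writes.
class Toy_Fam {
  public:
    // Argument errors can be detected up front, so they are thrown here.
    // `link_fails` simulates a transfer that will fail later.
    void put_nonblocking(std::size_t nbytes, bool link_fails) {
        if (nbytes == 0)
            throw Fam_InvalidOption_Exception("nbytes must be non-zero");
        pending_.push_back(Op{nbytes, link_fails});
    }

    // Transfer errors are only discovered when queued work completes, so
    // they surface from quiet(), not from the call that queued the work.
    void quiet() {
        std::vector<Op> ops;
        ops.swap(pending_); // the queue is empty again after this call
        for (const Op &op : ops) {
            if (op.link_fails)
                throw Fam_Datapath_Exception("queued put failed");
        }
    }

  private:
    struct Op {
        std::size_t nbytes;
        bool link_fails;
    };
    std::vector<Op> pending_;
};

// Follows the two try-catch blocks from the text and reports where the
// error surfaced: "put", "quiet", or "none".
std::string where_it_fails(std::size_t nbytes, bool link_fails) {
    Toy_Fam fam;
    try {
        fam.put_nonblocking(nbytes, link_fails);
    } catch (Fam_Exception &) {
        return "put";
    }
    try {
        fam.quiet();
    } catch (Fam_Exception &) {
        return "quiet";
    }
    return "none";
}

} // namespace deferred_sketch
```

In this model a zero-length request fails in the first try block, while a simulated transfer failure passes through the first block and is reported only by `quiet()`, which is why both blocks need a handler.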
    

Note that uncaught exceptions will result in the application being terminated.
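
One common way to avoid abrupt termination is a last-resort handler around the application's entry point. The sketch below is plain standard C++ and uses no OpenFAM calls; `run_guarded()` is a hypothetical helper name. It relies only on the statement above that the FAM exception classes derive from the C++ standard exception class.

```cpp
#include <cstdlib>
#include <exception>
#include <functional>
#include <iostream>

// Runs `body` and converts any escaping exception into an exit status,
// so that the process does not end in std::terminate().
int run_guarded(const std::function<void()> &body) {
    try {
        body();
        return EXIT_SUCCESS;
    } catch (const std::exception &e) {
        // Classes derived from std::exception are caught here.
        std::cerr << "fatal: " << e.what() << '\n';
    } catch (...) {
        // Anything else that was thrown.
        std::cerr << "fatal: unknown exception\n";
    }
    return EXIT_FAILURE;
}
```

A program could then define `main` as `return run_guarded([] { /* application code */ });`, keeping the finer-grained try-catch blocks shown earlier for errors it can actually recover from.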