OASIS ebXML Messaging Services TC


Re: [ebxml-msg] reliable messaging


    Posted 10-14-2002 12:32



    Subject: Re: [ebxml-msg] reliable messaging



    Marty,

    ebMS *does* indeed provide such a status query. Granted, its required use in the
    failure mode you describe is not specified (though it easily could be). I do not believe
    that the protocol is necessarily broken in this regard; however, it could certainly be
    reinforced and made clearer.

    I should also point out that no matter how hard one tries, it is impossible to close the
    loop entirely. If B never recovers, then A and B are permanently and irreconcilably
    out of sync with respect to their shared understanding of the state of the exchange.

    Further comments below.

    Cheers,

    Christopher Ferris
    Architect, Emerging e-business Industry Architecture
    email: chrisfer@us.ibm.com
    phone: +1 508 234 3624


    Martin W Sachs/Watson/IBM@IBMUS wrote on 10/14/2002 11:46:01 AM:

    >
    >
    >
    >
    > It has been pointed out to me that ebXML reliable messaging is not reliable
    > under system failure.  At least one person who mentioned it considers ebXML
    > messaging to be broken as a result.  Here is a scenario:
    >
    > Party A sends a message reliably to Party B.
    >
    > Party B's MSH receives and persists the message.
    >
    > Party B's MSH attempts to send the reliable-messaging acknowledgment but
    > Party B's system goes down before the acknowledgment gets on the wire.
    >
    > Party A exhausts its retries and concludes that the message was not
    > delivered.
    >
    > Party B eventually comes up and the destination application processes the
    > persisted message as prescribed in the MSG specification.
    >
    > Parties A and B are now out of sync with respect to that transaction and do
    > not know they are out of sync. Party A believes that the transaction
    > failed. Party B has in fact processed the message that it received from
    > Party A. Reliable messaging has failed to deliver on its promise.
    >
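To make the sequence above concrete, here is a minimal Python sketch of the scenario. It is an illustration only: `ReceiverMSH`, `send_reliably`, the `crash_after_persist` flag, and the retry count are hypothetical names and values chosen for this example, not anything defined by the ebMS specification.

```python
from dataclasses import dataclass, field


@dataclass
class ReceiverMSH:
    """Hypothetical stand-in for Party B's message service handler."""
    persisted: dict = field(default_factory=dict)  # survives a crash
    processed: set = field(default_factory=set)    # handed to the application
    crash_after_persist: bool = True
    down: bool = False

    def receive(self, message_id: str, payload: str) -> bool:
        """Return True only if an acknowledgment reaches the sender."""
        if self.down:
            return False  # retries never reach a system that is down
        self.persisted[message_id] = payload
        if self.crash_after_persist:
            self.down = True  # crash before the ack gets on the wire
            return False
        return True

    def recover(self) -> None:
        """After restart, deliver every persisted message to the application."""
        self.down = False
        self.crash_after_persist = False
        self.processed.update(self.persisted)  # adds the message ids


def send_reliably(receiver: ReceiverMSH, message_id: str, payload: str,
                  retries: int = 3) -> str:
    """Party A's view: retry until acknowledged or retries are exhausted."""
    for _ in range(1 + retries):
        if receiver.receive(message_id, payload):
            return "delivered"
    return "failed"


party_b = ReceiverMSH()
a_view = send_reliably(party_b, "msg-1", "purchase order")
party_b.recover()
b_view = "processed" if "msg-1" in party_b.processed else "not processed"
print(a_view, b_view)
```

Under these assumptions, Party A records the exchange as failed while Party B has processed the message, which is the divergence described above.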
    > The solution to this problem is not trivial and the MSG team needs to give
    > it a lot of thought.  At a minimum, the following are needed in the spec:
    >
    > 1.  Both parties to the message exchange MUST persist enough state to allow
    > recovery and getting back in sync. Specific state variables must be


    This is already prescribed in the spec.

    > prescribed.  They are at least those variables needed to restore the state
    > of the transaction and conversation after system recovery, such as the
    > conversation ID, CPA Id, service, action, and perhaps other parts of the
    > message header.
    >
    > 2. Timeouts and retries, as prescribed in the MSG spec, are not sufficient
    > to cover system failures since the failure could last a very long time.
    > Instead, if the party that sent the message doesn't receive a reply in a
    > reasonable time, it must be able to send a status query to the other party
    > and keep requesting status periodically until it receives a response.  The
    > status query protocol must be defined in the MSG specification. If the


    The protocol is defined; see section 7.

    > appropriate state information is persisted at both ends, when party B comes
    > up, it will receive and respond properly to the status query.  The timeouts
    > could be retained in the spec but their main use would be to signal the
    > "attached human" to make a phone call.


    That is always an option:)
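The status-query approach discussed in item 2 can be sketched roughly as follows. This is a hedged illustration in Python, not the wire protocol from section 7: `reconcile`, `StatusQuery`, `fake_query`, and the status strings used here are assumptions made for the example, and the actual request and response elements and status values should be taken from the specification.

```python
from typing import Callable, Optional

# Hypothetical transport hook: returns a status string, or None when the
# peer cannot be reached (for example, because it is still down).
StatusQuery = Callable[[str], Optional[str]]


def reconcile(message_id: str, query_status: StatusQuery, max_polls: int,
              wait: Callable[[], None] = lambda: None) -> str:
    """Poll the peer for the fate of message_id after retries ran out.

    Returns the peer's reported status, or "unknown" if the peer never
    answered within max_polls attempts (time to involve a human).
    """
    for _ in range(max_polls):
        status = query_status(message_id)
        if status is not None:
            return status
        wait()  # a real sender would sleep or reschedule here
    return "unknown"


# Simulated peer: unreachable for the first two polls, then it answers
# from the state it persisted before the crash.
persisted_at_b = {"msg-1": "Processed"}
answers = iter([None, None])


def fake_query(message_id: str) -> Optional[str]:
    try:
        return next(answers)
    except StopIteration:
        return persisted_at_b.get(message_id, "NotRecognized")


result = reconcile("msg-1", fake_query, max_polls=5)
print(result)
```

The poll count is bounded here so that the sketch terminates. If the peer never recovers, the sender is left with "unknown", which is the case where the loop cannot be closed and a person has to step in.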

    >
    > The MSG team should consider this a work item for version 3. Should the
    > team not wish to solve this problem, at the very least, a caveat should be
    > added to the MSG specification that messaging reliability under conditions
    > of system failure is outside the scope of the MSG team.


    Again, I believe that many of your concerns are already addressed. There is no
    doubt in my mind that the relevant text could be reinforced to make this abundantly
    clear to the reader.

    >
    > Regards,
    > Marty
    >
    >
    >
    > *************************************************************************************
    >
    > Martin W. Sachs
    > IBM T. J. Watson Research Center
    > P. O. B. 704
    > Yorktown Hts, NY 10598
    > 914-784-7287;  IBM tie line 863-7287
    > Notes address:  Martin W Sachs/Watson/IBM
    > Internet address:  mwsachs @ us.ibm.com
    > *************************************************************************************
    >
    >

