EM Messages and Notification SC

  • 1.  Accents, characters, and unusual punctuation in CAP

    Posted 12-21-2006 20:46
    
    
    
    
    
    Question for those who use and implement CAP messaging; particularly those using it for implementations where the text data might be in a non-English language.
     
    We recently came upon an issue regarding character sets and language:
     
    Certain data was being processed in our internal Java system as UTF-8 for languages that need at least UTF-16 to handle. As a result, characters with accents common in Spanish or French caused processing exceptions.  Since Java uses Unicode internally, the fix to allow accented characters is not hard: you just need to set a value in a couple of places in the code.
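
    Editor's sketch (not from the original post): UTF-8 can encode every Unicode character, accented Latin letters included, so exceptions like the ones described often come from bytes written in one encoding and decoded as another. The example below assumes that kind of mismatch, which the post does not confirm. The class and method names are hypothetical. It decodes strictly with an explicitly named charset so that a mismatch is reported rather than silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class CharsetDemo {
    // Decode bytes with an explicitly named charset. Malformed or unmappable
    // input raises CharacterCodingException instead of being replaced.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        // "inundación en Málaga", written with escapes so the source file encoding does not matter.
        String text = "inundaci\u00f3n en M\u00e1laga";

        // Bytes written as UTF-8 and read back as UTF-8 round-trip cleanly.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + decodeStrict(utf8, StandardCharsets.UTF_8).equals(text));

        // Bytes written as ISO-8859-1 but read as UTF-8 are malformed input.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        try {
            decodeStrict(latin1, StandardCharsets.UTF_8);
            System.out.println("decoded without complaint");
        } catch (CharacterCodingException e) {
            System.out.println("encoding mismatch detected: " + e);
        }
    }
}
```

    Naming the charset at every byte-to-text boundary, rather than relying on the platform default, is the usual way to keep this class of error out of a Java pipeline.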
     
    But... It brings up a bigger question.  The language tag in the info block can be used to validate/determine how to read the data in Unicode in CAP messages written in languages that use non-Roman characters or unusual accents on Roman characters. This would make translation on the receiving end much simpler and more consistent. But, how about mixed information?  The simple example is Spanish or French place names in English where the accenting is not recognized.  A certain laxness in processing can handle that for the most part.  The more challenging case is something typical in Japan, for example, where the mixed use of character sets in written communication is quite common.  Japanese writing in Roman letters, but using some Japanese characters, is one example. Another example is text in Japanese characters except that a non-Japanese place name is written in its native character set instead of, or as well as, its katakana (Japanese characters used for foreign words) representation.  I suspect that this might be the case in other languages as well.
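
    Editor's sketch (not from the original post): to make the mixed-script case concrete, the example below lists which Unicode scripts appear in a piece of text, using Java's standard `Character.UnicodeScript`. The class and method names are hypothetical, and nothing here is part of CAP. It only illustrates why a single language tag cannot predict the characters a receiver will see.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, for illustration only.
class ScriptSurvey {
    // Collect the Unicode scripts used in the text. Script-neutral characters
    // (spaces, digits, punctuation, combining marks) are skipped.
    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> found = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED
                    && script != Character.UnicodeScript.UNKNOWN) {
                found.add(script);
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // English text with a French loan phrase: one script, despite the accents.
        System.out.println(scriptsIn("d\u00e9j\u00e0 vu"));
        // A katakana place name followed by its Latin spelling: two scripts.
        System.out.println(scriptsIn("\u30ab\u30ca\u30c0 (Canada)"));
    }
}
```

    A receiver could log such a survey for diagnostics, but rejecting a message because it mixes scripts would block exactly the legitimate cases described above.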
     
    Question: should we validate info block content by language? Should we even process text content by language?  Or is it just a translation problem on either end, to be left to user systems?  (It may not be trivial.) 
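
    Editor's sketch (not from the original post): one lenient option is to treat the info block's language value as a hint for routing or translation and never as a constraint on which characters the text may contain. The example below assumes the value is an RFC 3066 style tag such as "fr-CA", and that a missing value defaults to "en-US" as I understand the CAP specification to say. Both are assumptions to check against the spec version in use. The class and method names are hypothetical.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class LanguageHint {
    // Return the primary language subtag ("fr" for "fr-CA"), "en" when the
    // tag is absent (assumed en-US default), or "" when the tag is ill-formed.
    // The result is advisory; the message text is never rejected based on it.
    static String primaryLanguage(String tag) {
        if (tag == null || tag.trim().isEmpty()) {
            return "en";
        }
        try {
            return new Locale.Builder().setLanguageTag(tag.trim()).build().getLanguage();
        } catch (IllformedLocaleException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(primaryLanguage("fr-CA"));
        System.out.println(primaryLanguage("ja"));
        System.out.println(primaryLanguage(null));
    }
}
```

    `Locale.Builder.setLanguageTag` is used instead of `Locale.forLanguageTag` because the builder reports ill-formed tags with an exception, which lets the caller tell a bad tag from a missing one.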
     
    Respectfully,
     
    Gary A. Ham
    Battelle Memorial Institute
    External Systems Interoperability Coordinator
    Open Platform for Emergency Networks
    Disaster Management e-Gov Initiative
    Office for Interoperability and Compatibility
    Science and Technology
    Department of Homeland Security
    540-288-5611 (office)
    703-869-6241 (cell)
    "You would be surprised what you can accomplish when you do not care who gets the credit." - Harry S. Truman


  • 2.  RE: [emergency-msg] Accents, characters, and unusual punctuation in CAP

    Posted 12-26-2006 19:13
    
    
    
    
    
    
    
    
    
    
    
    

    I find that mixing languages is quite common. There are many non-English words that are used so frequently that they are accepted as de facto English (e.g. du jour, hors d'oeuvre, déjà vu, etc.). In my part of the world, as in Canada, this is even more common because of the concentrated French-speaking population. We certainly wouldn’t want alerts & notifications to fail to go out because such words were included in the message.   

     

    IMHO,

     

    Patti

    Patti Iles Aymond, PhD
    Senior Scientist, Research & Development
    Innovative Emergency Management, Inc.
    Managing Risk in a Complex World

    8555 United Plaza Blvd.   Suite 100
    Baton Rouge, LA 70809
    (225) 952-8228 (phone)
    (225) 952-8122 (fax)


    From: Ham, Gary A [mailto:hamg@BATTELLE.ORG]
    Sent: Thursday, December 21, 2006 2:46 PM
    To: cap-list@lists.incident.com; emergency-msg@lists.oasis-open.org; dm-open-sig@list.dmi-services.org
    Subject: [emergency-msg] Accents, characters, and unusual punctuation in CAP

     




  • 3.  RE: [emergency-msg] Accents, characters, and unusual punctuation in CAP

    Posted 12-27-2006 01:56
    +1 para Español.
    
    At 1:12 PM -0600 12/26/06, Aymond, Patti wrote:
    >I find that mixing languages is quite common. 
    >There are many non-English words that are used 
    >so frequently that they are accepted as de facto 
    >English (e.g. du jour, hors d'oeuvre, déjà vu, 
    >etc.). In my part of the world, as in Canada, this 
    >is even more common because of the concentrated 
    >French-speaking population. We certainly 
    >wouldn't want alerts & notifications to fail to go 
    >out because such words were included in the 
    >message.   
    >