OASIS Universal Business Language (UBL) TC

Whitespace of identifiers (

1. Whitespace of identifiers (


Svante Schubert

Posted 10-09-2025 15:17

Dear members of the UBL Technical Committee,

Maybe for our next TC call, I would like to raise a subtle interoperability issue concerning whitespace handling in UBL identifiers - in particular, when carriage return and line feed characters occur within an identifier value. The case below illustrates a real-world example and leads to recommendations for both best practices and potential schema improvements.

1. Example under discussion

<cbc:ID>V00/4711007&#13;&#10;</cbc:ID>

Here, the identifier V00/4711007 is followed by a carriage return (&#13;) and line feed (&#10;). The distinction between XML parsing with or without validation is essential here.

2. How do different XML processors interpret this value

Processing mode	Value obtained	Explanation
Non-validating XML parser	`V00/4711007␍␊`	Character references expanded; CR/LF remain literal.
Schema-validating XML parser	`V00/4711007␣␣`	Under `xsd:normalizedString`, each CR and LF is replaced by a space (`#x20`).

UBL's cbc:ID is based on udt:IdentifierType, which derives from xsd:normalizedString.
According to W3C XML Schema Part 2, §4.3.6, the whitespace facet is "replace", meaning tabs, carriage returns, and line feeds are each replaced by a space, but the value is not trimmed.

  <xs:simpleType name="normalizedString" id="normalizedString">
    <xs:annotation>
      <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#normalizedString"/>
    </xs:annotation>
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="replace" id="normalizedString.whiteSpace"/>
    </xs:restriction>
  </xs:simpleType>

see https://www.w3.org/TR/xmlschema-2/#schema

As a result, the validator-normalised value differs from the raw Infoset. Depending on whether schema validation is active, systems may interpret the same document differently.
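The difference can be reproduced with a small sketch. It assumes Python's standard xml.etree.ElementTree as the non-validating parser and emulates the "replace" facet by hand, since the standard library has no XSD validator; the helper name xsd_replace is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the UBL Common Basic Components (cbc) library.
CBC_NS = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# The example under discussion: an identifier followed by CR and LF,
# both written as character references.
DOC = f'<cbc:ID xmlns:cbc="{CBC_NS}">V00/4711007&#13;&#10;</cbc:ID>'


def xsd_replace(value: str) -> str:
    """Hypothetical helper emulating whiteSpace="replace":
    each tab, LF and CR becomes exactly one space, nothing is trimmed."""
    return re.sub(r"[\t\n\r]", " ", value)


# Non-validating parse: the character references are expanded and kept.
raw_value = ET.fromstring(DOC).text

# What a consumer of the schema-normalised value would see.
validated_value = xsd_replace(raw_value)

print(repr(raw_value))        # 'V00/4711007\r\n'
print(repr(validated_value))  # 'V00/4711007  '
```

Both values have the same length, but a plain string comparison between them fails.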

3. Technical validity vs. semantic validity

Syntactically valid XML: The document is well-formed; character references are allowed.
Valid per XSD: The schema allows it, because xsd:normalizedString replaces CR/LF with spaces.
Semantically unsafe: An identifier with trailing whitespace or control characters is ambiguous when being rendered and likely to fail matching or reconciliation.

4. Regulatory and interoperability implications

EU Regulation:
Under EN 16931 (the European standard for electronic invoices), identifiers such as the Invoice Number (BT-1) must uniquely and unambiguously identify an invoice.
In practice:

Invoice identifiers are expected to be exact strings without control characters or ambiguous whitespace.
Business systems (ERP, Peppol gateways, tax authorities) usually trim or reject identifiers containing CR/LF or trailing spaces.

URI / IRI standards:
Identifiers may also be reused as references (URIs or parts of URIs).
The URI specification RFC 3986, §2.2–2.4 explicitly forbids unescaped spaces and control characters in URIs.
Similarly, IRI syntax RFC 3987, §2.2 disallows unescaped whitespace and non-printable characters.
Thus, any identifier containing CR/LF or spaces is not valid as a URI or IRI.
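As a rough illustration, the sketch below uses urllib.parse.quote from the Python standard library to show what percent-encoding does to such a value. The helper needs_escaping is hypothetical, and quote is stricter than RFC 3986 (it also escapes reserved characters not listed in safe), so this is an approximation rather than a URI validator.

```python
from urllib.parse import quote


def needs_escaping(identifier: str) -> bool:
    """Hypothetical helper: True if percent-encoding changes the identifier.

    "/" is passed as safe because the example identifier uses it as an
    ordinary character.
    """
    return quote(identifier, safe="/") != identifier


print(quote("V00/4711007\r\n", safe="/"))  # V00/4711007%0D%0A
print(needs_escaping("V00/4711007"))       # False
print(needs_escaping("V00/4711007  "))     # True
```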

5. Recommendation for EN 16931 and UBL

(a) Short-term - best practice

UBL documentation and implementers' notes should clearly recommend that identifiers should not contain control characters (#x9, #xA, #xD) and/or multiple or leading/trailing whitespace.
Implementations should trim or collapse whitespace before business use.
If leading/trailing or multiple whitespace characters are encountered, processors should warn and normalise them to a canonical form.
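A minimal sketch of such a "warn and normalise" step follows. The function name check_identifier and the warning texts are assumptions made for illustration; they are not part of UBL or EN 16931.

```python
import re
from typing import List, Tuple

CONTROL_CHARS = re.compile(r"[\t\n\r]")  # #x9, #xA, #xD
MULTIPLE_SPACES = re.compile(r" {2,}")


def check_identifier(value: str) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (canonical form, list of warnings)."""
    warnings = []
    if CONTROL_CHARS.search(value):
        warnings.append("contains control characters (#x9, #xA or #xD)")
    # Treat control characters as spaces for the remaining checks.
    replaced = CONTROL_CHARS.sub(" ", value)
    trimmed = replaced.strip(" ")
    if trimmed != replaced:
        warnings.append("has leading or trailing whitespace")
    if MULTIPLE_SPACES.search(trimmed):
        warnings.append("contains multiple consecutive whitespace characters")
    # Canonical form: single spaces only, no leading/trailing whitespace.
    canonical = MULTIPLE_SPACES.sub(" ", trimmed)
    return canonical, warnings


canonical, warnings = check_identifier("V00/4711007\r\n")
print(repr(canonical))  # 'V00/4711007'
for warning in warnings:
    print("warning:", warning)
```

A sender could run a check like this before transmission and a recipient before matching, so that both sides compare the same canonical string.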

(b) Long-term - schema improvement UBL 3.0

Redefine udt:IdentifierType in a future UBL version to derive from xsd:token instead of xsd:normalizedString.
xsd:token uses whiteSpace="collapse", which replaces sequences of whitespace with a single space and trims leading/trailing whitespace.
The example <cbc:ID>V00/4711007&#13;&#10;</cbc:ID> would then normalise to exactly V00/4711007.
This removes ambiguity between validating and non-validating processors while preserving backward compatibility.
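The three steps of whiteSpace="collapse" can be sketched as follows. The function xsd_collapse is a hypothetical hand-written emulation, not the output of an XSD validator.

```python
import re


def xsd_collapse(value: str) -> str:
    """Hypothetical emulation of whiteSpace="collapse" as used by xsd:token."""
    replaced = re.sub(r"[\t\n\r]", " ", value)    # step 1: same as "replace"
    collapsed = re.sub(r" {2,}", " ", replaced)   # step 2: runs of spaces -> one
    return collapsed.strip(" ")                   # step 3: trim both ends


print(repr(xsd_collapse("V00/4711007\r\n")))     # 'V00/4711007'
print(repr(xsd_collapse("  V00 \t 4711007 ")))   # 'V00 4711007'
```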

(c) Alignment for EN 16931

EN 16931 could explicitly state that identifiers must be semantically equivalent to xs:token lexical forms - disallowing control characters and ensuring canonical comparison.

6. Summary Table

Question	Answer	Comment
(a) Valid XML value?	Yes	Well-formed and schema-valid
(b) Validator value	`"V00/4711007␣␣"`	CR/LF replaced by spaces
(c) Non-validating parser value	`"V00/4711007␍␊"`	Literal control chars
(d) Semantically valid ID?	No	Ambiguous when rendered; not accepted for URI / Governmental IDs
(e) EN 16931 adaptation	Yes, request handling like for `xs:token`	Collapse, trim whitespace
(f) UBL next steps	Best practices now; revise type later	Improves reliability

7. Conclusion

While technically valid, identifiers containing CR/LF or other control characters are semantically unsafe. To ensure interoperability and alignment with EU and URI/IRI norms, I propose:

Short-term: Publish clear best practices discouraging control characters in identifiers.
Long-term: Adjust udt:IdentifierType to derive from xsd:token.
As a CEN TC 434 editor, I would align the EN 16931-3 guidance to ensure identifiers remain visually unambiguous.
I would especially aim for the values produced by non-validating parsers to be identical to those from validating parsers by explicitly defining the relevant facets (such as whitespace normalisation or default values) in the syntax-bindings.

I welcome your thoughts, experiences, or counterexamples on this topic.

Best regards,
Svante Schubert

2. RE: Whitespace of identifiers (


Ken Holman

Posted 10-10-2025 01:41

>Dear members of the UBL Technical Committee,

Thank you for this interesting post, Svante.

>Maybe for our next TC call, I would like to raise a subtle interoperability issue concerning whitespace handling in UBL identifiers - in particular, when carriage return and line feed characters occur within an identifier value. The case below illustrates a real-world example and leads to recommendations for both best practices and potential schema improvements.
>
>1. Example under discussion
>
><cbc:ID>V00/4711007&#13;&#10;</cbc:ID>
>
>Here, the identifier V00/4711007 is followed by a carriage return (&#13;) and line feed (&#10;). The distinction between XML parsing with or without validation is essential here.
>
>2. How do different XML processors interpret this value
>
>Processing mode	Value obtained	Explanation
>Non-validating XML parser	`V00/4711007␍␊`	Character references expanded; CR/LF remain literal.
>Schema-validating XML parser	`V00/4711007␣␣`	Under `xsd:normalizedString`, each CR and LF is replaced by a space (`#x20`).
>
>UBL's cbc:ID is based on udt:IdentifierType, which derives from xsd:normalizedString.

In turn, our UDT Identifier type is an unadulterated use of the UN/CEFACT CCTS 2.01 Core Component Type specification of the Identifier Type:

<xsd:complexType name="IdentifierType">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      <ccts:UniqueID>BDNDRUDT0000011</ccts:UniqueID>
      <ccts:CategoryCode>UDT</ccts:CategoryCode>
      <ccts:DictionaryEntryName>Identifier. Type</ccts:DictionaryEntryName>
      <ccts:VersionID>1.0</ccts:VersionID>
      <ccts:Definition>A character string to identify and uniquely distinguish one instance of an object in an identification scheme from all other objects in the same scheme, together with relevant supplementary information.</ccts:Definition>
      <ccts:RepresentationTermName>Identifier</ccts:RepresentationTermName>
      <ccts:PrimitiveType>string</ccts:PrimitiveType>
      <ccts:UsageRule>Other supplementary components in the CCT are captured as part of the token and name for the schema module containing the identifier list and thus, are not declared as attributes. </ccts:UsageRule>
    </xsd:documentation>
  </xsd:annotation>
  <xsd:simpleContent>
    <xsd:extension base="ccts-cct:IdentifierType"></xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>

... which, in turn, specifies the use of xsd:normalizedString without any restricting facets:

<xsd:complexType name="IdentifierType">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      <ccts:UniqueID>UNDT000011</ccts:UniqueID>
      <ccts:CategoryCode>CCT</ccts:CategoryCode>
      <ccts:DictionaryEntryName>Identifier. Type</ccts:DictionaryEntryName>
      <ccts:VersionID>1.0</ccts:VersionID>
      <ccts:Definition>A character string to identify and distinguish uniquely, one instance of an object in an identification scheme from all other objects in the same scheme together with relevant supplementary information.</ccts:Definition>
      <ccts:RepresentationTermName>Identifier</ccts:RepresentationTermName>
      <ccts:PrimitiveType>string</ccts:PrimitiveType>
    </xsd:documentation>
  </xsd:annotation>
  <xsd:simpleContent>
    <xsd:extension base="xsd:normalizedString">
...

>According to W3C XML Schema Part 2, §4.3.6., the whitespace facet is "replace", meaning tabs, carriage returns, and line feeds are replaced by spaces - but not trimmed.
>
>  <xs:simpleType name="normalizedString" id="normalizedString">
>    <xs:annotation>
>      <xs:documentation source="http://www.w3.org/TR/xmlschema-2/#normalizedString"/>
>    </xs:annotation>
>    <xs:restriction base="xs:string">
>      <xs:whiteSpace value="replace" id="normalizedString.whiteSpace"/>
>    </xs:restriction>
>  </xs:simpleType>
>
>see https://www.w3.org/TR/xmlschema-2/#schema
>
>As a result, the validator-normalised value differs from the raw Infoset. Depending on whether schema validation is active, systems may interpret the same document differently.

Not all XML processing leverages the PSVI created through the use of an XSD schema. What comes to mind immediately is non-validated XSLT processing. UBL users processing their documents using XSLT are going to see the raw infoset.

>3. Technical validity vs. semantic validity
>
>Syntactically valid XML: The document is well-formed; character references are allowed.
>Valid per XSD: The schema allows it, because xsd:normalizedString replaces CR/LF with spaces.
>Semantically unsafe: An identifier with trailing whitespace or control characters is ambiguous when being rendered and likely to fail matching or reconciliation.

That may be true if the recipient using XPath fails to run normalize-space(.) on the identifier value.

Validation and processing of content is up to the recipient. The sender is responsible for taking the burden away from the recipient.

>4. Regulatory and interoperability implications
>
>EU Regulation:
>Under EN 16931 (the European standard for electronic invoices), identifiers such as the Invoice Number (BT-1) must uniquely and unambiguously identify an invoice.
>In practice:
>Invoice identifiers are expected to be exact strings without control characters or ambiguous whitespace.
>Business systems (ERP, Peppol gateways, tax authorities) usually trim or reject identifiers containing CR/LF or trailing spaces.

UBL's obligation is for the structure of invoices, not the content of invoices. A second pass validation reflecting user needs in advance of sending the invoice would be responsible for all business-related checking, including undesirable white space in identifiers if that is decided to be important.

Though I temper that statement with the normative conformance citation of UBL section 4 "Additional Document Constraints" where we layer on top of UBL UDT normative schema constraints on what constitutes valid content.

A good example is that CCTS does not constrain element content from being empty, yet UBL considers an empty element as a violation of UBL conformance. A document is not considered UBL valid if it is UBL schema valid and has empty elements.

>URI / IRI standards:
>Identifiers may also be reused as references (URIs or parts of URIs).
>The URI specification RFC 3986, §2.2–2.4 explicitly forbids unescaped spaces and control characters in URIs.
>Similarly, IRI syntax RFC 3987, §2.2 disallows unescaped whitespace and non-printable characters.
>Thus, any identifier containing CR/LF or spaces is not valid as a URI or IRI

That puts an obligation on the sender to pre-validate their content before sending it.

>5. Recommendation for EN 16931 and UBL
>
>(a) Short-term - best practice
>UBL documentation and implementers' notes should clearly recommend that identifiers should not contain control characters (#x9, #xA, #xD) and/or multiple or leading/trailing whitespace.

Agreed.

>Implementations should trim or collapse whitespace before business use.
>If leading/trailing or multiple whitespace characters are encountered, processors should warn and normalise them to a canonical form.

Agreed.

>(b) Long-term - schema improvement UBL 3.0
>Redefine udt:IdentifierType in a future UBL version to derive from xsd:token instead of xsd:normalizedString.
>xsd:token uses whiteSpace="collapse", which replaces sequences of whitespace with a single space and trims leading/trailing whitespace.
>The example <cbc:ID>V00/4711007&#13;&#10;</cbc:ID> would then normalise to exactly V00/4711007.
>This removes ambiguity between validating and non-validating processors while preserving backward compatibility.

Agreed for consideration for UBL 3.0 in light of what is chosen then as the core component type definitions upon which UBL 3.0 is built. But our hands are tied for UBL 2.x.

>(c) Alignment for EN 16931
>EN 16931 could explicitly state that identifiers must be semantically equivalent to xs:token lexical forms - disallowing control characters and ensuring canonical comparison.

My understanding is that other syntaxes for EN 16931 also are built on CCTS CCT. So it would seem to me that this is an argument for that specification for other syntaxes to build upon.

But, of course, the UBL committee hasn't even considered what may or may not be the basis of a UBL 3.0 schema specification.

>6. Summary Table
>
>Question	Answer	Comment
>(a) Valid XML value?	Yes	Well-formed and schema-valid
>(b) Validator value	`"V00/4711007␣␣"`	CR/LF replaced by spaces
>(c) Non-validating parser value	`"V00/4711007␍␊"`	Literal control chars
>(d) Semantically valid ID?	No	Ambiguous when rendered; not accepted for URI / Governmental IDs
>(e) EN 16931 adaptation	Yes, request handling like for `xs:token`	Collapse, trim whitespace
>(f) UBL next steps	Best practices now; revise type later	Improves reliability

Provided "later" is UBL 3.0+, I think this is sound guidance. Just not for UBL 2.x.

>7. Conclusion
>
>While technically valid, identifiers containing CR/LF or other control characters are semantically unsafe. To ensure interoperability and alignment with EU and URI/IRI norms, I propose:
>Short-term: Publish clear best practices discouraging control characters in identifiers.
>Long-term: Adjust udt:IdentifierType to derive from xsd:token.
>As a CEN TC 434 editor, I would align the EN 16931-3 guidance to ensure identifiers remain visually unambiguous.
>I would especially aim for the values produced by non-validating parsers to be identical to those from validating parsers by explicitly defining the relevant facets (such as whitespace normalisation or default values) in the syntax-bindings.
>
>I welcome your thoughts, experiences, or counterexamples on this topic.

An interesting analysis and set of guidelines going forward when the time comes to talk about UBL 3.0+.

But, perhaps, this is a bit of a distraction during the development of UBL 2.5+ and not, yet, appropriate for the next committee call. But, of course, I'm not in charge of the agenda, so I leave it with others to decide to include the discussion or not.

I think we have many UBL 3.0+ issues to consider in addition to this, we just haven't taken the time to enumerate them.

>Best regards,
>Svante Schubert

Thank you, again, Svante, for the interesting read!

. . . . . . . . Ken

--
Contact info, blog, articles, etc. http://www.CraneSoftwrights.com/m/ |
Check our site for free XML, XSLT, XSL-FO and UBL developer resources |
Streaming hands-on XSLT/XPath 2 training class @US$50 (5 hours free!) |
Essays (UBL, XML, etc.) http://www.linkedin.com/today/author/gkholman |


3. RE: Whitespace of identifiers (


Ken Holman

Posted 10-10-2025 02:00

My response to Svante did not get distributed correctly in the emailed version (at least the copy that I received). The online version of my response at https://groups.oasis-open.org/discussion/whitespace-of-identifiers is intact (though I lost line-leading-white-space) and does not require a login.

------------------------------
Ken Holman
CTO
Crane Softwrights Ltd.
------------------------------


4. RE: Whitespace of identifiers (


Svante Schubert

Posted 10-10-2025 02:36

Hello Ken,

We should investigate how the UN/CEFACT CCTS 2.01 Core Component Identifier Type is mapped to an XSD data type.

A quick search in the CII D22B XSD files for "normalized" gave no hits; it seems only xs:token is being used.

Kind regards,

Svante


5. RE: Whitespace of identifiers (


Kenneth Bengtsson

Posted 10-10-2025 06:19

Dear all,

If I understand the example correctly, then the user transmitted "<cbc:ID>V00/4711007&#13;&#10;</cbc:ID>" but the actual identifier is "V00/4711007". The CR/LF is not semantically part of the identifier. Interoperability issues then arise because different XML processors try to fix the bad input in different ways. Is that interpretation correct?

If so, then I would argue that the problem here is data quality/bad input, not an interoperability defect in UBL. And yes, I would agree that we could (and should) address this in the specification, perhaps by warning against the use of control characters.

But I think there is an additional angle that we need to consider: What if the CR/LF were intentionally part of the identifier? Consider something like:

 <ID>Issuer type: ABC

   Value: 123

   Validation URI: https://validate.example.com/ABC/123

   Additional information: https://lookup.example.com/ABC/123</ID>

Is this an unthinkable identifier in the future? We already see things like this in QR payloads, composite identifiers, signed tokens, etc.

- UBL as currently defined cannot represent it, because xsd:normalizedString will silently replace the CR/LF with spaces.

- xsd:token would not help; it would only aggravate the problem by collapsing whitespace and losing even more information.

Is there any way to preserve the content except by using xs:string as schema base type?
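To make the information loss concrete, here is a small sketch using a shortened version of the composite identifier above. The helpers xsd_replace and xsd_collapse are hypothetical hand-written emulations of the "replace" and "collapse" facets, and the line breaks are assumed to be an intentional part of the value.

```python
import re

# Shortened version of the composite identifier; the line breaks are
# assumed to be intentional parts of the value.
composite_id = (
    "Issuer type: ABC\n"
    "   Value: 123\n"
    "   Validation URI: https://validate.example.com/ABC/123"
)


def xsd_replace(value: str) -> str:
    """Emulates whiteSpace="replace" (xsd:normalizedString)."""
    return re.sub(r"[\t\n\r]", " ", value)


def xsd_collapse(value: str) -> str:
    """Emulates whiteSpace="collapse" (xsd:token)."""
    return re.sub(r" {2,}", " ", xsd_replace(value)).strip(" ")


as_string = composite_id                      # xs:string keeps everything
as_normalized_string = xsd_replace(composite_id)
as_token = xsd_collapse(composite_id)

print(len(as_string.splitlines()))             # 3
print(len(as_normalized_string.splitlines()))  # 1
print(as_token)
# Issuer type: ABC Value: 123 Validation URI: https://validate.example.com/ABC/123
```

Only the untouched string keeps the line structure; with "replace" the breaks become spaces, and with "collapse" the indentation is lost as well.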

Best regards,

Kenneth


6. RE: Whitespace of identifiers (


Ken Holman

Posted 10-10-2025 08:23

The XML processor in an XSLT processor (or any other raw XML task) will convert the end-of-line sequences to line feeds, not spaces. If the end-of-line is a CR or a CR/LF pair, the processor returns a line feed without any distinction of its source. But one is able to detect the end of the line in a consistent fashion.

It is the PSVI (Post Schema Validation Infoset) that converts the end-of-line sequence to a space, and only when a process acts on the output of an XSD validating processor.

But if you don't use that, you get the ends of the lines.
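A small sketch of this behaviour, assuming Python's standard xml.etree.ElementTree as the non-validating parser. One detail worth keeping in mind: XML end-of-line handling applies to literal line ends in the source, whereas a character reference such as &#13; is expanded afterwards and is delivered as a carriage return.

```python
import xml.etree.ElementTree as ET

# Literal CR/LF pair in the source: reported as a single line feed.
literal = ET.fromstring("<ID>V00/4711007\r\n</ID>").text

# CR and LF written as character references: delivered unchanged.
referenced = ET.fromstring("<ID>V00/4711007&#13;&#10;</ID>").text

print(repr(literal))     # 'V00/4711007\n'
print(repr(referenced))  # 'V00/4711007\r\n'
```

So for the example in this thread, a raw XML task sees both the carriage return and the line feed, while a literal line break typed into the file would arrive as a lone line feed.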


7. RE: Whitespace of identifiers (


Kenneth Bengtsson

Posted 10-11-2025 03:55

Thanks Ken, and you're right that the "space conversion" only happens in XSD-aware XML processors. But don't we have that already covered in the specification, including in the conformance clause? It states: "The UBL 2.4 XSD schemas [XSD1] [XSD2] are the only normative representations of the UBL 2.4 document types and library components for the purposes of XML document [XML] validation and conformance." with normative references to XML Schema Part 1 and 2.

From this I read that the XSD rules (including the whitespace facet of xs:normalizedString) are automatically normative for every datatype defined in the UBL XSDs.
Hence, UBL inherits deterministic whitespace normalization behavior from the W3C specification, i.e., an XML processor that doesn't normalize normalizedString is not conformant. As Svante suggested, we might then limit the impact of non-conformant XML processors by recommending not to include control characters such as CR/LF in identifiers and codes.


8. RE: Whitespace of identifiers (


Ken Holman

Posted 10-11-2025 04:22

Thank you. I agree with you, Kenneth, with only a one-word edit: changing "an XML processor that doesn't normalize normalizedString is not conformant" to "an XML application that doesn't normalize normalizedString is not conformant".

When the application uses the PSVI, it is conformant.

When the application doesn't use the PSVI, it is obligated to mimic the PSVI in this regard in order to be conformant.

It isn't the processor itself that is non-conformant, but the application that uses it.


9. RE: Whitespace of identifiers (

Ken Holman
Posted 10-10-2025 08:01
At 10/10/2025 06:36 +0000, you wrote:
>Hello Ken, We should investigate how the UN/CEFACT CCTS 2.01 Core Component Identifier Type is mapped to an XSD data type.

Well, you could investigate my earlier response to you where I quoted the mapping of Identifier Type copied and pasted from the UN/CEFACT CCTS 2.01 Core Component schema file itself.

That file begins:

10. RE: Whitespace of identifiers (

Ken Holman
Posted 10-10-2025 08:13
Again my post has been abbreviated. Let's see if it works just by poking it...

11. RE: Whitespace of identifiers (

Ken Holman
Posted 10-10-2025 08:19
Trying yet again, this time using the web interface. The issue is that this mail system cannot handle embedded angle brackets. Still don't know if it will work.

At 10/10/2025 06:36 +0000, you wrote:
> Hello Ken, We should investigate how the UN/CEFACT CCTS 2.01 Core Component Identifier Type is mapped to an XSD data type.

Well, you could investigate my earlier response to you where I quoted the mapping of Identifier Type copied and pasted from the UN/CEFACT CCTS 2.01 Core Component schema file itself.

That file begins:




<!--
Module of Core Component Type
Agency: UN/CEFACT
VersionID: 1.1
Last change: 14 January 2005

Copyright (C) UN/CEFACT (2006). All Rights Reserved.

You can find the file in its entirety at:

https://docs.oasis-open.org/ubl/os-UBL-2.4/xsd/common/BDNDR-CCTS_CCT_SchemaModule-1.1.xsd

... and look at line 391 again:

<xsd:extension base="xsd:normalizedString">

> A quick search in the CII D22B XSD files for "normalized" gave no hit, seems only xs:token is being used.

I don't see how that is relevant since it is a 2022 publication unrelated to CCTS 2.01 from 2005 when we started using the schema fragment in UBL.

> Kind regards,
> Svante

