DICOM Specific Character Set

http://www.dabsoft.ch/dicom/3/C.12.1.1.2/

C.12.1.1.2 Specific Character Set

Specific Character Set (0008,0005) identifies the Character Set that expands or replaces the Basic Graphic Set (ISO 646) for values of Data Elements that have Value Representation of SH, LO, ST, PN, LT or UT. See PS 3.5.

If the Attribute Specific Character Set (0008,0005) is not present or has only a single value, Code Extension techniques are not used. Defined terms for the Attribute Specific Character Set (0008,0005), when single valued, are derived from the International Registration Number as per ISO 2375 (e.g., ISO_IR 100 for Latin alphabet No. 1). See Table C.12-2.

Table C.12-2 DEFINED TERMS FOR SINGLE-BYTE CHARACTER SETS WITHOUT CODE EXTENSIONS

| Character Set Description | Defined Term | ISO registration number | Number of characters | Code element | Character Set |
|---|---|---|---|---|---|
| Default repertoire | none | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 1 | ISO_IR 100 | ISO-IR 100 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 2 | ISO_IR 101 | ISO-IR 101 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 3 | ISO_IR 109 | ISO-IR 109 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 4 | ISO_IR 110 | ISO-IR 110 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Cyrillic | ISO_IR 144 | ISO-IR 144 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Arabic | ISO_IR 127 | ISO-IR 127 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Greek | ISO_IR 126 | ISO-IR 126 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Hebrew | ISO_IR 138 | ISO-IR 138 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 5 | ISO_IR 148 | ISO-IR 148 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
| Japanese | ISO_IR 13 | ISO-IR 13 | 94 | G1 | JIS X 0201: Katakana |
| | | ISO-IR 14 | 94 | G0 | JIS X 0201: Romaji |
| Thai | ISO_IR 166 | ISO-IR 166 | 88 | G1 | TIS 620-2533 (1990) |
| | | ISO-IR 6 | 94 | G0 | ISO 646 |
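
As an illustration of how a reader of a DICOM file might apply Table C.12-2, the sketch below maps each single-valued Defined Term to a Python standard-library codec and decodes a raw value with it. The names `SINGLE_VALUE_CODECS` and `decode_value` are hypothetical, and the codec choices are assumptions rather than anything the standard prescribes. In particular, `shift_jis` is only an approximation of JIS X 0201: it decodes the single-byte Katakana range as expected but treats bytes below 0x80 as ASCII.

```python
# Hypothetical lookup table: DICOM Defined Term -> Python codec name.
# The codec names are illustrative assumptions, not part of the standard.
SINGLE_VALUE_CODECS = {
    "": "ascii",                 # attribute absent or empty: default repertoire
    "ISO_IR 100": "latin_1",     # Latin alphabet No. 1
    "ISO_IR 101": "iso8859_2",   # Latin alphabet No. 2
    "ISO_IR 109": "iso8859_3",   # Latin alphabet No. 3
    "ISO_IR 110": "iso8859_4",   # Latin alphabet No. 4
    "ISO_IR 144": "iso8859_5",   # Cyrillic
    "ISO_IR 127": "iso8859_6",   # Arabic
    "ISO_IR 126": "iso8859_7",   # Greek
    "ISO_IR 138": "iso8859_8",   # Hebrew
    "ISO_IR 148": "iso8859_9",   # Latin alphabet No. 5
    "ISO_IR 13": "shift_jis",    # Japanese, approximation of JIS X 0201
    "ISO_IR 166": "tis_620",     # Thai
}


def decode_value(raw: bytes, specific_character_set: str = "") -> str:
    """Decode a text value using a single-valued Specific Character Set."""
    term = specific_character_set.strip()
    try:
        codec = SINGLE_VALUE_CODECS[term]
    except KeyError:
        raise ValueError(f"unsupported defined term: {term!r}") from None
    return raw.decode(codec)


print(decode_value(b"Buc^J\xe9r\xf4me", "ISO_IR 100"))
```

A dictionary keeps the mapping easy to audit against the table; a real implementation would also need to handle the multi-valued case, which this sketch deliberately rejects.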

Note: To use the single-byte code table of JIS X0201, the value of attribute Specific Character Set (0008,0005), value 1 should be ISO_IR 13. This means that ISO-IR 13 is designated as the G1 code element which is invoked in the GR area. It should be understood that, in addition, ISO-IR 14 is designated as the G0 code element and this is invoked in the GL area.
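
The GL and GR areas mentioned in the note are byte ranges of the 8-bit code defined by ISO 2022. The following sketch, with the hypothetical helper name `code_area`, classifies a byte into the control or graphic area it falls in, which shows why Katakana bytes (G1, invoked in GR) and Romaji bytes (G0, invoked in GL) can coexist in one value without escape sequences.

```python
def code_area(byte: int) -> str:
    """Return the ISO 2022 8-bit code area that a byte value falls in."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in range 0..255")
    if byte <= 0x1F:
        return "C0"  # control characters
    if byte <= 0x7F:
        return "GL"  # left graphic area: G0 is invoked here
    if byte <= 0x9F:
        return "C1"  # additional control characters
    return "GR"      # right graphic area: G1 is invoked here


# One half-width Katakana byte followed by the '^' component delimiter.
print([code_area(b) for b in b"\xd4^"])
```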

If the attribute Specific Character Set (0008,0005) has more than one value, Code Extension techniques are used and Escape Sequences may be encountered in all character sets. Requirements for the use of Code Extension techniques are specified in PS 3.5. In order to indicate the presence of Code Extension, the Defined Terms for the repertoires have the prefix “ISO 2022”, e.g., ISO 2022 IR 100 for the Latin Alphabet No. 1. See Table C.12-3 and Table C.12-4. Table C.12-3 describes single-byte character sets for value 1 to value n of the attribute Specific Character Set (0008,0005), and Table C.12-4 describes multi-byte character sets for value 2 to value n of the attribute Specific Character Set (0008,0005).

Note: A prefix other than “ISO 2022” may be needed in the future if other Code Extension techniques are adopted.

The same character set shall not be used more than once in Specific Character Set (0008,0005).

Note: For example, the values “ISO 2022 IR 100\ISO 2022 IR 100” or “ISO_IR 100\ISO 2022 IR 100” are redundant and not permitted.
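
A validator could enforce this rule by reducing every value to its registration number before comparing, so that `ISO_IR 100` and `ISO 2022 IR 100` count as the same character set. The sketch below is one possible way to do that; `registration_number` and `find_repeated_sets` are hypothetical names, and empty values are simply skipped here.

```python
def registration_number(term: str) -> str:
    """Strip the 'ISO 2022 IR ' or 'ISO_IR ' prefix from a Defined Term."""
    term = term.strip()
    for prefix in ("ISO 2022 IR ", "ISO_IR "):
        if term.startswith(prefix):
            return term[len(prefix):]
    return term  # terms without a known prefix are compared as-is


def find_repeated_sets(value: str) -> list:
    """Return registration numbers that occur more than once in the value."""
    seen = set()
    repeated = []
    for term in value.split("\\"):
        number = registration_number(term)
        if not number:
            continue  # empty value: not handled by this sketch
        if number in seen and number not in repeated:
            repeated.append(number)
        seen.add(number)
    return repeated


print(find_repeated_sets("ISO_IR 100\\ISO 2022 IR 100"))
```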

Table C.12-3 DEFINED TERMS FOR SINGLE-BYTE CHARACTER SETS WITH CODE EXTENSIONS

| Character Set Description | Defined Term | Standard for Code Extension | ESC sequence | ISO registration number | Number of characters | Code element | Character Set |
|---|---|---|---|---|---|---|---|
| Default repertoire | ISO 2022 IR 6 | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 1 | ISO 2022 IR 100 | ISO 2022 | ESC 02/13 04/01 | ISO-IR 100 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 2 | ISO 2022 IR 101 | ISO 2022 | ESC 02/13 04/02 | ISO-IR 101 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 3 | ISO 2022 IR 109 | ISO 2022 | ESC 02/13 04/03 | ISO-IR 109 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 4 | ISO 2022 IR 110 | ISO 2022 | ESC 02/13 04/04 | ISO-IR 110 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Cyrillic | ISO 2022 IR 144 | ISO 2022 | ESC 02/13 04/12 | ISO-IR 144 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Arabic | ISO 2022 IR 127 | ISO 2022 | ESC 02/13 04/07 | ISO-IR 127 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Greek | ISO 2022 IR 126 | ISO 2022 | ESC 02/13 04/06 | ISO-IR 126 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Hebrew | ISO 2022 IR 138 | ISO 2022 | ESC 02/13 04/08 | ISO-IR 138 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Latin alphabet No. 5 | ISO 2022 IR 148 | ISO 2022 | ESC 02/13 04/13 | ISO-IR 148 | 96 | G1 | Supplementary set of ISO 8859 |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
| Japanese | ISO 2022 IR 13 | ISO 2022 | ESC 02/09 04/09 | ISO-IR 13 | 94 | G1 | JIS X 0201: Katakana |
| | | ISO 2022 | ESC 02/08 04/10 | ISO-IR 14 | 94 | G0 | JIS X 0201: Romaji |
| Thai | ISO 2022 IR 166 | ISO 2022 | ESC 02/13 05/04 | ISO-IR 166 | 88 | G1 | TIS 620-2533 (1990) |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | ISO 646 |
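
The ESC sequences in the table are written in column/row notation, where `cc/rr` denotes the byte value `cc * 16 + rr` and ESC itself is byte 0x1B. A small converter, sketched here under the hypothetical name `esc_bytes`, turns a table entry into the bytes that would appear in an encoded value.

```python
def esc_bytes(notation: str) -> bytes:
    """Convert an entry such as 'ESC 02/13 04/01' into its raw bytes."""
    out = bytearray()
    for token in notation.split():
        if token == "ESC":
            out.append(0x1B)
        else:
            column, row = token.split("/")
            out.append(int(column) * 16 + int(row))
    return bytes(out)


print(esc_bytes("ESC 02/13 04/01"))  # designates ISO-IR 100 as G1
print(esc_bytes("ESC 02/08 04/02"))  # designates ISO-IR 6 as G0
```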

Note: If the attribute Specific Character Set (0008,0005) has more than one value and value 1 is empty, it is assumed that value 1 is ISO 2022 IR 6.
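
The note above can be applied when splitting the attribute into its values. The sketch below uses the hypothetical helpers `split_specific_character_set` and `uses_code_extension`; it substitutes ISO 2022 IR 6 for an empty value 1 only when more than one value is present, and treats any multi-valued attribute as using Code Extension.

```python
def split_specific_character_set(value: str) -> list:
    """Split (0008,0005) on backslashes and fill in an empty value 1."""
    terms = [term.strip() for term in value.split("\\")]
    if len(terms) > 1 and terms[0] == "":
        terms[0] = "ISO 2022 IR 6"  # assumed default per the note
    return terms


def uses_code_extension(value: str) -> bool:
    """Code Extension techniques apply only when there are several values."""
    return len(split_specific_character_set(value)) > 1


print(split_specific_character_set("\\ISO 2022 IR 87"))
```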

Table C.12-4 DEFINED TERMS FOR MULTI-BYTE CHARACTER SETS WITH CODE EXTENSIONS

| Character Set Description | Defined Term | Standard for Code Extension | ESC sequence | ISO registration number | Number of characters | Code element | Character Set |
|---|---|---|---|---|---|---|---|
| Japanese | ISO 2022 IR 87 | ISO 2022 | ESC 02/04 04/02 | ISO-IR 87 | 94² | G0 | JIS X 0208: Kanji |
| | ISO 2022 IR 159 | ISO 2022 | ESC 02/04 02/08 04/04 | ISO-IR 159 | 94² | G0 | JIS X 0212: Supplementary Kanji set |
| Korean | ISO 2022 IR 149 | ISO 2022 | ESC 02/04 02/09 04/03 | ISO-IR 149 | 94² | G1 | KS X 1001: Hangul and Hanja |
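
To show what a value using ISO 2022 IR 87 looks like on the wire, the sketch below decodes a name whose Kanji are introduced by ESC 02/04 04/02 (`ESC $ B`) and followed by a return to the default repertoire with ESC 02/08 04/02 (`ESC ( B`). Using Python's `iso2022_jp` codec as a stand-in for a DICOM decoder is an assumption made for illustration; it happens to recognise these two escape sequences but does not implement the DICOM rules of PS 3.5. The helper name `decode_ir87` is hypothetical.

```python
def decode_ir87(raw: bytes) -> str:
    """Decode bytes that switch between ISO-IR 6 and ISO-IR 87 (JIS X 0208)."""
    # Stand-in codec: handles ESC $ B (Kanji) and ESC ( B (ASCII).
    return raw.decode("iso2022_jp")


# A family name and a given name in Kanji, separated by the '^' delimiter.
raw_name = b"\x1b$B;3ED\x1b(B^\x1b$BB@O:\x1b(B"
print(decode_ir87(raw_name))
```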

There are multi-byte character sets that prohibit the use of Code Extension Techniques. The Unicode character set used in ISO 10646, when encoded in UTF-8, and the GB18030 character set, encoded per the rules of GB18030, both prohibit the use of Code Extension Techniques. These character sets may only be specified as value 1 in the Specific Character Set (0008,0005) attribute and there shall only be one value. The minimal length UTF-8 encoding shall always be used for ISO 10646.

Notes: 1. The ISO standards for 10646 now prohibit the use of anything but the minimum length encoding for UTF-8. UTF-8 permits multiple different encodings, but when used to encode Unicode characters in accordance with ISO 10646-1 and 10646-2 (with extensions) only the minimal encodings are legal.

2. The representation for the characters in the DICOM Default Character Repertoire is the same single byte value for the Default Character Repertoire, ISO 10646 in UTF-8, and GB18030. It is also the 7-bit US-ASCII encoding.
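
Both notes can be checked with a few lines of Python. In the sketch below, the hypothetical helper `is_strict_utf8` relies on Python's strict UTF-8 decoder, which rejects non-minimal (overlong) forms along with any other malformed input, and `same_bytes_everywhere` compares the encodings of a default-repertoire string across ASCII, UTF-8 and GB18030.

```python
def is_strict_utf8(raw: bytes) -> bool:
    """True if raw is well-formed UTF-8; overlong forms are rejected."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def same_bytes_everywhere(text: str) -> bool:
    """True if text encodes identically in ASCII, UTF-8 and GB18030."""
    reference = text.encode("ascii")
    return all(text.encode(codec) == reference for codec in ("utf-8", "gb18030"))


print(is_strict_utf8(b"/"))         # minimal one-byte encoding of '/'
print(is_strict_utf8(b"\xc0\xaf"))  # overlong two-byte encoding of '/'
print(same_bytes_everywhere("DOE^JOHN"))
```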

Table C.12-5 DEFINED TERMS FOR MULTI-BYTE CHARACTER SETS WITHOUT CODE EXTENSIONS

| Character Set Description | Defined Term |
|---|---|
| Unicode in UTF-8 | ISO_IR 192 |
| GB18030 | GB18030 |
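
Because the terms in Table C.12-5 may only appear as a single value, a validator can flag any multi-valued attribute that contains one of them. This is a minimal sketch with hypothetical names (`NO_CODE_EXTENSION_TERMS`, `violates_single_value_rule`), not a complete conformance check.

```python
# Defined Terms that prohibit Code Extension techniques (Table C.12-5).
NO_CODE_EXTENSION_TERMS = {"ISO_IR 192", "GB18030"}


def violates_single_value_rule(value: str) -> bool:
    """True if UTF-8 or GB18030 is combined with any other value."""
    terms = [term.strip() for term in value.split("\\")]
    return len(terms) > 1 and any(t in NO_CODE_EXTENSION_TERMS for t in terms)


print(violates_single_value_rule("ISO_IR 192"))
print(violates_single_value_rule("ISO_IR 192\\ISO 2022 IR 87"))
```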