Unifoundry.com

Japanese Font Encodings


Home
GNU Unifont
Unicode Tutorial
Deckmate
Hangul Fonts
Japanese Fonts
Olden
Retro Fonts
Sequoyah
Unibetacode
Utf8gen
Utfcheck
Fontforge Poll
Downloads
Checking .sigs

Non-Unicode Japanese Font Encodings

Several encoding schemes for Japanese have been developed by the government of Japan, with some manufacturers adding their own non-standard extensions over the years. This page describes some of the most common standards, ending with a description of how those encodings map to Unicode fonts, and how one such Japanese font was added to Unifont.

Because Unicode code points are specified as hexadecimal numbers, this web page will give character assignments in a mixture of decimal and hexadecimal. Hexadecimal numbers will begin with "0x", matching the convention established in the C programming language. Hexadecimal Unicode code points will begin with "U+", matching the Unicode convention.

In the Beginning

Long before the advent of computers, telegraph operators in Japan transmitted Japanese using Wabun code (和文モールス符号), a series of short and long tones transmitted like Morse code. Wabun code used katakana, along with the Japanese comma and period (full stop). Two other sequences indicated a switch from International Morse code to Wabun code, and a switch from Wabun code back to International Morse code. According to Ken Lunde, the use of katakana by telegraph operators was the reason why the first Japanese computer encoding standard used katakana, with such standards work beginning in 1969 [Lunde, pp. 103, 144].

ASCII

ASCII interoperability has been a design consideration in Japanese fonts from the start. Hence a brief description of ASCII will help explain some later concepts on this web page.

ASCII assigns control codes, also referred to as the "C0" control codes, to byte values (code points) 0 through 31 (hexadecimal 0x00 through 0x1F), the space character to code point 32 (0x20), printable characters to code points 33 through 126 (0x21 through 0x7E), and the delete code to code point 127 (0x7F).

The printable characters, from 33 through 126, form a range of 94 characters. For this reason, many multi-byte Asian encoding schemes use groups of 94 glyphs. Japanese instances will be described later in this web page.

With ASCII only occupying the lower 128 code points of a byte, the upper 128 code points (128 through 255, or 0x80 through 0xFF) have been used for many other international encodings. Very often, the range 128 through 159 (0x80 through 0x9F) is reserved as the set of "C1" control codes, with code points 160 through 254 or 255 (0xA0 through 0xFE or 0xFF) being printable characters. Thus, often the term graphics left ("GL") is used to refer to the range of printable characters from 33 through 126, and the term graphics right ("GR") is used to refer to the range of printable characters from 160 through 254 or 255.

JIS X 0201 Encoding: a First Step

The first Japanese computer encoding standard only used one byte per character. It was originally named JIS C 6220. In 1987, it was renamed as JIS X 0201. Like Wabun code, this standard allowed a combination of the Latin alphabet and katakana.

JIS X 0201 assigns katakana characters and some punctuation marks to byte values (code points) 161 through 223 (hexadecimal 0xA1 through 0xDF). These characters have been directly mapped to Unicode, as the "Halfwidth CJK punctuation" and "Halfwidth Katakana variants" in the range U+FF61 through U+FF9F.

Code points from 0 through 127 (hexadecimal 0x00 through 0x7F) are reserved for ASCII, with two common exceptions:

Code point 127 (0x7F) is reserved for the Delete code.

JIS X 0201 is specified for use on seven- and eight-bit systems. Seven-bit systems were more common in the days of serial modems and RS-232 connections that used the eighth bit for parity. With today's Internet, eight-bit systems are all but universal.

This standard left much of the eight-bit space of 256 code points undefined, which allowed for expansion to add support for hiragana and also for kanji, the Japanese ideographs. Because there are thousands of kanji characters, at least a two-byte code is required to cover the entire set. The next section describes such a code.

JIS X 0208 Encoding

Recall from the ASCII section above that there are 94 printable ASCII characters, in the range 33 through 126 (hexadecimal 0x21 through 0x7E). JIS X 0208 specified an encoding matrix of 94 rows, with each row containing 94 cells. When the government of Japan created this format, they referred to this 94 row by 94 cell arrangement as 区点 (kuten), from 区 (ku, meaning "ward" or "section") and 点 (ten, meaning "point"). The common English usage today is to refer to this 94 ku by 94 ten array as 94 rows by 94 cells.

The code points in the first row of JIS X 0208 are numbered 1-1 through 1-94, code points in the second row are numbered 2-1 through 2-94, and so on, up to 94-94. This allows up to 94 × 94 = 8836 character assignments.

In JIS X 0208, rows 1 through 15 are used to encode non-kanji characters; rows 9 through 15 have not been assigned characters. Rows 16 through 47 are used to encode Level 1 kanji, which Japanese children learn in school. Rows 48 through 84 are used to encode Level 2 kanji. Rows 85 through 94 are currently unused.

The two-byte JIS codes can be mapped to byte sequences in several ways. A very popular method (for a while, the most popular method) allowed JIS X 0201 characters to be combined with JIS X 0208 characters. This method is called Shift-JIS, and is defined in the JIS X 0208 standard. The next section provides a brief overview.

Shift-JIS Encoding

JIS X 0208 specifies a way for its 94-by-94 encoding to mix with JIS X 0201 encoding. JIS X 0201 codes remain the same: code points 0 through 127 are unchanged, as are the punctuation codes and katakana at code points 161 through 223.

A two-byte JIS X 0208 sequence encoded using Shift-JIS has a value in the ranges 129 through 159 (0x81 through 0x9F) or 224 through 239 (0xE0 through 0xEF). The second byte will have a value in the ranges 64 through 126 (0x40 through 0x7E) or 128 through 252 (0x80 through 0xFC). Code point 127 (0x7F), the delete code point, is skipped over, as is the JIS X 0201 punctuation and katakana range of 161 through 223 (0xA1 through 0xDF). Thus Shift-JIS allows intermingling JIS X 0201 codes with JIS X 0208 codes in the same document.

Some descriptions of Shift-JIS present mathematical formulas for mapping a two-byte JIS X 0208 code. The purpose of those formulas is to skip over the delete character (127, or 0x7F) always, and to skip over the range 161 through 223 (0xA1 through 0xDF) on the first byte. They first add 32 (hexadecimal 0x20) to the row and cell numbers of a code, so that row and cell numbers range from 33 through 126 instead of 1 through 94. A (hopefully) more intuitive description of this mapping follows instead of just repeating formulas.

Shift-JIS and FONTX2 Fonts

In 1990, IBM Japan introduced the DOS/V operating system for the IBM PC/AT to support kanji on a PC with a VGA graphics display. Originally, "DOS/V" was an abbreviation for "DOS/VGA". Later, as PC graphics cards provided resolutions beyond the 640 × 480 pixel resolution of VGA graphics controllers, the "V" was said to stand for "Variable".

In November 1991, Takashi Oyama (小山 隆史, handle "lepton") published FONTX version 1.1 as a font driver to replace the DOS/V font driver. It allowed different Shift-JIS encoded kanji fonts to be displayed under DOS/V. In January 1992, he published version 2.0 of the driver, FONTX2. This became the most popular kanji font encoding format under DOS/V.

Many legacy Japanese FONTX2 font files are still available on the Internet today and the format remains in use for displaying Shift-JIS encoded kanji on embedded systems.

FONTX2 files have two formats: a small file format for single byte fonts, and a large file format for double byte Shift-JIS fonts. The tables below provide an overview of these formats.

FONTX2 Single Byte Format
First
Byte
Byte
Length
Contents
06 "FONTX2" in ASCII
68 Font Name in ASCII
141 Glyph Width (pixels), ncols
151 Glyph Height (pixels), nrows
161 Code Flag:
0 for Alphanumeric Keyboard
17 Bitmaps for up to 256
left-justified glyphs,
8 pixels/byte

Glyphs in a FONTX2 file contain a whole number of bytes per row, with pixels left-justified to the highest bit in the last byte in each row if the glyph width is not a multiple of 8 pixels. Thus the size of the glyph bitmap section of a single byte FONTX2 font file, with 256 glyphs per font, is

256 × nrows × ⌈ ncols / 8 ⌉ bytes.

For example, a single byte FONTX2 file containing 16 × 16 pixel glyphs would have a bitmap size of

256 × 16 × ⌈ 16 / 8 ⌉ = 256 × 16 × 2 = 4096 bytes.

The total file size would be 17 header bytes + 4096 glyph bitmap bytes = 4113 bytes.

The format for a double byte Shift-JIS FONTX2 font file is as follows:

FONTX2 Double Byte (Shift-JIS) Format
First
Byte
Byte
Length
Contents
06 "FONTX2" in ASCII
68 Font Name in ASCII
141 Glyph Width (pixels), ncols
151 Glyph Height (pixels), nrows
161 Code Flag:
1 for Shift-JIS
171 Number of Code Blocks, nb
182 Code Block 1 Start
202 Code Block 1 End
14 + (4 × nb)2 Code Block nb Start
16 + (4 × nb)2 Code Block nb End
18 + (4 × nb) Start of Glyph Bitmaps

The two-byte Code Block Start and Code Block End fields are stored in little-endian order: low-order byte first, high-order byte second. Glyph bitmaps are stored one byte at a time, so each bitmap is read in the order that bytes appear in the file.

As with single byte FONTX2 files, the size of each glyph is

nrows × ⌈ ncols / 8 ⌉ bytes,

and so the total file size of a double byte FONTX2 font file is

18 + (4 × Number of Blocks) + (Number of Glyphs × nrows × ⌈ ncols / 8 ⌉) bytes.

The glyph bitmaps follow one after the other, with no gaps between block boundaries. Dividing glyphs into runs of blocks allows skipping over the large sections of a two byte code space that contain no Shift-JIS encodings. These two qualities make FONTX2 a very space-efficient Shift-JIS font format.

FONTX2 to BDF Font Conversion

There are numerous Japanese fonts in the MS-DOS FONTX2 format that are in the public domain. This motivated me to create conversion software to allow their use on other systems. I chose BDF as the initial output format because of its support on many Unix-like platforms and embedded systems, and because of its flexibility in supporting bitmap glyphs of various sizes.

The following tarball contains the program fontx2bdf, which will read a single- or double-byte format FONTX2 file and output a BDF version 2.1 font file. The ".sig" file is the GPG signature for the tarball. Those code points in the input file that have valid mappings to Unicode Plane 0 are included in the BDF output. The program is licensed under GPL version 2, and the resulting header file for mapping JIS X code points to Unicode code points is still in the public domain.

fontx2bdf-1.1.tar.gz
fontx2bdf-1.1.tar.gz.sig

The header file, jisx2uni.h, is an expanded version of the file eucjp2uni.h (which is linked towards the end of this page). I renamed the file and its arrays to better reflect that the mapping arrays are ordered for JIS X encodings rather than in any particular information interchange scheme such as EUC (Extended Unix Code). This header file contains three new arrays for mapping single- and double-byte FONTX2 font files to Unicode code points.

Below is a graphic of such a one-byte FONTX2 font, dflhn16x.fnt. This can be converted to a BDF font with this command:

      fontx2bdf dflhn16x.fnt dflhn16x.bdf

Note that the glyph in position 0x5C is a backslash, not a Yen symbol. Some Japanese fonts map 0x5C to the Yen symbol, such as those encoded with IBM's Code Page 1041. Keeping the backslash in its ASCII position preserves its use as the DOS directory separator in pathnames and as the character that precedes control codes in C program syntax (for example, '\n' for "newline").

DR F ANK Font


The original FONTX2 files from December 2001 are in the tarball below, with the SHA-256 checksum:

      db5a2203bf08aaa089afa756a7afddeeeb245b281f78529c7d442f98298d1ed5  dflfnt.tar.gz

The font name appearing within the font file is "DR F ANK". "ANK" is a common abbreviation for a single-byte Japanese font that contains Alphabet, Numerals, and Katakana.

JIS X 0213 Encoding

Although JIS X 0208 can encode over 8000 characters, this is not enough to encode all existing kanji. In the year 2000, JIS X 0213 was released with the addition of planes, or in Japanese 面 (men, meaning "surface"), to the existing row-cell ordering. The pre-existing JIS X 0208 row-cell arrangement was preserved in Plane 1; new characters are assigned to Plane 2. Each of the two JIS X 0213 planes contains a code point space of 94 rows with 94 cells per row.

JIS X 0213 was revised in 2004 and in 2012, but preserving the original two plane encoding notation. The 2004 version modified the glyphs of 168 kanji.

A JIS X 0213 code point is often written as plane-row-cell; for example, 1-1-1 represents the first code point in Plane 1 and 2-94-94 represents the last code point in Plane 2. Currently, Plane 2 only has character assignments in 25 of its 94 rows: 1, 3 – 5, 8, 12 – 15, and 78 – 94. This is significant in Shift-JIS encoding of JIS X 0213, described in the next section.

Shift-JIS for JIS X 0213 Encoding

A JIS X 0213 code point in Plane 1 has the same Shift-JIS encoding order and values that a JIS X 0208 code point would have. Plane 2 code points map to a first Shift-JIS byte in the range 240 – 252 (0xF0 – 0xFC). The second Shift-JIS byte is encoded the same for Plane 2 as for Plane 1:

The mapping of the first byte in a two-byte Shift-JIS sequence for Plane 2 code points is not entirely in order. This is an artifact of nearing the end of available Shift-JIS codes and wanting to calculate the first byte value by a mathematical formula:

Shift-JIS First & Second Bytes for
JIS X 0213 Plane 2
JIS X 0213 Row
(Plane 2)
Shift-JIS First ByteSecond Byte
Range Half
DecimalDecimalHexadecimalLowerUpper
12400xF0X
32410xF1X
42410xF1 X
52420xF2X
82400xF0 X
122420xF2 X
132430xF3X
142430xF3 X
152440xF4X
782440xF4 X
792450xF5X
802450xF5 X
812460xF6X
822460xF6 X
832470xF7X
842470xF7 X
852480xF8X
862480xF8 X
872490xF9X
882490xF9 X
892500xFAX
902500xFA X
912510xFBX
922510xFB X
932520xFCX
942520xFC X

Although not entirely straightforward, the Shift-JIS mapping of a JIS X 0213 Plane 2 code point still maps one odd row and one even row to the same first byte value, as with the one-plane JIS X 0208 standard.

The Shift-JIS two-byte encoding of JIS X 0213 allows up to 15 + 16 + 16 + 13 = 60 values for the first byte of a two-byte Shift-JIS encoded JIS X 0213 code point, and the same 188 values for a second byte as Shift-JIS encoding of JIS X 0208. This is a total of 60 × 188 = 11,280 possible code points. Even if this were extended to allow first byte values of 253 through 255 (0xFD through 0xFF), that would only add 3 × 188 = 564 more code points, for a total of 11,844 possible total code points.

Thus Shift-JIS encoding is approaching its practial limit. It cannot encode all possible code points of a two plane, 94 row by 94 cell encoding such as JIS X 0213 in two bytes.

Two other non-Unicode encoding methods were introduced in 1994 that do not have this limitation: ISO-2022 and Extended Unix Code. They are described in the next two sections.

ISO-2022 and Japanese Font Encoding

By the 1990s, many computer encoding schemes had evolved to represent various languages. The most basic of these used the unassigned upper range of byte values, 128 through 255 (0x80 through 0xFF). JIS X 0201, which assigned a set of Japanese punctuation marks and katakana to the range 161 – 223 (0xA1 – 0xDF) was one such encoding method. The row-cell arrangement of two-byte character sets, with rows and cells ranging from 1 – 94, was an ordering of characters that could be mapped to one or more computer encodings.

ISO-2022 was released in 1994 to address two problems involving international character sets, including multi-byte Asian scripts:

ISO-2022 can transmit all character sets using only seven-bit codes. This makes it suitable for communication between systems that may use the eighth bit for parity, as was common in the days of dial-up modems and RS-232 connections. It does this by using different escape sequences and control codes to indicate a shifting from one character set type or range to another. It encodes 94 × 94 element row-cell character sets such as the two-byte JIS codes by adding 32 to each row or cell value (0x20 in hexadecimal). This maps all 94 × 94 row-cell bytes to the range of printable seven-bit characters: 33 – 126 (0x21 – 0x7E).

The ISO-2022 assignment of row and cell character values to byte ranges 33 – 126 (0x21 – 0x7E) is also how two-byte Japanese fonts are commonly encoded. Fonts supporting the two-plane JIS X 0213 standard may place Plane 1 and Plane 2 glyphs in two separate files, or in a single file with Plane 1 glyphs starting with a plane digit of 1 or 3, and with Plane 2 glyphs starting with a plane digit of 2 or 4.

The main contents of a two-byte JIS encoding are the kanji characters. In JIS X 0213, these begin at plane 1, row 16. In font encodings that follow the ISO-2022 convention, this kanji starting point corresponds to 0x3000. The first 15 rows of Plane 1 contain non-kanji characters.

JIS X 0213 Plane 1, rows 16 through 47 contain Level 1 kanji, which school children learn. These glyphs are in locations 0x3000 through 0x4F7E in a JIS X 0213 Plane 1 font that uses the ISO-2022 encoding convention.

Plane 1, rows 48 through 47 contain Level 2 kanji. These glyphs are in locations 0x5000 through 0x72FF.

The remaining rows after Plane 1, row 84 contain less frequently used kanji.

JIS X 0213:2004 Plane 1 has glyphs assigned for all possible code points. Plane 2 has glyphs assigned in the following rows:

JIS X 0213:2004
Plane 2 Rows Assigned
Ordinal
Row
Font Encoding Row
DecimalDecimalHexadecimal
1330x21
3 – 535 – 370x23 – 0x25
8400x28
12 – 1544 – 470x2C – 0x2F
68 – 94110 – 1260x6E – 0x7E

The two tables below contain links to JIS X 0213 glyph maps, created with the Unifont unihex2png program. The first character (in Plane 1, location 0x2121) is a space character, so its cell appears blank.

JIS X 0213:2004 Glyphs
Plane 1
(All cells have glyph assignments)
  21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E  

Plane 2 does not have characters assigned to all locations. Those rows with assignments are colored green.

JIS X 0213:2004 Glyphs
Plane 2
(Gray cells have no glyph assignments)
  21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E  

ISO-2022 provided a standard method for exchange of seven-bit characters between computers, notably for email. However, the base standard, with its multi-byte shift codes, is not an efficient method for computer information storage, so it is not described further. ISO-2022 does allow for an eight-bit mode popular for storing Asian character sets, known as Extended Unix Code. This is described in the next section.

Extended Unix Code-JP Encoding

The Extended Unix Code can map to many Asian glyph encodings that use planes of 94 by 94 characters. EUC-JP is the standard for encoding Japanese, and it fully supports JIS X 0213.

Regardless of the character set being encoded, EUC reserves decimal code points 0 – 31 (C0 control codes), 32 (space), 127 (delete), and 128 – 159 (C1 control codes). In hexadecimal, 0 – 32 is 0x00 – 0x20 and 127 – 159 is 0x7F – 0x9F.

EUC-JP uses the range 0x21 – 0x7E for ASCII characters, although some EUC-JP documents use the ASCII code point for the backslash character ("\"), 0x5C, for the Yen symbol and the code point for the tilde character ("~"), 0x7E, for the Overline character (Unicode code point U+203E). This ASCII range is referred to as EUC Code Set 0, and was also covered by the older Japanese standard JIS X 0201-ro.

EUC-JP adds 128 (hexadecimal 0x80) to each ISO-2022 multi-byte code point (which will have a range of 33 through 126, or 0x20 through 0x7E), so resulting byte values will lie above the ASCII range of 0x00 – 0x7F, and above the range 0x80 – 0x9F reserved for control codes. Thus JIS X 0213 code points in each plane ranging from 0x2121 through 0x7E7E using ISO-2022 encoding, when 0x8080 is added, will have EUC-JP encoding range hexadecimal 0xA1A1 through 0xFEFE.

Following this arrangement, EUC-JP encodes JIS X 0213 Plane 1 with a two-byte sequence in the range 0xA1A1 through 0xFEFE.

EUC-JP also adds 0x8080 to JIS X 0213 Plane 2 glyphs, and precedes those glyphs with the byte 0x8F. This sequence was also used for the older JIS X 0212 standard.

Finally, EUC-JP indicates the half-width katakana set from JIS X 0201 with byte value 0x8E followed by their single byte values of 0xA1 through 0xDF.

The following table summarizes these EUC-JP code sets:

EUC
Code Set
Bytes
per Code
Byte SequenceStandard(s)
0 1 0x00–0x7F ASCII,
JIS X 0201-ro
1 2 0xA1–0x7E, 0xA1–0x7E JIS X 0208,
JIS X 0213 Plane 1
2 2 0x8E, 0xA1–0xDF JIS X 0201-jp
Punctuation & Half-width Hiragana
3 3 0x8F, 0xA1–0x7E, 0xA1–0x7E JIS X 0212,
JIS X 0213 Plane 2

Mapping JIS X 0213 / EUC-JP to Unicode

As discussed in the last section, all JIS X 0213 code points map to a byte sequence in the EUC-JP encoding. These code points also map to Unicode.

Most JIS X 0213 characters map to code points in Unicode Plane 0, the Basic Multilingual Plane (BMP). However, 303 characters from JIS X 0213 Plane 2 map to Unicode Plane 2, the Supplementary Ideographic Plane (SIP). Therefore, to support the entire JIS X 0213 repertoire, a font must support some glyphs in Unicode Plane 2.

As a further twist, twenty-five JIS X 0213 code points in JIS Plane 1 are a combination of two Unicode code points, and so cannot be mapped into a single Unicode code point. They are as follows:

JIS Plane 1
Code Point
Jiskan 16
Bitmap
Unicode
Code Points
2477 1-2477 U+304B + U+309A
2478 1-2478 U+304D + U+309A
2479 1-2479 U+304F + U+309A
247A 1-247A U+3051 + U+309A
247B 1-247B U+3053 + U+309A
2577 1-2577 U+30AB + U+309A
2578 1-2578 U+30AD + U+309A
2579 1-2579 U+30AF + U+309A
257A 1-257A U+30B1 + U+309A
257B 1-257B U+30B3 + U+309A
257C 1-257C U+30BB + U+309A
257D 1-257D U+30C4 + U+309A
257E 1-257E U+30C8 + U+309A
2678 1-2678 U+31F7 + U+309A
2B44 1-2B44 U+00E6 + U+0300
2B48 1-2B48 U+0254 + U+0300
2B49 1-2B49 U+0254 + U+0301
2B4A 1-2B4A U+028C + U+0300
2B4B 1-2B4B U+028C + U+0301
2B4C 1-2B4C U+0259 + U+0300
2B4D 1-2B4D U+0259 + U+0301
2B4E 1-2B4E U+025A + U+0300
2B4F 1-2B4F U+025A + U+0301
2B65 1-2B65 U+02E9 + U+02E5
2B66 1-2B66 U+02E5 + U+02E9

Note: Japanese OpenType fonts will recognize the two combinations of U+02E5 and U+02E9, and render glyphs that resemble the JIS X 0213 Plane 1 glyphs 0x2B65 and 0x2B66. Bitmapped fonts (BDF, PCF, etc.) and simple TrueType fonts (such as GNU Unifont) will render the two Unicode glyphs separately.

JIS X 0213 Unicode Coverage

The following are the Unicode ranges covered by the public domain Jiskan16 JIS X 0213 font [table created with the help of the Unifont unicoverage utility]:

Percent
Covered      Range       Script
-------      -----       ------
  71.1%  U+0080..U+00FF  C1 Controls and Latin-1 Supplement
  60.2%  U+0100..U+017F  Latin Extended - A
   7.2%  U+0180..U+024F  Latin Extended - B
  57.3%  U+0250..U+02AF  IPA Extensions
  18.8%  U+02B0..U+02FF  Spacing Modifier Letters
  28.6%  U+0300..U+036F  Combining Diacritical Marks
  34.0%  U+0370..U+03FF  Greek and Coptic
  25.8%  U+0400..U+04FF  Cyrillic
   0.8%  U+1E00..U+1EFF  Latin Extended Additional
   1.6%  U+1F00..U+1FFF  Greek Extended
  21.4%  U+2000..U+206F  General Punctuation
   2.1%  U+20A0..U+20CF  Currency Symbols
  10.0%  U+2100..U+214F  Letterlike Symbols
  42.2%  U+2150..U+218F  Number Forms
  14.3%  U+2190..U+21FF  Arrows
  21.5%  U+2200..U+22FF  Mathematical Operators
   7.8%  U+2300..U+23FF  Miscellaneous Technical
   1.6%  U+2400..U+243F  Control Pictures
  41.2%  U+2460..U+24FF  Enclosed Alphanumerics
  25.0%  U+2500..U+257F  Box Drawing
  24.0%  U+25A0..U+25FF  Geometric Shapes
  10.9%  U+2600..U+26FF  Miscellaneous Symbols
   6.2%  U+2700..U+27BF  Dingbats
   1.6%  U+2900..U+297F  Supplemental Arrows - B
   2.3%  U+2980..U+29FF  Miscellaneous Mathematical Symbols - B
  54.7%  U+3000..U+303F  CJK Symbols and Punctuation
  94.8%  U+3040..U+309F  Hiragana
 100.0%  U+30A0..U+30FF  Katakana
 100.0%  U+31F0..U+31FF  Katakana Phonetic Extensions
  24.6%  U+3200..U+32FF  Enclosed CJK Letters and Months
  11.3%  U+3300..U+33FF  CJK Compatibility
   2.5%  U+3400..U+4DBF  CJK Unified Ideographs Extension A
  45.3%  U+4E00..U+9FFF  CJK Unified Ideographs
  16.0%  U+F900..U+FAFF  CJK Compatibility Ideographs
   4.7%  U+FB50..U+FDFF  Arabic Presentation Forms - A
   6.2%  U+FE30..U+FE4F  CJK Compatibility Forms
  42.1%  U+FF00..U+FFEF  Halfwidth and Fullwidth Forms
   0.7%  U+20000..U+2A6FF  CJK Unified Ideographs Extension B

JIS X 0213 Fonts and GNU Unifont

Jiskan 16

Unifont version 12.1.02 added a unifont_jp variant of the main Unicode Basic Multilingual Plane (Plane 0) font. This substitutes the default Unifont CJK glyphs with glyphs from the public domain Jiskan16 BDF font in these Unicode ranges:

Unicode RangeUnicode Script
U+3300..U+33FFCJK Compatibility
U+3400..U+4DBFCJK Unified Ideographs Extension A
U+4E00..U+9FFFCJK Unified Ideographs
U+F900..U+FAFFCJK Compatibility Ideographs
U+FE30..U+FE4FCJK Compatibility Forms
U+FF00..U+FFEFHalfwidth and Fullwidth Forms

In addition, the 303 kanji in the range U+20000..U+2A6FF (CJK Unified Ideographs Extension B) from Jiskan16 are added to the Unifont Plane 0 glyphs. The resulting unifont_jp font file provides complete coverage of the JIS X 0213 standard in a Unicode font; all other JIS code points lie within the Basic Multilingual Plane, which Unifont already covers completely.

I chose the Jiskan 16 by 16 pixel glyphs because they were the clearest free set of JIS glyphs that I could find. However, some of the glyphs do seem a little wide. If anyone who is a Japanese native wishes to improve any of the glyphs from the Jiskan font, the 94 cell glyph PNG files are linked in the JIS X 0213 Plane 1 and Plane 2 tables n the "ISO-2022 and Japanese Font Encoding" section above on this web page. Those graphics images were created with the unihex2png utility, which is part of the GNU Unifont distribution. Instructions for modifying these 16-by-16 pixel glyphs appear on the Unifont Utilities section of this website.

The Jiskan 16 Plane 1 and Plane 2 glyphs are in the two Unifont-format hex files below, with code points ranging from 0x2121 through 0x7E7E as per ISO-2022 encoding. The original .bdf.gz font files are also below (copied from the Gentoo GNU/Linux distribution); they contain the list of contributors to those files. These four files are in the Public Domain:

Izumi 16

Unifont 12.1.03 switched to Izumi 16 glyphs. Following the release of Unifont 12.1.02, someone notified me of a public domain JIS X 0213 font at http://izumilib.web.fc2.com/izumi-bf/dl-eng.html. This font seems to have started as the Jiskan 16 font and improved some glyphs.

The two relevant font files, izmg16-2004-1.bdf and izmg16-2004-2.bdf, contain blank glyphs that are at duplicate locations of non-blank glyphs: 45 duplicates for the first font and 13 for the second. I suspect that this was the result of the original glyphs being encoded in FONTX2 format, with a blank glyph specified at hexadecimal code points xx7F (to avoid ending one block and beginning another in the FONTX2 font when spanning xx7F), which were then converted into code points xx60 in the BDF font. After over a month without a response from the person who maintains that website, I decided to remove the duplicates and redistribute those two font files on this website. They are here:

Izumi 16 Plane 1 Glyphs — 45 Duplicates Removed: 2160, 2560, 2760, 2960, 2B60, 2F60, 3160, 3360, 3560, 3760, 3960, 3B60, 3D60, 3F60, 4160, 4360, 4560, 4760, 4960, 4B60, 4D60, 4F60, 5160, 5360, 5560, 5760, 5960, 5B60, 5D60, 5F60, 6160, 6360, 6560, 6760, 6960, 6B60, 6D60, 6F60, 7160, 7360, 7560, 7760, 7960, 7B60, 7D60.

Izumi 16 Plane 2 Glyphs — 13 Duplicates Removed: 2160, 2360, 2560, 2D60, 2F60, 6F60, 7160, 7360, 7560, 7760, 7960, 7B60, 7D60.

JIS/EUC-JP Conversion Mapping

The following C language header file contains arrays that programmers can use to map from JIS or EUC-JP encoded documents to Unicode. It is the mapping used to convert the Jiskan16 font files to Unicode code points for the Japanese version of Unifont, "unifont_jp". This header file is in the Public Domain:

eucjp2uni.h

File Checksums

Finally, here are SHA256 checksums for the above files:

      895f26a841ba28f7277c5629fec6841a65deb1cbd49487bf0407446d30390fdb  eucjp2uni.h
      005345196615e692c54ef67286b44dd0eaa3a899d066cd407a4b3a810822c20c  izmg16-2004-1.bdf.gz
      987aeec6d39e0dd82d3c41749578589e3a4ddb65c74ae52948030f6092c8c956  izmg16-2004-2.bdf.gz
      cb177cddeb3cf73bd3e62dc93a7db1ad8c162256a50b93ecf7fc3bd641ea26d1  jiskan16-1.hex
      fdf0eb6ad56a5611c03071c8395eeba7ca54c057f4078bf0aac7b9ea05336074  jiskan16-2.hex
      34e05cb59ae562d53b4b8fa5620454e9a060208f76da915c7da844f581e94830  jiskan16-2000-1.bdf.gz
      14d5db8001d4f289df98ce1371f782e81add21281bf4b7f8cd7041cb1c6306cd  jiskan16-2000-2.bdf.gz
      d79a57b81d3626414cb207eafec8d7b94fc2e7efabb5dce9d3ae58416e82ea3d  jiskan16-2004-1.bdf.gz
    
☞ Enjoy! ☜

Sources

JIS Standards:

Lunde, Ken. CJKV Information Procecessing. O'Reilly, 1999. ISBN 1-56592-224-7. By far the most informative source of information on CJKV processing, although dated: it was published before JIS X 0213 was released.

Wikipedia: articles on JIS and EUC-JP encodings.

JIS X 0213 Code Mapping Tables at http://x0213.org/codetable/index.en.html. These were taken as the canonical Japanese encoding, using fullwidth alternates when they were specified, and using Windows code page mappings except for three Plane 1 mappings that would have been duplicates of other code point mappings; the mappings adopted for Unifont were 0x2142 → U+2016, 0x215D → U+2212, and 0x2141 → U+301C.

The GNU libiconv library at https://ftp.gnu.org/gnu/libiconv/, for comparison with data tables at x0213.org.

Gentoo GNU/Linux distribution at https://gentoo.org/, the source for the public domain Jiskan 16 × 16 Plane 1 and Plane 2 BDF fonts.

FONTX2 Format:

https://ja.wikipedia.org/wiki/DOS/V [in Japanese]: History of DOS/V and Takashi Oyama's development of the FONTX2 font format.

http://www.hmsoft.co.jp/lepton/software/dosv/fontx.htm [in Japanese]: Takashi Oyama's description of the FONTX2 format.

http://elm-chan.org/docs/dosv/fontx_e.html [in English]: Description of FONTX2 format.

Valid HTML 4.01 Transitional Valid CSS! Best Viewed with Any Browser