Initially I just posted my additions to Roman Czyborra's original unifont.hex file. Then in mid-January 2008, his website went down. So I started posting font updates here. Roman has encouraged me to continue with my additions.
Roman's website is now back online, and you can read his Unifont description and motivation for its creation on his website, along with his archive of Unifont's changes: http://czyborra.com/unifont.
Earlier GNU Project Releases
Unifont 8.0
-
27 June 2015 Release (Unifont 8.0.01):
This is the first release for Unicode 8.0, whose code charts
The Unicode Consortium formally released on 17 June 2015.
- Anonymous1 drew the new CJK ideographs (U+9FCD..U+9FD5).
- Paul Hardy drew the other new Plane 0 glyphs.
- Paul Hardy added these Plane 1 scripts or drew new Unicode 8.0
glyphs in existing scripts:
- U+010600..U+01077F Linear A
- U+010860..U+01087F Palmyrene
- U+010880..U+0108AF Nabataean
- U+0108E0..U+0108FF Hatran
- U+0109A0..U+0109FF Meroitic Cursive
- U+010AC0..U+010AFF Manichaean
- U+010C80..U+010CFF Old Hungarian
- U+011150..U+01117F Mahajani
- U+011180..U+0111DF Sharada
- U+0111E0..U+0111FF Sinhala Archaic Numbers
- U+011200..U+01124F Khojki
- U+011580..U+0115FF Siddham (prior glyphs by Anonymous1)
- U+01D100..U+01D1FF Musical Symbols (prior glyphs by Andrew Miller)
- U+01D400..U+01D7FF Mathematical Alphanumeric Symbols (prior glyphs by Andrew Miller and Anonymous1)
- U+01F300..U+01F5FF Miscellaneous Symbols and Pictographs (prior glyphs by Andrew Miller, Anonymous1, Nils Moskopp, and Paul Hardy)
- U+01F600..U+01F64F Emoticons (prior glyphs by Paul Hardy and Nils Moskopp)
- U+01F680..U+01F6FF Transport and Map Symbols (prior glyphs by Paul Hardy and Nils Moskopp)
Unifont 7.0
-
23 October 2014 Release (Unifont 7.0.06):
This is the final version targeted for the Debian Jessie freeze date
of 5 November 2014. There will probably be no new Unifont
releases until Debian Jessie is formally released in a few months.
Paul Hardy added these scripts:
- U+010350..U+01037F Old Permic
- U+01F650..U+01F67F Ornamental Dingbats
- U+01F780..U+01F7FF Geometric Shapes Extended
- U+01F800..U+01F8FF Supplemental Arrows-C
-
17 October 2014 Release (Unifont 7.0.05):
This release makes a few changes to the ASCII lowercase letters 'l', 'k', and 't'. Umbreon126 discovered that 'k' was one pixel shorter than the other letters with ascenders ('b', 'd', 'h', etc.). Paul Hardy examined the other ASCII letters and found that 'l' was also one pixel short. Both glyphs were made one pixel taller. The horizontal stroke of 't' was also moved down one pixel row so that it sits at the same height as the horizontal stroke of 'f'. These changes were applied throughout the Latin scripts in the Basic Multilingual Plane. Similar adjustments still need to be made in the Plane 1 Mathematical Alphanumeric Symbols block; that block will be updated once it is complete (it nearly is, with only the Fraktur glyphs left to finish). Nils Dagsson Moskopp and Paul Hardy made further changes and additions; details are in the ChangeLog file in the source tarball.
-
11 October 2014 Release (Unifont 7.0.04):
This is an interim release in preparation for the Debian GNU/Linux freeze.
Another release is planned within two weeks. This release adds the following
scripts, in order of appearance:
- Elbasan (Artur Quaglio)
- Caucasian Albanian (Artur Quaglio)
- Sora Sompeng (Paul Hardy)
- Chakma (Paul Hardy)
- Sharada (Paul Hardy)
- Siddham (Umbreon126)
- Takri (Paul Hardy)
- Miao (Paul Hardy)
- Arabic Mathematical Alphabetic Supplement (Paul Hardy)
- Transport and Map Symbols (Paul Hardy, Nils Dagsson Moskopp)
-
01 July 2014 Release (Unifont 7.0.03):
- This release adds the missing man page for the unihexfill utility to the source files.
- Two glyphs were modified: "y" was reverted to its original form, and the German double s (ß) was modified to have a larger curve at the top. Joshua Krämer drew the final form of the German double s.
- See the ChangeLog file in the source tarball for details.
-
29 June 2014 Release (Unifont 7.0.02):
- Modified a few glyphs in the Basic Multilingual Plane; see the ChangeLog file in the source tarball for full details.
- Nils Moskopp added many more glyphs in the Miscellaneous Symbols and Pictographs block.
- Umbreon126 contributed Pahawh Hmong, redrew Mahjong glyphs to fit better within the Mahjong tile space and look more like brush stroke ideographs, and improved several other existing glyphs.
- Paul Hardy added these scripts to the Supplemental Multilingual Plane,
mainly to complete scripts introduced in Unicode 5.2:
- Meroitic Hieroglyphs
- Meroitic Cursive
- Avestan
- Inscriptional Parthian
- Inscriptional Pahlavi
- Psalter Pahlavi
- Old Turkic
- Rumi Numeral Symbols
- Brahmi
- Kaithi
- Kana Supplement
- Shorthand Format Controls
- Emoticons
- Alchemical Symbols
- There are still approximately 1000 glyphs missing from the SMP from before Unicode 7.0, and approximately 2000 glyphs missing from the SMP that were introduced in Unicode 7.0. If you would like to contribute, feel free to pick one of these missing scripts and draw it! Send a note first to make sure nobody else is working on the script. Note that Cuneiform, Cuneiform Numbers and Punctuation, Egyptian Hieroglyphs, and Bamum Supplement will not be drawn on a 16 by 16 pixel grid and so are not counted in those missing glyph figures. There are plans to draw those scripts on a 32 by 32 pixel grid in the future.
-
21 June 2014 Release (Unifont 7.0.01):
-
Complete coverage of the Unicode 7.0 Basic Multilingual Plane
- Paul Hardy drew the new Unicode 7.0 Plane 0 glyphs. He also redrew U+0D00, U+2702, and the Halfwidth Katakana variants block (U+FF65..U+FF9F), by request because the original Halfwidth Katakana glyphs were too tall. Thanks again to Yuko for checking this work. Incidentally, "yuko" means "freedom" in Japanese, which is very apropos for this work!
- Nils Moskopp contributed redrawn symbols for the "Aldus leaf" (U+2619), as well as U+2622 and U+26DF.
-
Growing coverage of the Unicode 7.0 Supplemental Multilingual Plane
- Andrew Miller drew the new glyphs for Playing Cards, Enclosed Alphanumeric Supplement, and Mathematical Alphanumeric Symbols.
- Nils Moskopp drew the glyphs added in Miscellaneous Symbols and Pictographs.
- Paul Hardy drew the glyphs for Phaistos Disc, Coptic Epact Numbers, Imperial Aramaic, Lydian, Old South Arabian, Old North Arabian, and Enclosed Ideographic Supplement (the last mostly modeled after Wen Quan Yi glyphs).
Unifont 6.3
-
14 February 2014 Release:
- Added GNU Free Documentation License to COPYING file.
-
04 February 2014 Release:
- Added font properties back into TrueType fonts, but set the foundry name to "Unifont" so the font would show properly in LibreOffice.
- Added new fonts with support for Michael Everson's ConScript Unicode Registry (CSUR) and for glyphs above the Unicode Basic Multilingual Plane.
- Redrew various APL glyphs. This work is ongoing.
-
21 December 2013 Release:
- Temporarily withdrew font field properties from TrueType fonts because LibreOffice on GNU/Linux was using the foundry name as the entire font name.
- Redrew various APL glyphs. This work is ongoing.
-
17 December 2013 Release:
- Removed FontForge insertion of version information in the PCF font, because the resulting PCF font wasn't handled properly by all applications.
-
15 December 2013 Release:
- Added additional information to BDF fonts: X Logical Font Description (XLFD) fields added to the generated BDF files, to allow grub-mkfont to convert the BDF font into a PF2 font; BDF fonts now have different names for different fonts; BDF fonts now contain copyright information.
- TrueType fonts now perform greater simplification of vectors, resulting in smaller fonts and more efficient rendering.
- Redrew glyphs in the Armenian script and the CJK Radicals Supplement block, and redrew the Capricorn symbol (U+2651).
-
10 October 2013 Release:
- Re-aligned most single-line arrows in the various arrow drawing blocks.
- Adjusted the hyphen to be 4 pixels wide, en dash to be 6 pixels wide, and em dash to be 8 pixels wide.
- Centered some C1 Control glyphs that weren't previously centered.
- Swapped the glyphs for U+FE17 and U+FE18 from the original Unifont 6.3 release.
There are some other less notable changes. See the README file in the source distribution for a detailed description of the glyph changes.
-
30 September 2013: The original version of Unifont 6.3 was released,
the same day that The Unicode Consortium formally released Unicode Version 6.3.0.
This release provides a glyph for every printable character (code point) in the
Unicode Basic Multilingual Plane.
- Andrew Miller drew the glyphs added in Unicode 6.3.0.
- Errata for the Unicode 1.0 release, published only in printed form in The Unicode Standard version 1.1, have been checked. No new corrections to Unifont were necessary. The errata for Unicode 6.3.0, published by The Unicode Consortium on 30 September 2013, did not require any new changes to Unifont. Unifont has now been checked against all errata published by The Unicode Consortium, from the very first to the latest, with all necessary corrections made.
- Because FontForge users found the combining circles useful as a reference, a new program was added to the sources to superimpose dashed combining circles on combining characters. The special BDF reference version of Unifont in the BDF font link above is the result.
- The Unifont 6.3 source release includes a program to generate a giant bitmapped graphics (.bmp) image of the entire font. You can see the image converted to a PNG file at the top of this page. This was something I had wanted to do for years but was using my free time for glyph drawing. What originally got me interested in this font was seeing this picture while searching for free Unicode fonts. I could not get permission to include the program that generated that picture into the Unifont source package, so finally for this release I wrote my own.
- While adding combining circles back into a unifont.hex file to create the image of the entire font, I noticed that some Syriac combining characters had been encoded as being single-width. Now all glyphs in the Syriac script are double-width.
- As I mentioned at the time of the Unifont 5.1 release, I've changed the widths of the Block Elements (U+2580..U+259F). They are now all single-width. Unlike the Box Drawing glyphs (U+2500..U+257F), the Block Elements do not have a one-to-one correspondence with any DOS or Microsoft Windows code page. They are a superset of Microsoft's old block characters, and the full Unicode Block Element glyphs are very clearly shown as exact squares. This change will break rendering for anyone who has used those glyphs expecting, for example, the full block to be rendered as a perfect square as depicted in The Unicode Standard. On the other hand, it will allow compatibility with the (now ancient) DOS character set. You can get the original set of double-width Block Elements here.
Unifont 6.2
This was an interim release in preparation for the Unifont 6.3 release. Changes introduced included:
- Unifont 6.2 was released on 1 September 2013 and provided complete coverage of the Unicode 6.2 BMP. Because of problems with being able to release previous Hangul Syllables blocks with a GNU General Public License (which I was not aware of at the time of the Unifont 5.1 release), I created a new Hangul Syllables block — 11,172 glyphs in all — from scratch for this release. This took a lot of coaching over the past few years from Koreans who were from Korea, educating me on the finer points of drawing Hangul syllables. You can read the details in the pages on Hangul on this site. Anyone who has better mastery over Hangul than I do is welcome to improve upon the result. Thanks to Jeong-Mi (정미) for her help with this over the course of a couple of years. In June 2013, she was satisfied with the result. At that point I made those glyphs public. Then after drawing the remaining Unicode 6.2 glyphs that I hadn't yet drawn, the Unifont 6.2 release was made public. The completion of the Hangul Syllables block was the impetus behind the Unifont 6.2 release. For the first time, it became possible to release Unifont with one overall GNU license.
- As of the Unifont 6.2 release, there is just one final set of glyphs, not four. The one font file has no combining circle marks, and no longer uses the original CJK ideographs that Roman Czyborra had in the original Unifont. This simplification was done with Roman's approval. This one font variant is still available in TrueType, PCF, and BDF formats. This was part of a major restructuring I did of the sources over the past five years, though I waited for a new release until having the Hangul Syllables complete to avoid any further licensing issues.
-
As of the Unifont 6.2 source release, the blanks.hex file from the Unifont 5.1
release has been renamed to unassigned.hex, as part of the major restructuring
I had been doing with the source over the past five years.
As an alternative to Unicode's generic substitution character,
The Unicode Standard 5.0, p. 155 suggests rendering unassigned code points
either as a shaded box to convey that they are unassigned (hence the Unicode Code
Charts representing unassigned code points as shaded boxes, which is the convention
that I adopted earlier), or as a glyph containing the four-digit hexadecimal code
point for the Basic Multilingual Plane, or a six-digit hexadecimal code point for
higher planes. This recommendation has remained in Section 5.3 of
The Unicode Standard to this day. For example, see Section 5.3 in
The Unicode Standard Version 6.2 — Core Specification.
Here's a sample hex string of the previous gray box glyph, for historical purposes:
0378:00542A542A542A542A542A542A542A00
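The sample above is one line in the .hex format used throughout the Unifont sources: a hexadecimal code point, a colon, and the glyph bitmap as 16 rows, with the leftmost pixel of each row in its most significant bit. A 32-digit bitmap is an 8x16 glyph and a 64-digit bitmap is a 16x16 glyph. The Python function below is a minimal sketch, not one of the Unifont utilities, and the name `decode_hex_line` is hypothetical. It turns such a line into rows of `#` (set) and `.` (clear) pixels:

```python
def decode_hex_line(line: str) -> tuple[int, list[str]]:
    """Split a 'CODEPOINT:BITMAP' line into (code point, 16 text rows).

    Hypothetical helper: assumes 16 pixel rows, with 32 hex digits
    for an 8x16 glyph and 64 hex digits for a 16x16 glyph.
    """
    code_str, bitmap = line.strip().split(":")
    if len(bitmap) not in (32, 64):
        raise ValueError(f"unexpected bitmap length: {len(bitmap)}")
    width = len(bitmap) // 4        # 32 digits -> 8 pixels, 64 digits -> 16 pixels
    digits_per_row = width // 4     # each hex digit holds 4 pixels
    rows = []
    for i in range(16):
        chunk = bitmap[i * digits_per_row:(i + 1) * digits_per_row]
        bits = format(int(chunk, 16), f"0{width}b")  # most significant bit = leftmost pixel
        rows.append(bits.replace("0", ".").replace("1", "#"))
    return int(code_str, 16), rows


code_point, rows = decode_hex_line("0378:00542A542A542A542A542A542A542A00")
print(f"U+{code_point:04X}")
print("\n".join(rows))
```

Decoding the gray box this way shows a blank top and bottom row with alternating `.#.#.#..` and `..#.#.#.` rows between them, which is the shaded pattern.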
- When I first started filling in missing glyphs in Unifont in 2007, I realized that I needed a way to tell overall how many glyphs still remained to be drawn. There were about 8,500 missing glyphs scattered throughout the Basic Multilingual Plane plus about 10,000 glyphs missing in the Han Ideographic section. I couldn't quickly distinguish which areas had undrawn glyphs versus code points that were simply unassigned. I followed the recommendation of The Unicode Standard to render unassigned code points as a gray box, contained in the blanks.hex file, now renamed to unassigned.hex. That let me easily track how many glyphs still needed to be drawn in each block of 256 code points. After that, I wrote a program (unihexgen) to generate four- or six-digit hexadecimal numbers in a single glyph. That program created the new glyphs in unassigned.hex and pua.hex.
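The per-block bookkeeping described above can be sketched in a few lines of Python. This is an illustration only, not the tool actually used; the function name `missing_per_block` is hypothetical. It assumes its input mixes the drawn glyphs with the gray-box placeholders for unassigned code points, so that any code point with no line at all is a glyph still waiting to be drawn:

```python
from typing import Iterable


def missing_per_block(hex_lines: Iterable[str],
                      first: int = 0x0000,
                      last: int = 0xFFFF) -> dict[int, int]:
    """Count code points with no glyph in each 256-code-point block.

    Hypothetical helper: assumes hex_lines contains both drawn glyphs and
    placeholder glyphs for unassigned code points, so any code point
    without a line is a glyph still to be drawn.
    """
    present = set()
    for line in hex_lines:
        line = line.strip()
        if line:
            present.add(int(line.split(":", 1)[0], 16))

    report = {}
    for block in range(first & ~0xFF, last + 1, 256):
        drawn = sum(1 for cp in range(block, block + 256) if cp in present)
        report[block] = 256 - drawn
    return report


sample = ["0041:" + "0" * 32, "0042:" + "0" * 32, "0100:" + "0" * 32]
for block, missing in missing_per_block(sample, 0x0000, 0x01FF).items():
    print(f"U+{block:04X}..U+{block + 0xFF:04X}: {missing} missing")
```

Blocks that should never get ordinary glyphs, such as the surrogates, would also be reported as missing by this sketch, so a real report would need to skip them.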
-
The Private Use Area glyph file, pua.hex, is no longer built into
the final font. That file only appears in the sources. Previously it
contained a pencil icon, suggested as a possible rendering in The Unicode
Standard 5.0, p. 155 as a refinement over the generic Unicode substitution
character. Now pua.hex contains four-digit hexadecimal code points
appearing as white digits on a black background (though by default these are
not built into the final font). This was also a suggestion
of a possible rendering in The Unicode Standard 5.0, p. 155 and is still
a recommendation in Section 5.3 of The Unicode Standard today. For example,
see Section 5.3 in
The Unicode Standard Version 6.2 — Core Specification.
Here's a sample hex string of the previous pencil glyph, for historical purposes:
E000:FFB9C5EDD5D5D5D5D5D5D5D5EDB991FF
- The default substitution code point is now Unicode's default Replacement Character, U+FFFD. This is rendered as a white question mark against a black background: �. All previous versions of Unifont followed the BDF font convention of using the space character as the substitution character. This was perfectly permissible under The Unicode Standard, because use of U+FFFD is optional. The Unicode Standard also allows a font to "simply display nothing" in such a case; see The Unicode Standard 5.0, p. 155.
- The font has incorporated all past errata listed by The Unicode Consortium on their website, with one exception: the Ogham space mark (U+1680) is still rendered as a horizontal stroke to allow proper rendering within Unifont's bitmapped constraints. It would take a more complicated font than Unifont to allow alternate renderings. See the README file in the full distribution for details.
- Speaking of errata, Andrew Miller found that the glyph for U+2047 was not drawn correctly as a double question mark. That has been fixed.
- Some new utilities have been added to the source package. Check out the Unicode Utilities page for details.
AnteGNUvian History
The remainder of this page contains version history from before all glyphs could be licensed under the GNU General Public License. The big stumbling block was the Hangul Syllables block of 11,172 glyphs. Their license allowed free copying and use, but it was not the GPL, and they could not be relicensed under the GPL.
Unifont 5.1 Release
This section is preserved for now for historical reasons. It might be removed in a future update. The current Unifont release includes changes that were made over a period of several years from the 5.1 release. During that time, I made significant changes to the utilities package that accompanies this font, and no longer build a version with combining circles by default. Thus everything released after the 5.1 release is organizationally different.
Here are the font files from the previous Unifont version 5.1 release.
Here's a Winzip archive for Mac and Windows (unzip the file and copy to your Fonts folder):
Here is a PCF version of the font for Unices:
Finally, here is a BDF version of the font for Unices:
There are four versions of the font. The first is the one you'll most likely want for a bitmap display as it has combining circle marks removed. The TrueType and BDF font files were created from this first version. The versions with "full" in the name contain combining circle marks, pictures for noncharacters, etc. The versions with "-jp" in the name preserve Roman Czyborra's original CJK glyphs, which he obtained from a public domain Japanese font. The .hex files are about 3 Mbytes each.
- http://unifoundry.com/unifont-5.1.20080820.hex
- http://unifoundry.com/unifontfull-5.1.20080820.hex
- http://unifoundry.com/unifont-jp-5.1.20080820.hex
- http://unifoundry.com/unifontfull-jp-5.1.20080820.hex
If you'd like to contribute any glyphs, please send them to unifoundry at this domain name (not spelled out because of spammers). You can download bitmaps for the ranges you'd like to edit (see the hyperlinks in the HTML table below). Then send me your modified .bmp file, or convert the .bmp file to a .hex file with my unibmp2hex conversion utility.
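Converting an edited bitmap back to .hex ends with packing each row of pixels into hexadecimal digits. The Python sketch below shows only that packing step; it does not read .bmp files and is not the unibmp2hex utility. The function name `encode_hex_line` and the `#`/`.` pixel notation are assumptions for illustration:

```python
def encode_hex_line(code_point: int, rows: list[str]) -> str:
    """Pack 16 rows of '#' (set) and '.' (clear) pixels into a .hex line.

    Hypothetical helper: rows must all be 8 or 16 pixels wide, and the
    leftmost pixel becomes the most significant bit of each row.
    """
    if len(rows) != 16:
        raise ValueError("a Unifont glyph has 16 rows")
    width = len(rows[0])
    if width not in (8, 16) or any(len(row) != width for row in rows):
        raise ValueError("rows must all be 8 or 16 pixels wide")
    digits = width // 4  # 2 hex digits per 8-pixel row, 4 per 16-pixel row
    bitmap = "".join(
        format(int(row.replace("#", "1").replace(".", "0"), 2), f"0{digits}X")
        for row in rows
    )
    # Use at least four hex digits for the code point.
    return f"{code_point:04X}:{bitmap}"


shaded = ["........"] + [".#.#.#..", "..#.#.#."] * 7 + ["........"]
print(encode_hex_line(0x0378, shaded))
```

Packing the shaded pattern this way gives back the gray-box string shown earlier on this page, 0378:00542A542A542A542A542A542A542A00.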
To see how a glyph should appear, consult the Unicode code pages at http://www.unicode.org/charts/.
Wen Quan Yi: Spring of Letters (文泉驛 / 文泉驿)
The biggest improvement to GNU Unifont was the addition of over 20,000 new CJK glyphs from version 1.1 of the Unibit font by Qianqian Fang (房骞骞). The Unibit font began as a combination of the original GNU Unifont and a basic CJK bitmap font placed in the public domain by the People's Republic of China. It adopted GNU Unifont's scheme of 8x16 and 16x16 glyphs. Qianqian Fang and many others then added about 10,000 more glyphs.
Qianqian states in the Unibit distribution: "The entire CJK Unified Ideographics (U4E00-U9FA5) and CJK Unified Ideographics Extension A(U3400-U4DB5) blocks were replaced by high-quality glyphs from China National Standard GB19966-2005 (public domain)." Qianqian also drew the 22 new CJK ideographs in the range U+9FA6..U+9FBB that appear in this version of the Unifont.
The Wen Quan Yi Unibit font is released under version 2.0 of the GNU General Public License, with the exception that embedding the Unibit font in a document does not in itself bind the document to the conditions of the GNU GPL. See his website for more details: http://wqy.sourceforge.net/cgi-bin/enindex.cgi.
Wen Quan Yi (WQY) means "spring of letters," as in a spring of water. This is an interesting choice of words, as the British spelling of "font" is "fount" (but still pronounced "font").
The following code points in the latest unifont.hex file are taken from the WQY Unibit font (with my additions to complete the U+3000..U+33FF range, particularly the missing Hiragana, Katakana, and Kanji):
- U+2E80..U+2EFF: CJK Radicals Supplement
- U+2F00..U+2FDF: Kangxi Radicals
- U+2FF0..U+2FFF: Ideographic Description Characters
- U+3000..U+303F: CJK Symbols and Punctuation
- U+31C0..U+31EF: CJK Strokes
- U+3200..U+32FF: Enclosed CJK Letters and Months
- U+3300..U+33FF: CJK Compatibility
- U+3400..U+4DBF: CJK Unified Ideographs Extension A
- U+4E00..U+9FBF: CJK Unified Ideographs
- U+F900..U+FAFF: CJK Compatibility Ideographs
- U+FF00..U+FF60: Fullwidth Forms of Roman Letters
Qianqian has given his okay to add these CJK glyphs into GNU Unifont. Likewise, I've told him to incorporate any glyphs he wants from my contributions to GNU Unifont into his Unibit font.
Additions for 2008-09-07: TrueType Combining Diacritical Marks
The .hex, .bdf, and .pcf files are unchanged from the 2008-08-20 version. The .ttf (TrueType) font was modified to superimpose Unicode combining diacritical marks on the glyph that they follow, with no advance in cursor position.
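One way to picture "no advance in cursor position" is a layout loop in which an ordinary glyph moves the pen forward by its width, while a combining mark leaves the pen where it is and is drawn over the cell of the glyph before it. The Python sketch below illustrates that idea with bitmaps under those assumptions; it is not how the TrueType font or any particular renderer is implemented, and the name `render_line` is hypothetical:

```python
def render_line(glyphs: list[tuple[list[int], int, bool]]) -> list[str]:
    """Lay out glyphs left to right and return 16 rows of '0'/'1' text.

    Each glyph is (rows, width, is_combining): rows holds 16 ints whose
    most significant bit is the leftmost pixel. A combining glyph has zero
    advance and is drawn over the cell of the glyph that precedes it.
    """
    total = sum(width for _, width, combining in glyphs if not combining)
    canvas = [0] * 16
    pen = 0
    for rows, width, combining in glyphs:
        x = pen - width if combining else pen  # marks reach back over the previous cell
        if x < 0:
            raise ValueError("combining mark with no preceding glyph")
        shift = total - x - width              # distance from the right edge of the line
        for i in range(16):
            canvas[i] |= rows[i] << shift      # OR the pixels into the line
        if not combining:
            pen += width                       # only ordinary glyphs advance the pen
    return [format(row, f"0{total}b") for row in canvas]


base = [0x18] * 16             # a vertical bar, 8 pixels wide
mark = [0x24] + [0x00] * 15    # two dots in the top row
for row in render_line([(base, 8, False), (mark, 8, True)])[:3]:
    print(row)
```

The mark's dots land in the same 8-pixel cell as the bar, and the line stays 8 pixels wide.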
Additions for 2008-08-20: Hangul Syllables
This release replaces the Hangul Syllables block with a light-stroke set of Hangul glyphs created by Changwoo Ryu from the free Baekmuk fonts.
Additions for 2008-08-08: CJK Improvements
This release improves the vertical positioning of Unified Han ideographs, and includes an overhaul of the Halfwidth and Fullwidth Forms for CJK use.
Additions for 2008-07-06: Braille Block Corrected
In 1998, Roman Czyborra wrote a Perl script to generate the Braille glyphs. He used it to create the U+2800..U+28FF range of GNU Unifont. This script did not enumerate the Braille dot sequences correctly. He posted a corrected version of the script on his website in 2003, but the original unifont.hex file kept the incorrect Braille glyphs. I just learned of this while going through all of his Perl scripts. This latest version corrects that error.
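For reference, Unicode defines the Braille Patterns block arithmetically: for a code point U+2800 + n, bit k of n (counting from 0) raises dot k + 1. Dots 1, 2, 3, and 7 run down the left column, and dots 4, 5, 6, and 8 run down the right. The Python sketch below generates an 8x16 .hex line from that rule. It is not Roman's script, and the dot size and pixel positions (`COLUMN_X`, `ROW_Y`) are assumptions chosen for illustration:

```python
# Dot number -> (column, row) within the 2-column, 4-row Braille cell.
DOT_POSITIONS = {
    1: (0, 0), 2: (0, 1), 3: (0, 2), 7: (0, 3),
    4: (1, 0), 5: (1, 1), 6: (1, 2), 8: (1, 3),
}

# Assumed placement of each 2x2-pixel dot inside an 8x16 glyph.
COLUMN_X = (1, 5)       # leftmost pixel column of each dot column
ROW_Y = (2, 5, 8, 11)   # top pixel row of each dot row


def braille_glyph(code_point: int) -> str:
    """Return an 8x16 .hex line for a code point in U+2800..U+28FF."""
    if not 0x2800 <= code_point <= 0x28FF:
        raise ValueError(f"U+{code_point:04X} is not in Braille Patterns")
    dots = code_point - 0x2800
    rows = [0] * 16
    for dot, (col, row) in DOT_POSITIONS.items():
        if dots & (1 << (dot - 1)):                # bit (dot - 1) set: dot is raised
            pixels = 0b11 << (6 - COLUMN_X[col])   # 2 pixels wide; bit 7 = leftmost
            rows[ROW_Y[row]] |= pixels             # 2 pixels tall
            rows[ROW_Y[row] + 1] |= pixels
    return f"{code_point:04X}:" + "".join(f"{r:02X}" for r in rows)


print(braille_glyph(0x2801))  # dot 1 only
print(braille_glyph(0x28FF))  # all eight dots
```

Because each dot is tested by its own bit, the dot order cannot drift the way a hand-enumerated sequence can.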
Additions for 2008-06-20: Unicode 5.1 Complete BMP
A journey of a thousand glyphs begins with the first pixel, to paraphrase Lao Tzu. In this case, the journey began with some missing Latin glyphs and in the end added approximately 18,500 glyphs to the existing GNU Unifont for complete coverage of the Unicode 5.1 Basic Multilingual Plane.
This is the culmination of a 10 year effort, originally begun by Roman Czyborra in 1998 (with earlier work dating to 1994). David Starner was propagating Roman's Perl scripts and adding users' glyph contributions for a while, but left the effort several years ago. Rich Felker created a new set of Tibetan glyphs during a point when GNU Unifont was not being maintained by anyone; I've now incorporated his improvements. Qianqian Fang began his Wen Quan Yi effort in 2004 with a focus on providing top-notch CJK ideographs in both bitmap and vector forms.
I am currently maintaining this font, and am glad to add any improvements that anyone wants to send (within as well as beyond the Basic Multilingual Plane). I'm going to experiment with breaking the 16-bit barrier in the future. Before doing that, I'd like to experiment more to see how some software (such as TeX) deals with more than 65,536 code points in a font.
Complex scripts (such as Arabic and the Brahmi-derived Indic scripts) exceed the limitations of the BDF font format. Yet Roman's philosophy was that it would be better to show something rather than a dreaded empty box on the screen for any character in Unicode's Basic Multilingual Plane. Pursuing that philosophy, I've drawn over 8,000 glyphs, even for complex scripts, so that every printable character would have some rendering.
Qianqian Fang and his Wen Quan Yi volunteers added the Unicode 5.1 Bopomofo (U+3100..U+312F), CJK Strokes (U+31C0..U+31EF), and CJK ideographs (U+9FBC..U+9FC3) for complete Unicode 5.1 coverage. Qianqian also drew examples of the clock and moon ideographs as a model for me to complete the U+3200..U+33FF range. I added the missing Hiragana, Katakana, Kanji, and Hangul. I also improved some existing Hiragana and Katakana in that range with expert oversight from Yuko (ゆうこ / 由子) — thanks Yuko!
I changed the entire Hangul Syllables block of 11,172 glyphs (U+AC00..U+D7A3) to thin stroke glyphs. Roman had mentioned on his website wanting to do this someday. Jungshik Shin (신정식) had written a Perl script in 1998 to convert a Hanterm font to the Unicode Hangul Syllables in .hex format. I made a couple of bug fixes to this script and used it to create BDF files covering the Hangul Syllables range for all four Hanterm Hangul fonts. See http://unifoundry.com/hangul/index.html for more details.
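The Hangul Syllables block is laid out arithmetically, which helps explain how a script can map a jamo-based font such as Hanterm onto it: each of the 11,172 syllables (19 initial consonants × 21 vowels × 28 final-consonant choices, including "no final") has a code point computed from its jamo indices. The Python sketch below applies the standard Unicode decomposition to recover the conjoining jamo of a syllable. It is not Jungshik Shin's script, and `decompose_hangul` is a hypothetical name:

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per initial consonant
S_COUNT = L_COUNT * N_COUNT   # 11,172 syllables in all


def decompose_hangul(code_point: int) -> tuple[int, ...]:
    """Return the conjoining jamo (initial, vowel[, final]) for a syllable."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError(f"U+{code_point:04X} is not a Hangul syllable")
    l_index = s_index // N_COUNT
    v_index = (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT   # 0 means no final consonant
    jamo = (L_BASE + l_index, V_BASE + v_index)
    if t_index:
        jamo += (T_BASE + t_index,)
    return jamo


for syllable in "한글":
    print(syllable, " ".join(f"U+{j:04X}" for j in decompose_hangul(ord(syllable))))
```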
I adjusted the horizontal centering of all Yi Syllables and Yi Radicals. The Yi glyphs were for the most part moved left by one pixel so that they center on the eighth pixel across rather than the ninth pixel across.
Yi was traditionally written vertically from top to bottom or horizontally from right to left, according to The Unicode Standard 5.0, p. 440. That is why I originally made the ninth pixel column, counting from the left, the center line of the glyphs rather than the eighth. Today Yi is also written horizontally from left to right.
I attempted to create a set of Yi glyphs that would look good if written either horizontally (in either direction) or vertically. Some horizontal strokes are a little exaggerated as a result, to avoid excessive white space to the left and right of a glyph with horizontal writing. Some circles are also flattened a little for the same reason.
All Yi glyphs are 16 pixels wide. Most Yi glyphs are two pixels above the baseline. Some reach below this as a matter of necessity, to have the entire glyph fit within a 16x16 grid.
If a Yi font were only to be written horizontally, a combination of 12 and 16 pixels wide would probably allow ideal rendering of Yi. Currently the Unifont only supports 8 and 16 pixel glyph widths.
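The one-pixel leftward move of the Yi glyphs described above amounts to shifting each row's bits by one, provided the leftmost pixel column is empty. A minimal Python sketch, using a hypothetical helper name and treating each row of a 16-pixel-wide glyph as an integer with the leftmost pixel in the most significant bit:

```python
def shift_left_one(rows: list[int], width: int = 16) -> list[int]:
    """Move every pixel of a glyph one column to the left.

    Refuses to shift if any row has its leftmost pixel set, since that
    pixel would otherwise be lost off the edge of the cell.
    """
    leftmost = 1 << (width - 1)
    if any(row & leftmost for row in rows):
        raise ValueError("leftmost column is not empty; shifting would clip the glyph")
    mask = (1 << width) - 1
    return [(row << 1) & mask for row in rows]


glyph = [0x0180] * 16   # a 2-pixel vertical bar centered in the cell
print([f"{r:04X}" for r in shift_left_one(glyph)][:2])  # ['0300', '0300']
```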
Additions for 2008-03-09
U+1100..U+11FF (Hangul Jamo): I replaced all existing glyphs in this range (there were just a fraction of the total needed) with thin stroke glyphs. Roman had expressed a desire to do this someday on his website. A 16x16 grid wasn't quite enough to draw one set of glyphs that could combine initial consonant + vowel + final consonant for all combinations. A cell height of 24 pixels would probably be okay.
GNU Unifont Unicode 5.1 Basic Multilingual Plane Coverage
Here's a summary of the percent completion of each range in the Basic Multilingual Plane:
Covered  Range           Script
-------  -----           ------
100.0%   U+0000..U+007F  C0 Controls and Basic Latin
100.0%   U+0080..U+00FF  C1 Controls and Latin-1 Supplement
100.0%   U+0100..U+017F  Latin Extended-A
100.0%   U+0180..U+024F  Latin Extended-B
100.0%   U+0250..U+02AF  IPA Extensions
100.0%   U+02B0..U+02FF  Spacing Modifier Letters
100.0%   U+0300..U+036F  Combining Diacritical Marks
100.0%   U+0370..U+03FF  Greek and Coptic
100.0%   U+0400..U+04FF  Cyrillic
100.0%   U+0500..U+052F  Cyrillic Supplement
100.0%   U+0530..U+058F  Armenian
100.0%   U+0590..U+05FF  Hebrew
100.0%   U+0600..U+06FF  Arabic
100.0%   U+0700..U+074F  Syriac
100.0%   U+0750..U+077F  Arabic Supplement
100.0%   U+0780..U+07BF  Thaana
100.0%   U+07C0..U+07FF  N'Ko
100.0%   U+0800..U+08FF  Unassigned
100.0%   U+0900..U+097F  Devanagari
100.0%   U+0980..U+09FF  Bengali
100.0%   U+0A00..U+0A7F  Gurmukhi
100.0%   U+0A80..U+0AFF  Gujarati
100.0%   U+0B00..U+0B7F  Oriya
100.0%   U+0B80..U+0BFF  Tamil
100.0%   U+0C00..U+0C7F  Telugu
100.0%   U+0C80..U+0CFF  Kannada
100.0%   U+0D00..U+0D7F  Malayalam
100.0%   U+0D80..U+0DFF  Sinhala
100.0%   U+0E00..U+0E7F  Thai
100.0%   U+0E80..U+0EFF  Lao
100.0%   U+0F00..U+0FFF  Tibetan
100.0%   U+1000..U+109F  Myanmar
100.0%   U+10A0..U+10FF  Georgian
100.0%   U+1100..U+11FF  Hangul Jamo
100.0%   U+1200..U+137F  Ethiopic
100.0%   U+1380..U+139F  Ethiopic Supplement
100.0%   U+13A0..U+13FF  Cherokee
100.0%   U+1400..U+167F  Unified Canadian Aboriginal Syllabics
100.0%   U+1680..U+169F  Ogham
100.0%   U+16A0..U+16FF  Runic
100.0%   U+1700..U+171F  Tagalog
100.0%   U+1720..U+173F  Hanunoo
100.0%   U+1740..U+175F  Buhid
100.0%   U+1760..U+177F  Tagbanwa
100.0%   U+1780..U+17FF  Khmer
100.0%   U+1800..U+18AF  Mongolian
100.0%   U+18B0..U+18FF  Unassigned
100.0%   U+1900..U+194F  Limbu
100.0%   U+1950..U+197F  Tai Le
100.0%   U+1980..U+19DF  New Tai Lue
100.0%   U+19E0..U+19FF  Khmer Symbols
100.0%   U+1A00..U+1A1F  Buginese
100.0%   U+1A20..U+1AFF  Unassigned
100.0%   U+1B00..U+1B7F  Balinese
100.0%   U+1B80..U+1BBF  Sundanese
100.0%   U+1BC0..U+1BFF  Unassigned
100.0%   U+1C00..U+1C4F  Lepcha
100.0%   U+1C50..U+1C7F  Ol Chiki
100.0%   U+1C80..U+1CFF  Unassigned
100.0%   U+1D00..U+1D7F  Phonetic Extensions
100.0%   U+1D80..U+1DBF  Phonetic Extensions Supplement
100.0%   U+1DC0..U+1DFF  Combining Diacritical Marks Supplement
100.0%   U+1E00..U+1EFF  Latin Extended Additional
100.0%   U+1F00..U+1FFF  Greek Extended
100.0%   U+2000..U+206F  General Punctuation
100.0%   U+2070..U+209F  Superscripts and Subscripts
100.0%   U+20A0..U+20CF  Currency Symbols
100.0%   U+20D0..U+20FF  Combining Diacritical Marks for Symbols
100.0%   U+2100..U+214F  Letterlike Symbols
100.0%   U+2150..U+218F  Number Forms
100.0%   U+2190..U+21FF  Arrows
100.0%   U+2200..U+22FF  Mathematical Operators
100.0%   U+2300..U+23FF  Miscellaneous Technical
100.0%   U+2400..U+243F  Control Pictures
100.0%   U+2440..U+245F  Optical Character Recognition
100.0%   U+2460..U+24FF  Enclosed Alphanumerics
100.0%   U+2500..U+257F  Box Drawing
100.0%   U+2580..U+259F  Block Elements
100.0%   U+25A0..U+25FF  Geometric Shapes
100.0%   U+2600..U+26FF  Miscellaneous Symbols
100.0%   U+2700..U+27BF  Dingbats
100.0%   U+27C0..U+27EF  Miscellaneous Mathematical Symbols - A
100.0%   U+27F0..U+27FF  Supplemental Arrows - A
100.0%   U+2800..U+28FF  Braille Patterns
100.0%   U+2900..U+297F  Supplemental Arrows - B
100.0%   U+2980..U+29FF  Miscellaneous Mathematical Symbols - B
100.0%   U+2A00..U+2AFF  Supplemental Mathematical Operators
100.0%   U+2B00..U+2BFF  Miscellaneous Symbols and Arrows
100.0%   U+2C00..U+2C5F  Glagolitic
100.0%   U+2C60..U+2C7F  Latin Extended - C
100.0%   U+2C80..U+2CFF  Coptic
100.0%   U+2D00..U+2D2F  Georgian Supplement
100.0%   U+2D30..U+2D7F  Tifinagh
100.0%   U+2D80..U+2DDF  Ethiopic Extended
100.0%   U+2DE0..U+2DFF  Cyrillic Extended - A
100.0%   U+2E00..U+2E7F  Supplemental Punctuation
100.0%   U+2E80..U+2EFF  CJK Radicals Supplement
100.0%   U+2F00..U+2FDF  Kangxi Radicals
100.0%   U+2FE0..U+2FEF  Unassigned
100.0%   U+2FF0..U+2FFF  Ideographic Description Characters
100.0%   U+3000..U+303F  CJK Symbols and Punctuation
100.0%   U+3040..U+309F  Hiragana
100.0%   U+30A0..U+30FF  Katakana
100.0%   U+3100..U+312F  Bopomofo
100.0%   U+3130..U+318F  Hangul Compatibility Jamo
100.0%   U+3190..U+319F  Kanbun
100.0%   U+31A0..U+31BF  Bopomofo Extended
100.0%   U+31C0..U+31EF  CJK Strokes
100.0%   U+31F0..U+31FF  Katakana Phonetic Extensions
100.0%   U+3200..U+32FF  Enclosed CJK Letters and Months
100.0%   U+3300..U+33FF  CJK Compatibility
100.0%   U+3400..U+4DBF  CJK Unified Ideographs Extension A
100.0%   U+4DC0..U+4DFF  Yijing Hexagram Symbols
100.0%   U+4E00..U+9FCF  CJK Unified Ideographs
100.0%   U+9FD0..U+9FFF  Unassigned
100.0%   U+A000..U+A48F  Yi Syllables
100.0%   U+A490..U+A4CF  Yi Radicals
100.0%   U+A4D0..U+A4FF  Unassigned
100.0%   U+A500..U+A63F  Vai
100.0%   U+A640..U+A69F  Cyrillic Extended - B
100.0%   U+A6A0..U+A6FF  Unassigned
100.0%   U+A700..U+A71F  Modifier Tone Letters
100.0%   U+A720..U+A7FF  Latin Extended - D
100.0%   U+A800..U+A82F  Syloti Nagri
100.0%   U+A830..U+A83F  Unassigned
100.0%   U+A840..U+A87F  Phags-pa
100.0%   U+A880..U+A8DF  Saurashtra
100.0%   U+A8E0..U+A8FF  Unassigned
100.0%   U+A900..U+A92F  Kayah Li
100.0%   U+A930..U+A95F  Rejang
100.0%   U+A960..U+A9FF  Unassigned
100.0%   U+AA00..U+AA5F  Cham
100.0%   U+AA60..U+ABFF  Unassigned
100.0%   U+AC00..U+D7AF  Hangul Syllables
100.0%   U+D7B0..U+D7FF  Unassigned
0.0%     U+D800..U+DFFF  Surrogate Pairs - Not Used
100.0%   U+E000..U+F8FF  Private Use Area
100.0%   U+F900..U+FAFF  CJK Compatibility Ideographs
100.0%   U+FB00..U+FB4F  Alphabetic Presentation Forms
100.0%   U+FB50..U+FDFF  Arabic Presentation Forms - A
100.0%   U+FE00..U+FE0F  Variation Selectors
100.0%   U+FE10..U+FE1F  Vertical Forms
100.0%   U+FE20..U+FE2F  Combining Half Marks
100.0%   U+FE30..U+FE4F  CJK Compatibility Forms
100.0%   U+FE50..U+FE6F  Small Form Variants
100.0%   U+FE70..U+FEFF  Arabic Presentation Forms - B
100.0%   U+FF00..U+FFEF  Halfwidth and Fullwidth Forms
100.0%   U+FFF0..U+FFFF  Specials
Here's a color-coded table of the entire GNU Unifont coverage. The "Reference" version contains the original .hex bitmaps, and includes noncharacters, combining circles, and other renderings removed from the final version. The "Final" version contains glyphs as they would appear in an ordinary font; these are the source of the .bdf and .ttf files.
All usable code point "pages" of 256 code points are green because GNU Unifont now has complete coverage of the Unicode 5.1 Basic Multilingual Plane.
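Because a "page" is a run of 256 consecutive code points, a code point's page number is its value with the low 8 bits dropped (U+20AC, for example, falls in page 0x20). A small Python sketch of the "is this page complete?" test follows; the names are hypothetical, and which code points count as "usable" is left to the caller as an assumption (for example, everything except surrogates).

```python
def page_of(cp):
    """A page is 256 consecutive code points, so drop the low 8 bits."""
    return cp >> 8


def complete_pages(points, usable):
    """Return sorted page numbers whose usable code points all have glyphs.

    points: code points that have glyphs in the font.
    usable: code points each page is expected to cover (assumption: the
            caller decides, e.g. everything except surrogates).
    """
    expected = {}
    for cp in usable:
        expected.setdefault(page_of(cp), set()).add(cp)
    return sorted(page for page, cps in expected.items() if cps <= points)


# Example: page 0x00 is fully drawn, page 0x01 only halfway.
glyphs = set(range(0x0000, 0x0180))
print(page_of(0x20AC))
print(complete_pages(glyphs, set(range(0x0000, 0x0200))))
```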
Brahmi-derived (Indic) Scripts
The available Unicode character assignments do not suffice for proper rendering of these syllabic scripts in a BDF font. Nevertheless, I've added these glyphs on the premise that rendering something is better than rendering nothing. They might be of use as a starting point for someone who wants to experiment with Microsoft's VOLT, with SIL's Graphite, or with something new.
A big challenge with Indic scripts is that they are abugidas, in which a consonant sign is pronounced with an inherent following vowel (usually a short "a"). When two consonants are joined, the first consonant is sometimes written in a half-form to show that its inherent vowel is dropped. Many consonant conjuncts have unique glyphs, yet they do not have unique code points in Unicode. The most common example in modern Hindi is "k" + "ssa." This conjunct is written as a single glyph that bears little resemblance to the original letters, yet Unicode has no code point for the combination.
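Concretely, the "k" + "ssa" conjunct is stored as three code points, and the single joined glyph exists only in the font. A short Python illustration using the standard unicodedata module follows; the helper name describe is hypothetical.

```python
import unicodedata

# "k" + "ssa": Unicode has no code point for the conjunct itself, so it is
# stored as KA + VIRAMA (which suppresses KA's inherent vowel) + SSA.
KSSA = "\u0915\u094D\u0937"


def describe(text):
    """List each code point of `text` with its Unicode character name."""
    return [f"U+{ord(ch):04X}  {unicodedata.name(ch)}" for ch in text]


for row in describe(KSSA):
    print(row)

# Normalization does not fold the sequence into a single code point either.
print(unicodedata.normalize("NFC", KSSA) == KSSA)
```

A bitmap font can only map each of those three code points to its own glyph; choosing the joined form is left to a shaping engine.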
Fonts for these scripts have resorted to placing such essential glyphs in a Private Use Area. In fact, the Unicode 5.1 Standard even proposes that a font store Tamil syllables in a Private Use Area. That defeats the whole purpose of Unicode as a standard encoding of the world's scripts, but it is necessary for rendering under Unicode's current limitations. It is an artifact of the Unicode Consortium adopting the Indian government's ISCII-1988 standard. In the 1980s, the Indian government was concerned with simple sorting algorithms that did not require much memory.
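The Private Use Area workaround can be pictured as a substitution step that runs before glyph lookup: a known consonant sequence is swapped for a PUA code point at which the font keeps the conjunct glyph. A minimal Python sketch follows. The table and its PUA assignments are hypothetical, invented for illustration, and real shaping engines perform this kind of substitution on glyph IDs inside the font rather than by rewriting the text.

```python
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF  # BMP Private Use Area

# Hypothetical font-private assignments, invented for this illustration.
CONJUNCTS = {
    "\u0915\u094D\u0937": 0xE000,  # KA + VIRAMA + SSA
    "\u0924\u094D\u0930": 0xE001,  # TA + VIRAMA + RA
}


def substitute(text, table=CONJUNCTS):
    """Replace known conjunct sequences with PUA code points, longest match first."""
    for cp in table.values():
        if not PUA_FIRST <= cp <= PUA_LAST:
            raise ValueError(f"U+{cp:04X} is outside the Private Use Area")
    keys = sorted(table, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(chr(table[key]))
                i += len(key)
                break
        else:
            out.append(text[i])  # no conjunct starts here; copy as-is
            i += 1
    return "".join(out)


# KA + VIRAMA + SSA followed by the vowel sign AA (U+093E).
print([f"U+{ord(c):04X}" for c in substitute("\u0915\u094D\u0937\u093E")])
```

The drawback the text describes is visible here: the PUA values mean something only to this one hypothetical font, so any other font or program sees meaningless private code points.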
Unicode 5.0, like ISCII-1988, contains only two of the many accent marks used in reciting the Sanskrit Vedas. There is a movement to add such accents to the Unicode Standard. A good summary of proposed additions to Unicode (with example glyphs for missing accents in the Devanagari script) appears at http://www.omkarananda-ashram.org/Sanskrit/vedicaccents.htm. Its authors are seeking others to review their list of glyphs and to notify them of equivalent glyphs in scripts other than Devanagari.
Another effort to add Vedic marks to the Unicode Standard is being coordinated by the Sanskrit Library (http://sanskritlibrary.org/) and hosted by Brown University. See http://www.language.brown.edu/Sanskrit/VedicUnicode/ for the latest status. Thanks to Jaldhar Vyas for this update.
Arabic and Other Semitic Scripts
Arabic letters each have up to four different shapes, a natural development of the script's cursive written form: one when a letter appears isolated from other letters, one when a letter appears at the beginning of a word, one when a letter appears in the middle of a word, and one when a letter appears at the end of a word.
For ordinary Arabic text, Unicode provides only one code point for each letter. The font technology must therefore know how to draw each Arabic letter in context, using extensive ligature tables. This is beyond the capability of an ordinary BDF font file, so GNU Unifont does not provide full support for Arabic or other Semitic scripts. The Unifont glyphs show Arabic letters in their isolated forms so that at least something will be rendered.
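To see the choice a shaping engine has to make for every letter, here is a minimal Python sketch. It assumes every letter in the run is dual-joining like BEH (U+0628) and uses the Arabic Presentation Forms-B code points only to make the four shapes visible. Real shapers consult each letter's joining type (ALEF, for example, never joins to the following letter) and substitute glyphs inside the font rather than code points. The function names are hypothetical.

```python
import unicodedata


def contextual_form(joins_before, joins_after):
    """Choose one of the four shapes for a dual-joining letter."""
    if joins_before and joins_after:
        return "MEDIAL"
    if joins_before:
        return "FINAL"
    if joins_after:
        return "INITIAL"
    return "ISOLATED"


def shape_beh_run(count):
    """Shape `count` consecutive BEH letters (U+0628), all assumed dual-joining."""
    shaped = []
    for i in range(count):
        form = contextual_form(joins_before=i > 0, joins_after=i < count - 1)
        shaped.append(unicodedata.lookup(f"ARABIC LETTER BEH {form} FORM"))
    return "".join(shaped)


for ch in shape_beh_run(3):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

A BDF font has one bitmap per code point and no hook for this decision, which is why Unifont falls back to the isolated form stored at U+0628.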
Now You See It...
Initially, I wanted to find out what glyphs remained to be drawn to complete GNU Unifont. One problem was that the unifont.hex file did not include filler glyphs for the unassigned code points and noncharacters.
So I created the following file, which mimics the striped lines from the Unicode code blocks for all fillers. Note that new releases of Unicode will assign glyphs to some of these previously unassigned code points. At that time, the corresponding blank entries in this file should be removed.
I am placing these filler glyphs in the public domain. Do what thou wilt.
This file contains filler glyphs for all Unicode BMP code points that do not have assigned glyph representations in the published Unicode 5.1 standard, not counting the surrogate code points (U+D800 through U+DFFF) or the BMP Private Use code points (U+E000 through U+F8FF). My reference was the CD-ROM included in the back of the Unicode 5.0 Standard book, plus updates for Unicode 5.1 from the unicode.org website.
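The set of code points needing a filler can be approximated from Unicode's general category: unassigned code points and noncharacters are both category Cn, while surrogates (Cs) and private-use code points (Co) drop out automatically, matching the exclusions above. A minimal Python sketch follows. The striped 8x16 bitmap is a hypothetical stand-in, not the actual filler pattern, and Python's unicodedata reflects the Unicode version bundled with the interpreter rather than 5.1, so it will not reproduce the 3658-glyph count for Unicode 5.1.

```python
import unicodedata


def filler_codepoints():
    """Return BMP code points that would need a filler glyph.

    General category Cn covers both unassigned code points and
    noncharacters; surrogates (Cs) and Private Use (Co) are excluded.
    """
    return [cp for cp in range(0x10000) if unicodedata.category(chr(cp)) == "Cn"]


def filler_hex_line(cp):
    """Build a .hex line with a hypothetical 8x16 diagonal-stripe bitmap."""
    rows = []
    for r in range(16):
        byte = 0
        for c in range(8):
            if (r + c) % 4 == 0:
                byte |= 0x80 >> c  # column 0 is the most significant bit
        rows.append(f"{byte:02X}")
    return f"{cp:04X}:" + "".join(rows)


fillers = filler_codepoints()
print(len(fillers))
print(filler_hex_line(fillers[-1]))
```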
This Unicode 5.1 version has 3658 filler glyphs for the Basic Multilingual Plane. The Unicode 5.0 version had 4667 filler glyphs. Unicode 5.1 added over 1000 glyphs to the Basic Multilingual Plane. Each filler glyph looks like this:
The WinZip version of this file is blanks-5.1.zip. The gzipped Unix/Linux version of this file is blanks-5.1.hex.gz.
Additional Glyphs — 31 December 2007
I've added over 100 glyphs that can be incorporated into GNU Unifont to complete the range U+0000 through U+05FF. Code points U+0360, U+0361, and U+0362 had a spurious pixel set in the original file; I made new glyphs with that pixel erased.
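Erasing a stray pixel directly in .hex form amounts to clearing one bit in one row of the bitmap. A minimal Python sketch follows, assuming the usual 16-row Unifont layout of 32 hex digits for an 8-pixel-wide glyph or 64 for a 16-pixel-wide one, with the leftmost pixel in each row's most significant bit. The function name is hypothetical, and this is not the method used for U+0360..U+0362, which the text says were edited as images.

```python
def clear_pixel(hex_line, row, col):
    """Return `hex_line` with the pixel at (row, col) turned off.

    Assumes a 16-row glyph stored as 32 hex digits (8 pixels wide) or
    64 hex digits (16 pixels wide). Row 0 is the top row; column 0 is the
    leftmost pixel, held in the most significant bit of its row.
    """
    code, bitmap = hex_line.strip().split(":")
    width = {32: 8, 64: 16}.get(len(bitmap))
    if width is None:
        raise ValueError("unexpected bitmap length")
    if not (0 <= row < 16 and 0 <= col < width):
        raise ValueError("pixel outside the glyph")
    digits = width // 4                 # hex digits per row
    start = row * digits
    value = int(bitmap[start:start + digits], 16)
    value &= ~(1 << (width - 1 - col))  # clear the chosen bit
    new_row = f"{value:0{digits}X}"
    return f"{code}:{bitmap[:start]}{new_row}{bitmap[start + digits:]}"


# Invented 8x16 test glyph whose third row (row 2) is fully set.
glyph = "0041:" + "0000" + "FF" + "00" * 13
print(clear_pixel(glyph, 2, 0))
```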
These glyphs are available under the GNU General Public License. Many of them were derived by copying and pasting a cell (generated by unihex2bmp) in a graphics editor.
Historical Note: This patch file, which applied to the original unifont.hex file, was available for a brief period on Roman Czyborra's website before the site went down. He asked me to continue my additions to GNU Unifont, so for now I'm making updates available as entire font files.