Posted by: Neo on: January 1, 2009
Just open RFC 3629 and find this table (it is in section 3):
Char. number range  | UTF-8 octet sequence
(hexadecimal)       | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
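Reading the table amounts to a range check on the code point. A minimal sketch of that lookup in POSIX shell, assuming a hypothetical helper named utf8_len (the name is mine, not the RFC's) that takes a decimal or 0x-prefixed value; it does not reject the surrogate range U+D800-U+DFFF, which RFC 3629 also excludes:

```shell
# utf8_len CODEPOINT: print how many octets the table assigns to a code point.
# utf8_len is a hypothetical helper name used only for illustration.
utf8_len() {
    cp=$(( $1 ))                          # accepts decimal or 0x-prefixed hex
    if   [ "$cp" -lt 0 ];               then echo "out of range" >&2; return 1
    elif [ "$cp" -le $(( 0x7F )) ];     then echo 1   # 0xxxxxxx
    elif [ "$cp" -le $(( 0x7FF )) ];    then echo 2   # 110xxxxx 10xxxxxx
    elif [ "$cp" -le $(( 0xFFFF )) ];   then echo 3   # 1110xxxx 10xxxxxx 10xxxxxx
    elif [ "$cp" -le $(( 0x10FFFF )) ]; then echo 4   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    else echo "out of range" >&2; return 1
    fi
}

utf8_len 0x0950     # prints 3
utf8_len 0x10147    # prints 4
```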
Pick a character from the Unicode charts. Any code point up to U+10FFFF works; that is the upper limit of Unicode, and RFC 3629 restricts UTF-8 to the same range, so nothing above it can be encoded.
I just selected U+0950 (ॐ, DEVANAGARI OM).
Looking at the table, it falls in the 0800-FFFF range:
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
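So just write 0x0950 in binary, 0000 1001 0101 0000, and drop those 16 bits into the x slots from left to right: 11100000 10100101 10010000, i.e. E0 A5 90: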
$ awk ' BEGIN { print "\xE0\xA5\x90" } '
ॐ
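The same slot-filling can be done with shifts and masks instead of by hand. This is only a sketch for the three-octet row of the table; utf8_three is a hypothetical helper name, and it assumes its argument really lies in 0800-FFFF:

```shell
# utf8_three CODEPOINT: print the three UTF-8 octets (hex) for a code point
# in the 0800-FFFF row. The function name is illustrative only.
utf8_three() {
    cp=$(( $1 ))
    b1=$(( 0xE0 | (cp >> 12) ))           # 1110xxxx: top 4 bits
    b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))   # 10xxxxxx: middle 6 bits
    b3=$(( 0x80 | (cp & 0x3F) ))          # 10xxxxxx: low 6 bits
    printf '%02X %02X %02X\n' "$b1" "$b2" "$b3"
}

utf8_three 0x0950    # prints E0 A5 90
```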
Now let us take U+30F8 ヸ KATAKANA LETTER VI.
Range:
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0x30F8 is 0011 0000 1111 1000 in binary, which fills the slots as 11100011 10000011 10111000, i.e. E3 83 B8:
$ awk ' BEGIN { print "\xE3\x83\xB8" } '
ヸ
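Next, U+10147 𐅇 GREEK ACROPHONIC ATTIC FIFTY THOUSAND.
Range:
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
0x10147 padded to 21 bits is 0 0001 0000 0001 0100 0111, which fills the slots as 11110000 10010000 10000101 10000111, i.e. F0 90 85 87: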
$ awk ' BEGIN { print "\xF0\x90\x85\x87" } '
𐅇
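Putting all four rows together, the three examples above can be reproduced with one small function. This is a sketch in POSIX shell, not a reference encoder: utf8_escape is a hypothetical name, it prints the \x escape string used in the awk commands above rather than the raw bytes, and it does not reject surrogates (U+D800-U+DFFF), which RFC 3629 forbids.

```shell
# utf8_escape CODEPOINT: print the UTF-8 octets as a \xNN escape string.
# The function name is illustrative; surrogates are not rejected here.
utf8_escape() {
    cp=$(( $1 ))
    if [ "$cp" -lt 0 ] || [ "$cp" -gt $(( 0x10FFFF )) ]; then
        echo "out of range" >&2
        return 1
    fi
    if [ "$cp" -le $(( 0x7F )) ]; then
        # 0xxxxxxx
        printf '\\x%02X\n' "$cp"
    elif [ "$cp" -le $(( 0x7FF )) ]; then
        # 110xxxxx 10xxxxxx
        printf '\\x%02X\\x%02X\n' \
            $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -le $(( 0xFFFF )) ]; then
        # 1110xxxx 10xxxxxx 10xxxxxx
        printf '\\x%02X\\x%02X\\x%02X\n' \
            $(( 0xE0 | (cp >> 12) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else
        # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '\\x%02X\\x%02X\\x%02X\\x%02X\n' \
            $(( 0xF0 | (cp >> 18) )) $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    fi
}

utf8_escape 0x10147    # prints \xF0\x90\x85\x87
```

The printed string can be pasted between the double quotes of the awk print statement.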
You may need to install ‘ttf-ancient-fonts’ to see the Greek Acrophonic character.
After doing this, enjoy the following article:
No Excuses!