Nitin Verma’s Blog


Open RFC 3629, the UTF-8 specification.

Find this table in it:

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
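
Reading the table row by row, the number of octets depends only on which range the code point falls in. A minimal sketch of that lookup in POSIX shell follows; the function name `utf8_len` and its argument format (bare hex digits, no `U+` prefix) are assumptions made for illustration, and the sketch does not reject the surrogate range D800-DFFF that RFC 3629 excludes.

```shell
# Hypothetical helper: print how many octets UTF-8 needs for a code point.
# Argument: the code point as bare hex digits, e.g. 0950.
utf8_len() {
  cp=$(( 0x$1 ))
  if   [ "$cp" -le $(( 0x7F )) ];     then echo 1   # 0xxxxxxx
  elif [ "$cp" -le $(( 0x7FF )) ];    then echo 2   # 110xxxxx 10xxxxxx
  elif [ "$cp" -le $(( 0xFFFF )) ];   then echo 3   # 1110xxxx + two 10xxxxxx
  elif [ "$cp" -le $(( 0x10FFFF )) ]; then echo 4   # 11110xxx + three 10xxxxxx
  else
    echo "code point above U+10FFFF" >&2
    return 1
  fi
}

utf8_len 0041    # 1
utf8_len 0950    # 3
```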

Pick a character at or below U+10FFFF from the Unicode charts. (The original UTF-8 design could encode larger values, but RFC 3629 restricts UTF-8 to the range U+0000 to U+10FFFF.)

Unicode character search

I picked U+0950 ॐ DEVANAGARI OM.

Looking at the table, it falls in the 0800-FFFF range:

   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx 

So write 0950 in binary and follow these steps:

  1. 0000 1001 0101 0000 << 0950 in binary
  2. 0000 100101 010000 << regroup from the right: two groups of 6 bits and one group of 4
  3. 1110 0000 1010 0101 1001 0000 << prefix the group of 4 with 1110 and each group of 6 with 10
  4. E0 A5 90 << in hex
$ awk ' BEGIN { print "\xE0\xA5\x90" } '
ॐ
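
The regroup-and-prefix steps above amount to a few shifts and masks. Here is a rough sketch in POSIX shell arithmetic that covers all four rows of the table; the name `utf8_hex` and the bare-hex argument are assumptions for illustration, and like the hand method it does not check for the surrogate range.

```shell
# Hypothetical helper: print the UTF-8 octets of a code point as hex.
# Argument: the code point as bare hex digits, e.g. 0950.
utf8_hex() {
  cp=$(( 0x$1 ))
  if [ "$cp" -le $(( 0x7F )) ]; then
    # One octet: the value itself.
    printf '%02X\n' "$cp"
  elif [ "$cp" -le $(( 0x7FF )) ]; then
    # 110xxxxx 10xxxxxx
    printf '%02X %02X\n' \
      $(( 0xC0 | (cp >> 6) )) \
      $(( 0x80 | (cp & 0x3F) ))
  elif [ "$cp" -le $(( 0xFFFF )) ]; then
    # 1110xxxx 10xxxxxx 10xxxxxx
    printf '%02X %02X %02X\n' \
      $(( 0xE0 | (cp >> 12) )) \
      $(( 0x80 | ((cp >> 6) & 0x3F) )) \
      $(( 0x80 | (cp & 0x3F) ))
  elif [ "$cp" -le $(( 0x10FFFF )) ]; then
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    printf '%02X %02X %02X %02X\n' \
      $(( 0xF0 | (cp >> 18) )) \
      $(( 0x80 | ((cp >> 12) & 0x3F) )) \
      $(( 0x80 | ((cp >> 6) & 0x3F) )) \
      $(( 0x80 | (cp & 0x3F) ))
  else
    echo "code point above U+10FFFF" >&2
    return 1
  fi
}

utf8_hex 0950    # E0 A5 90
```

Each `& 0x3F` keeps one group of 6 bits, and each `0x80 |` adds the 10 prefix, which mirrors steps 2 and 3 of the hand calculation.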

Now let us take
U+30F8 ヸ KATAKANA LETTER VI
Range:

   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx 
  1. 0011 0000 1111 1000
  2. 0011 000011 111000
  3. 1110 0011 1000 0011 1011 1000
  4. E3 83 B8
$ awk ' BEGIN { print "\xE3\x83\xB8" } '
ヸ

Next, a four-byte example:
U+10147 𐅇 GREEK ACROPHONIC ATTIC FIFTY THOUSAND
Range:

    0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 
  1. 0 0001 0000 0001 0100 0111 << padded to 21 bits
  2. 000 010000 000101 000111 << one group of 3 bits and three groups of 6
  3. 1111 0000 1001 0000 1000 0101 1000 0111
  4. F0 90 85 87
$ awk ' BEGIN { print "\xF0\x90\x85\x87" } '
𐅇
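
If the glyph does not render, the bytes can still be checked directly. The sketch below dumps them with `od`; the helper name `bytes_of` is made up for illustration. It spells the bytes as octal escapes, since as far as I know `\x` escapes are an extension in awk and `printf` while octal escapes are specified by POSIX.

```shell
# Hypothetical helper: print the bytes of its argument as lowercase hex.
bytes_of() {
  printf '%s' "$1" | od -An -tx1 | tr -d '[:space:]'
  echo
}

# F0 90 85 87 in hex is 360 220 205 207 in octal.
bytes_of "$(printf '\360\220\205\207')"    # f0908587
```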

You may need to install the ‘ttf-ancient-fonts’ package to see the GREEK ACROPHONIC character rendered.

After working through this, enjoy the following article:
No Excuses!