/media/bill/PROJECTS/System_maintenance/Linux/unicode notes.txt
www.BillHowell.ca 25May2018 initial

********************************
25May2018 unicode - how to tell how many bytes

https://stackoverflow.com/questions/5290182/how-many-bytes-does-one-unicode-character-take
How many bytes does one Unicode character take?

Strangely enough, nobody pointed out how to calculate how many bytes one Unicode character takes. Here is the rule for UTF-8 encoded strings:

Binary      Hex           Comments
0xxxxxxx    0x00..0x7F    Only byte of a 1-byte character encoding
10xxxxxx    0x80..0xBF    Continuation bytes (1-3 continuation bytes)
110xxxxx    0xC0..0xDF    First byte of a 2-byte character encoding
1110xxxx    0xE0..0xEF    First byte of a 3-byte character encoding
11110xxx    0xF0..0xF4    First byte of a 4-byte character encoding

So the quick answer is: a character takes 1 to 4 bytes, and the first byte indicates how many bytes it takes up.

Update: As prewett pointed out, this rule only applies to UTF-8.

paul.ago, answered Oct 26 '15 at 15:38, edited Nov 7 '16 at 6:51

# enddoc
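
The first-byte rule in the table can be tried out with a short Python sketch. This is only an illustration, not part of the quoted answer: the helper names utf8_char_length and char_byte_lengths are made up for this note, and the ranges are copied straight from the table.

```python
def utf8_char_length(first_byte):
    """Return how many bytes a UTF-8 character occupies, given its first byte.

    Hypothetical helper: the ranges mirror the table in these notes.
    (0xC0 and 0xC1 are accepted here because the table lists them,
    although they never occur in well-formed UTF-8.)
    """
    if 0x00 <= first_byte <= 0x7F:
        return 1          # 0xxxxxxx: single-byte character
    if 0xC0 <= first_byte <= 0xDF:
        return 2          # 110xxxxx: first byte of a 2-byte character
    if 0xE0 <= first_byte <= 0xEF:
        return 3          # 1110xxxx: first byte of a 3-byte character
    if 0xF0 <= first_byte <= 0xF4:
        return 4          # 11110xxx: first byte of a 4-byte character
    # 0x80..0xBF is a continuation byte; 0xF5..0xFF is not a valid first byte
    raise ValueError("not a valid UTF-8 first byte: 0x%02X" % first_byte)


def char_byte_lengths(text):
    """Return the UTF-8 byte length of each character in text.

    Walks the encoded bytes and uses only the first byte of each
    character to decide how far to skip ahead.
    """
    data = text.encode("utf-8")
    lengths = []
    i = 0
    while i < len(data):
        n = utf8_char_length(data[i])
        lengths.append(n)
        i += n
    return lengths


if __name__ == "__main__":
    for ch, n in zip("aé€😀", char_byte_lengths("aé€😀")):
        print(ch, n)
```

For a whole string, len(text.encode("utf-8")) gives the total byte count directly; the sketch above is only useful when the per-character breakdown is wanted.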