How do I convert from unicode to surrogate pairs?
I'm trying to figure out how to convert an emoji's Unicode code point to a surrogate pair.
For example, according to https://apps.timwhitlock.info/unicode/inspect/hex/1F601, I want to get D83D DE01 from 1F601.
How can I do this? I assume it has something to do with Encoding.Unicode, but I'm not sure how to use it.
Answer by Bunny83 · Apr 14, 2020 at 01:44 AM
I'm not sure you really understand what surrogate pairs are for. They encode exactly the same Unicode code point, just split across two 16-bit code units. It's explained pretty well on Wikipedia. Surrogate pairs are only used for code points from U+10000 to U+10FFFF. To build one, you first subtract 0x10000 from the code point. The remaining 20-bit number is split into two 10-bit halves, a high half and a low half. You add D800 to the high half and DC00 to the low half. In your specific example:
1F601
Binary representation:
0001 1111 0110 0000 0001
When we subtract 0x10000, we get:
0F601
or
0000 1111 0110 0000 0001
The high 10 bits are 0000 1111 01 (003D) and the low 10 bits are 10 0000 0001 (0201).
To get the high surrogate, we combine D800 (1101 1000 0000 0000) with the high bits:
1101 1000 0000 0000 // D800
0000 0000 0011 1101 // 003D
---------------------------
1101 1000 0011 1101 // D83D
To get the low surrogate, we do the same thing with DC00:
1101 1100 0000 0000 // DC00
0000 0010 0000 0001 // 0201
---------------------------
1101 1110 0000 0001 // DE01
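The same split can also be written as plain arithmetic. Because each half fits in 10 bits, combining it with D800 or DC00 by OR gives the same result as adding, and taking the top and bottom 10 bits is the same as integer division and remainder by 0x400 (2^10). Here c is the code point, c' the value after subtracting 0x10000, and H and L the high and low surrogates (these symbol names are just labels for this sketch):

```latex
\begin{aligned}
c' &= c - \texttt{0x10000} = \texttt{0x1F601} - \texttt{0x10000} = \texttt{0xF601} \\
H  &= \texttt{0xD800} + \lfloor c' / \texttt{0x400} \rfloor = \texttt{0xD800} + \texttt{0x3D} = \texttt{0xD83D} \\
L  &= \texttt{0xDC00} + (c' \bmod \texttt{0x400}) = \texttt{0xDC00} + \texttt{0x201} = \texttt{0xDE01}
\end{aligned}
```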
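In plain C# it would look something like this:
int cp = 0x1F601;                          // code point, must be in U+10000..U+10FFFF
cp -= 0x10000;                             // leaves a 20-bit value (0xF601)
int high = 0xD800 | ((cp >> 10) & 0x3FF);  // top 10 bits -> high surrogate (0xD83D)
int low = 0xDC00 | (cp & 0x3FF);           // bottom 10 bits -> low surrogate (0xDE01)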
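A minimal self-contained sketch of the snippet above, wrapped in a method with a range check, since only code points from U+10000 to U+10FFFF are encoded as surrogate pairs. The class name SurrogateDemo and the methods ToSurrogatePair and ToHexPair are hypothetical, chosen for illustration; the bit operations are the same ones used above.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class SurrogateDemo
{
    // Splits a supplementary code point (U+10000..U+10FFFF) into its
    // UTF-16 high and low surrogates using the same bit operations as above.
    public static (char High, char Low) ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint),
                "Only U+10000..U+10FFFF is encoded as a surrogate pair.");

        int v = codePoint - 0x10000;                       // 20-bit value
        char high = (char)(0xD800 | ((v >> 10) & 0x3FF));  // top 10 bits
        char low = (char)(0xDC00 | (v & 0x3FF));           // bottom 10 bits
        return (high, low);
    }

    // Formats the pair as two 4-digit hex numbers, e.g. "D83D DE01".
    public static string ToHexPair(int codePoint)
    {
        var (high, low) = ToSurrogatePair(codePoint);
        return $"{(int)high:X4} {(int)low:X4}";
    }
}
```

For example, SurrogateDemo.ToHexPair(0x1F601) returns "D83D DE01", matching the inspector page from the question. The tuple return type needs C# 7 or later.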
However, .NET already has a method that does this conversion: char.ConvertFromUtf32, which returns a string containing the surrogate pair.
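A short sketch of the built-in route, using the real System.Char methods char.ConvertFromUtf32, char.IsSurrogatePair and char.ConvertToUtf32. It also covers the Encoding.Unicode guess from the question: Encoding.Unicode is UTF-16 little-endian, so its bytes are the same two code units with the low byte of each first. The class BuiltInDemo and its method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical demo class; only the System.Char and Encoding calls are real .NET APIs.
public static class BuiltInDemo
{
    // Returns the surrogate pair for a code point as "XXXX YYYY" hex text.
    public static string SurrogateHex(int codePoint)
    {
        string s = char.ConvertFromUtf32(codePoint); // two chars for U+10000..U+10FFFF
        if (s.Length != 2 || !char.IsSurrogatePair(s[0], s[1]))
            throw new ArgumentException("Not a supplementary code point.", nameof(codePoint));
        return $"{(int)s[0]:X4} {(int)s[1]:X4}";
    }

    // Encoding.Unicode is UTF-16LE: same code units, low byte first, no BOM from GetBytes.
    public static byte[] Utf16LeBytes(int codePoint) =>
        Encoding.Unicode.GetBytes(char.ConvertFromUtf32(codePoint));

    // Converts the pair back into the original code point.
    public static int RoundTrip(int codePoint)
    {
        string s = char.ConvertFromUtf32(codePoint);
        return char.ConvertToUtf32(s[0], s[1]);
    }
}
```

For U+1F601, Utf16LeBytes gives the bytes 3D D8 01 DE, which is D83D DE01 with each code unit's bytes swapped. For displaying the emoji you normally just use the string from char.ConvertFromUtf32 directly.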
Note that all of this is irrelevant if the font you use does not support the code points you're trying to display. Also note that no font supports all Unicode code points.
Answer by Mysterious-Hercules · Apr 14, 2020 at 06:44 PM
Thank you very much. This is exactly what I needed.