Problems with Russian characters within strings (C#)
I am trying to create an array of strings containing Russian characters, like this:
rawKeys = new string[] {
"!",
"А",
"Б",
"В",
"Г",
"Д",
"Е",
"Ё",
"Ж",
"З",
"И",
"Й",
"К",
"Л",
"М",
"Н",
"О",
"П",
"Р",
"С",
"Т",
"У",
"Ф",
"Х",
"Ц",
"Ч",
"Ш",
"Щ",
"Ъ",
"Ы",
"Ь",
"Э",
"Ю",
"Я"
};
But after executing this code, only the first string, "!", keeps its value; every Cyrillic string is replaced by "??" instead.
Any ideas?
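A likely explanation for the "??" pattern follows. It is an assumption, because the thread does not say how Unity decoded BOM-less files. In UTF-8 every Cyrillic letter takes two bytes while "!" takes one, and a decoder that only understands 7-bit ASCII turns each byte above 0x7F into "?". The plain .NET console sketch below uses System.Text and Console rather than anything Unity-specific, and the class name is hypothetical. It reproduces the pattern:

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes a string as UTF-8, then decodes those bytes as 7-bit ASCII
    // (an assumed stand-in for a decoder that does not expect UTF-8).
    public static string MisreadAsAscii(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        // ASCIIEncoding's default fallback replaces every byte above 0x7F with '?'.
        return Encoding.ASCII.GetString(utf8);
    }

    public static void Main()
    {
        // "!", "А", "Ж", "Я", written as \u escapes so this file is pure ASCII.
        string[] keys = { "!", "\u0410", "\u0416", "\u042F" };
        foreach (string key in keys)
        {
            int bytes = Encoding.UTF8.GetByteCount(key);
            Console.WriteLine($"U+{(int)key[0]:X4}: {bytes} UTF-8 byte(s), read as ASCII -> \"{MisreadAsAscii(key)}\"");
        }
    }
}
```

For "!" it reports 1 byte and "!". For each Cyrillic letter it reports 2 bytes and "??".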
Answer by Marzoa · Jan 13, 2015 at 12:57 PM
OK, I finally managed to figure out the problem and solve it. It is clearly yet another bug in the Unity editor: not only does it want UTF-8 files, they MUST also have the BOM (byte order mark), even though those bytes are optional according to the UTF-8 specification. To make things worse, the MonoDevelop environment distributed with the Unity game engine does NOT save UTF-8 with a BOM, so I ended up adding it manually just to try, and it worked.
Just three steps on the OS X command line:
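The sketch below illustrates why the BOM matters. It models a reader that honors a BOM but otherwise falls back to a non-UTF-8 encoding. Choosing ASCII as that fallback is an assumption made to mirror the symptom, not a documented detail of Unity's compiler pipeline. The sketch uses .NET's StreamReader with detectEncodingFromByteOrderMarks enabled. It also uses UTF8Encoding, whose constructor flag controls whether EF BB BF is written. The class and method names are hypothetical:

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectionDemo
{
    // Writes text as UTF-8; the constructor flag decides whether EF BB BF is emitted.
    public static void WriteUtf8(string path, string text, bool withBom)
    {
        File.WriteAllText(path, text, new UTF8Encoding(withBom));
    }

    // Reads a file the way a BOM-aware reader with an ASCII fallback would:
    // a BOM selects the matching encoding, otherwise ASCII is used (assumed fallback).
    public static string ReadWithAsciiFallback(string path)
    {
        using (var reader = new StreamReader(path, Encoding.ASCII, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            WriteUtf8(path, "\u0416", true);   // "Ж" with a BOM
            Console.WriteLine(ReadWithAsciiFallback(path) == "\u0416"); // True
            WriteUtf8(path, "\u0416", false);  // "Ж" without a BOM
            Console.WriteLine(ReadWithAsciiFallback(path));              // ??
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```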
cp KeyboardRussian.cs aux
echo -ne '\xEF\xBB\xBF' > KeyboardRussian.cs
cat aux >> KeyboardRussian.cs
And it worked like a charm.
It should work for Windows too with a minor change:
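copy KeyboardRussian.cs KeyboardRussian.tmp
echo -ne '\xEF\xBB\xBF' > KeyboardRussian.cs
type KeyboardRussian.tmp >> KeyboardRussian.cs
But I haven't tried it. (Two caveats apply. First, AUX is a reserved device name on Windows, so the temporary copy needs a different name, as used here. Second, the built-in echo of cmd.exe does not understand -ne or \x escapes, so from cmd.exe the second line writes literal text instead of the three BOM bytes. From a Unix-style shell on Windows, such as Git Bash or Cygwin, the OS X commands can be used unchanged.)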
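Here is a cross-platform alternative to the shell steps, sketched in C# as a small .NET console program. The class name Utf8BomTool is hypothetical and not part of Unity. Unlike the cp/echo/cat sequence, it checks for an existing BOM first, so running it twice does not add a second one. It assumes the file is already UTF-8, as the MonoDevelop output described above was:

```csharp
using System;
using System.IO;

public static class Utf8BomTool
{
    static readonly byte[] Bom = { 0xEF, 0xBB, 0xBF };

    public static bool StartsWithBom(byte[] data) =>
        data.Length >= 3 && data[0] == Bom[0] && data[1] == Bom[1] && data[2] == Bom[2];

    // Prepends EF BB BF unless the file already starts with it.
    // Returns true if the file was changed. Assumes the content is already UTF-8.
    public static bool EnsureUtf8Bom(string path)
    {
        byte[] content = File.ReadAllBytes(path);
        if (StartsWithBom(content))
            return false;

        byte[] result = new byte[Bom.Length + content.Length];
        Bom.CopyTo(result, 0);
        content.CopyTo(result, Bom.Length);
        File.WriteAllBytes(path, result);
        return true;
    }

    public static void Main(string[] args)
    {
        // Each path given on the command line (e.g. KeyboardRussian.cs) is processed in turn.
        foreach (string path in args)
        {
            bool changed = EnsureUtf8Bom(path);
            Console.WriteLine(path + (changed ? ": BOM added" : ": already had a BOM"));
        }
    }
}
```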
Well, it helps Unity to know whether the files were written little-endian or big-endian, and the BOM helps with that. I typically use TextWrangler, and can trivially create scripts with a BOM. I just tried your code in TextWrangler using UTF-16 Little-Endian and it just worked.
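The byte-order point applies to UTF-16. The sketch below uses .NET's Encoding.Unicode (UTF-16 little-endian) and Encoding.BigEndianUnicode, and the class name is hypothetical. It shows that the two bytes of "Ж" (U+0416) swap places between the two orders. The BOM written in front (FF FE or FE FF) is what tells a reader which order was used:

```csharp
using System;
using System.Text;

public static class Utf16ByteOrderDemo
{
    // Returns the preamble (BOM) followed by the encoded text, as hex.
    public static string FileBytesHex(Encoding encoding, string text)
    {
        byte[] preamble = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(text);
        byte[] all = new byte[preamble.Length + body.Length];
        preamble.CopyTo(all, 0);
        body.CopyTo(all, preamble.Length);
        return BitConverter.ToString(all);
    }

    public static void Main()
    {
        string zhe = "\u0416"; // "Ж"
        Console.WriteLine("UTF-16LE: " + FileBytesHex(Encoding.Unicode, zhe));          // FF-FE-16-04
        Console.WriteLine("UTF-16BE: " + FileBytesHex(Encoding.BigEndianUnicode, zhe)); // FE-FF-04-16
    }
}
```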
If you're convinced this is a bug in the editor, then please submit a bug report.
From what I can tell, Unity creates scripts for you in UTF-8 with a BOM, while MonoDevelop (MD) creates files without one. Most applications, including Unity, can then read files with or without a BOM. MonoDevelop supports a wide range of additional file encodings, so add UTF-16BE or UTF-16LE if you need them. Since your script contains characters outside the 7-bit ASCII range, which cannot be stored as single ASCII bytes, I guess a BOM is helpful.
From Wikipedia: "The Unicode Standard permits the BOM in UTF-8,[2] but does not require or recommend its use.[3] Byte order has no meaning in UTF-8,[4] so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8."
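The quoted point can be checked directly in .NET. The flag passed to UTF8Encoding's constructor only decides whether the three-byte preamble EF BB BF is emitted. The bytes of the text itself are identical either way. A minimal sketch (class name hypothetical):

```csharp
using System;
using System.Text;

public static class Utf8PreambleDemo
{
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

    public static void Main()
    {
        var withBom = new UTF8Encoding(true);     // emit the UTF-8 identifier (BOM)
        var withoutBom = new UTF8Encoding(false); // no BOM
        string zhe = "\u0416"; // "Ж"

        Console.WriteLine(Hex(withBom.GetPreamble()));    // EF-BB-BF
        Console.WriteLine(Hex(withoutBom.GetPreamble())); // (empty line)
        // The text bytes are identical: UTF-8 has no byte order to signal.
        Console.WriteLine(Hex(withBom.GetBytes(zhe)));    // D0-96
        Console.WriteLine(Hex(withoutBom.GetBytes(zhe))); // D0-96
    }
}
```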
On the other hand, TextWrangler does NOT come with Unity. MonoDevelop DOES. So they distribute a bundle in which the graphical editor demands a BOM while their code editor will not generate one. If that inconsistency is not a bug, it is quite an idiotic feature...
I am not going to submit a bug report, though. This problem has been around for YEARS and they haven't shown any interest in solving it. Besides, I paid a lot of money for this game engine (something I deeply regret), so I am not going to do their beta-testing for free. This is off-topic anyway, but since you mentioned it...
The question is solved.
Answer by zharik86 · Jan 12, 2015 at 07:39 PM
For example, create an XML file and add your symbols to it:
<RussianSymbols>
<symbol>А</symbol>
<symbol>Б</symbol>
<!-- ... and so on -->
</RussianSymbols>
Then, in code (for example in Awake()), read the XML and build your rawKeys array from it. I hope this helps.
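Here is a minimal sketch of this approach with System.Xml.Linq, assuming the XML layout shown above. The class and method names are hypothetical. In Unity the XML text would presumably come from a file, for example a TextAsset's text read in Awake(), but the answer does not say how the file is loaded. It is inlined here only so the sketch runs on its own:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RussianKeysLoader
{
    // Collects the text of every <symbol> element under the root element.
    public static string[] LoadKeys(string xml) =>
        XDocument.Parse(xml).Root.Elements("symbol").Select(e => e.Value).ToArray();

    public static void Main()
    {
        // Stand-in for the contents of the XML file ("А", "Б", "В"),
        // written with \u escapes so this C# file itself stays ASCII-only.
        string xml = "<RussianSymbols>" +
                     "<symbol>\u0410</symbol>" +
                     "<symbol>\u0411</symbol>" +
                     "<symbol>\u0412</symbol>" +
                     "</RussianSymbols>";

        string[] rawKeys = LoadKeys(xml);
        Console.WriteLine(rawKeys.Length);              // 3
        Console.WriteLine(rawKeys[1] == "\u0411");      // True
    }
}
```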
I thought about doing something like that, since I use JSON data containing Cyrillic characters in another part of the game and it works flawlessly, but this looks more like a workaround than a definitive solution.
I am porting an Android "native" (I mean written in Java) game to Unity, and I have these same strings hardcoded in the Java code without any problems at all. I simply don't understand why I cannot do the same in a C# script that is also saved as UTF-8.
@Marzoa First, try saving your C# script as UTF-16. I have checked Russian text in Unity scripts saved as UTF-16 in MonoDevelop, and everything works on both PC and Android.
Projects don't always port from Java (I assume from Eclipse) to Unity one to one. For example, in Java there is a language XML file, and in it you can write the expression "A\nB"; in Java you will see a line break. In Unity, however, the screen shows the literal text A\nB. There are many differences like this.
A second way, from the Unity forum:
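The sketch below illustrates the difference described here in plain .NET, and the helper name is hypothetical. Text loaded from a data file keeps a backslash followed by n as two ordinary characters, because only C# string literals interpret escapes at compile time. One possible workaround is to replace that two-character sequence after loading:

```csharp
using System;

public static class EscapedNewlines
{
    // Turns the two characters '\' 'n' into a real line break.
    public static string ExpandNewlines(string loaded) => loaded.Replace("\\n", "\n");

    public static void Main()
    {
        // What reading a file containing A\nB yields: 'A', '\', 'n', 'B'.
        string fromFile = "A\\nB";
        Console.WriteLine(fromFile.Length);        // 4
        string expanded = ExpandNewlines(fromFile);
        Console.WriteLine(expanded.Length);        // 3
        Console.WriteLine(expanded);               // "A" and "B" on separate lines
    }
}
```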
File > Save As...
Character Coding: "Western ISO-8859-15" or any other encoding that does not support the characters in your strings
Click "Save" button
Click "Overwrite File" button
Click "Save as Unicode" button
Your file is now correctly saved as UTF-8 for Unity! Even if you save the file again later with Save (Ctrl+S), the editor will keep showing the correct characters.
@zharik86 I had already found that solution in the forums myself, but the trick didn't work for me. It didn't even ask to save the file as Unicode. I bet there is a checkbox to stop asking, and since I always save everything as Unicode, I probably checked it months ago.
Unicode isn't a file format; it's a way of representing characters. A BOM is useful to indicate that a text file is stored in UTF-8 or UTF-16.
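The sketch below shows how a reader can use the BOM to tell UTF-8 from UTF-16. The helper is hypothetical and checks only the UTF-8, UTF-16LE and UTF-16BE marks mentioned in this thread, ignoring UTF-32. When no BOM is present it reports "none", and the reader has to fall back on a default or a guess:

```csharp
using System;

public static class BomSniffer
{
    // Names the encoding announced by a leading byte order mark, or "none".
    // UTF-32 marks are not handled: FF FE 00 00 (UTF-32LE) would be reported as UTF-16LE.
    public static string DetectBom(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return "UTF-8";
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return "UTF-16LE";
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return "UTF-16BE";
        return "none";
    }

    public static void Main()
    {
        Console.WriteLine(DetectBom(new byte[] { 0xEF, 0xBB, 0xBF, 0xD0, 0x96 })); // UTF-8
        Console.WriteLine(DetectBom(new byte[] { 0xFF, 0xFE, 0x16, 0x04 }));       // UTF-16LE
        Console.WriteLine(DetectBom(new byte[] { 0xD0, 0x96 }));                   // none
    }
}
```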
Answer by code_warrior · Jan 12, 2015 at 08:27 PM
Hi,
this is a simple example of encoding Cyrillic characters.
string cyrillicText = "Ж";
System.Text.UTF8Encoding encodingUnicode = new System.Text.UTF8Encoding();
byte[] cyrillicTextByte = encodingUnicode.GetBytes(cyrillicText);
Debug.Log(encodingUnicode.GetString(cyrillicTextByte));
What it does: I specify the text to encode and create a variable for the encoding type (in this case we need UTF-8). Then I store the text as a byte array, decode it back to a string and print it in the console.
In your particular case (index 24 of the rawKeys array):
System.Text.UTF8Encoding encodingUnicode = new System.Text.UTF8Encoding();
byte[] cyrillicTextByte = encodingUnicode.GetBytes(rawKeys[24]);
Debug.Log(encodingUnicode.GetString(cyrillicTextByte));
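Here is a plain .NET version of the same round trip, using Console.WriteLine instead of Debug.Log (class name hypothetical). Encoding to UTF-8 and decoding back returns exactly the input string. So it can only print "Ц" (index 24 of the original array) if the compiler already read that literal correctly. A string that was already turned into "??" comes back as "??":

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Encodes to UTF-8 and decodes again, as in the snippets above.
    public static string RoundTrip(string s)
    {
        var utf8 = new UTF8Encoding();
        byte[] bytes = utf8.GetBytes(s);
        return utf8.GetString(bytes);
    }

    public static void Main()
    {
        string tse = "\u0426"; // "Ц", rawKeys[24] when the source was read correctly
        Console.WriteLine(RoundTrip(tse) == tse);   // True
        // If the compiler already produced "??", the round trip cannot undo that.
        Console.WriteLine(RoundTrip("??"));          // ??
    }
}
```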