Question by Marzoa · Jan 12, 2015 at 04:57 PM · Tags: string, encoding, utf-8

Problems with Russian characters within strings (C#)

I am trying to create an array of strings that contains Russian characters, like this:

 rawKeys = new string[] {
     "!",
     "А",
     "Б",
     "В",
     "Г",
     "Д",
     "Е",
     "Ё",
     "Ж",
     "З",
     "И",
     "Й",
     "К",
     "Л",
     "М",
     "Н",
     "О",
     "П",
     "Р",
     "С",
     "Т",
     "У",
     "Ф",
     "Х",
     "Ц",
     "Ч",
     "Ш",
     "Щ",
     "Ъ",
     "Ы",
     "Ь",
     "Э",
     "Ю",
     "Я"
 };

But after this code runs, only the first string, "!", keeps its value. Every Cyrillic string is replaced by "??" instead.

Any ideas?

3 Replies

Best Answer

Answer by Marzoa · Jan 13, 2015 at 12:57 PM

OK, I finally figured out the problem and solved it. It is clearly yet another bug in the Unity editor: it not only wants UTF-8 files, they MUST also start with the BOM, even though those bytes are optional according to the Unicode standard. To make things worse, the MonoDevelop environment distributed with Unity does NOT save UTF-8 files with the BOM, so I ended up adding it manually just to try, and it worked.

Just three steps on the OS X command line:

 cp KeyboardRussian.cs aux
 echo -ne '\xEF\xBB\xBF' > KeyboardRussian.cs
 cat aux >> KeyboardRussian.cs

And it worked like a charm.

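It should work on Windows too, but only from a Unix-like shell such as Cygwin or Git Bash: the echo built into cmd.exe has no -ne option, and aux is a reserved device name on Windows, so the temporary file needs a different name:

 cp KeyboardRussian.cs KeyboardRussian.bak
 echo -ne '\xEF\xBB\xBF' > KeyboardRussian.cs
 cat KeyboardRussian.bak >> KeyboardRussian.cs

But I haven't tried it.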
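
The same three steps can also be done from C#, which sidesteps the differences between shells. The following is only a minimal sketch, assuming the script is already valid UTF-8 and merely lacks the three BOM bytes; the class name BomTool and its method names are hypothetical and not part of Unity or .NET.

```csharp
using System;
using System.IO;

public static class BomTool
{
    // The three bytes that make up the UTF-8 byte order mark.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Returns true if the buffer already starts with the UTF-8 BOM.
    public static bool HasUtf8Bom(byte[] data)
    {
        return data.Length >= 3
            && data[0] == Utf8Bom[0]
            && data[1] == Utf8Bom[1]
            && data[2] == Utf8Bom[2];
    }

    // Returns the buffer unchanged if it already has a BOM,
    // otherwise a new buffer with the BOM prepended.
    public static byte[] WithUtf8Bom(byte[] data)
    {
        if (HasUtf8Bom(data))
        {
            return data;
        }
        byte[] result = new byte[data.Length + Utf8Bom.Length];
        Buffer.BlockCopy(Utf8Bom, 0, result, 0, Utf8Bom.Length);
        Buffer.BlockCopy(data, 0, result, Utf8Bom.Length, data.Length);
        return result;
    }

    // Rewrites the file in place so that it starts with the BOM.
    public static void AddBomToFile(string path)
    {
        byte[] original = File.ReadAllBytes(path);
        File.WriteAllBytes(path, WithUtf8Bom(original));
    }
}
```

A call such as BomTool.AddBomToFile("KeyboardRussian.cs") would then replace the shell commands. Because the BOM check runs first, running it twice on the same file does not add a second BOM.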

Comment by Graham-Dunnett · Jan 13, 2015 at 02:57 PM

Well, it helps Unity to know whether the files were created little-endian or big-endian, and the BOM helps with that. I typically use TextWrangler, and can trivially create scripts with a BOM. I just tried your code in TextWrangler using UTF-16 Little-Endian and it just worked.

Comment by Graham-Dunnett · Jan 13, 2015 at 02:57 PM

If you're convinced this is a bug in the editor, then please submit a bug report.

Comment by Graham-Dunnett · Jan 13, 2015 at 03:22 PM

From what I can tell, Unity creates scripts for you in UTF-8 with a BOM. MD creates files without a BOM. Most applications can then read files, with or without a BOM, including Unity. MonoDevelop has a wide range of additional file formats, so add UTF-16BE or UTF-16LE if you need them. Since your script has characters outside the 7-bit ASCII range, which therefore need a multi-byte encoding, I guess a BOM is helpful.

Comment by Marzoa · Jan 13, 2015 at 03:44 PM

From Wikipedia: "The Unicode Standard permits the BOM in UTF-8,[2] but does not require or recommend its use.[3] Byte order has no meaning in UTF-8,[4] so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8."

On the other hand, TextWrangler does NOT come with Unity. MonoDevelop DOES. So they distribute a bundle in which the graphical editor demands a BOM, while their code editor will not generate it. If that inconsistency is not a bug, it is quite an idiotic feature...

I am not going to submit a bug report, though. This problem has been around for YEARS and they haven't shown any interest in solving it. Besides, I paid a lot of money for this game engine, something I deeply regret, so I am not going to do their beta testing for free. This is off-topic anyway, but now that you mentioned it...

The question is solved.


Answer by zharik86 · Jan 12, 2015 at 07:39 PM

For example, create an XML file and add your symbols to it:

  <RussianSymbols>
   <symbol>А</symbol>
   <symbol>Б</symbol>
   ... and so on
  </RussianSymbols>

Then in code, for example in Awake(), read the XML and build your "rawKeys" array from it. I hope this helps.
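
A minimal sketch of that idea, assuming the XML has exactly the shape shown above (a RussianSymbols root with symbol children). The class name SymbolLoader and the method ParseSymbols are hypothetical names for illustration; XmlDocument comes from System.Xml.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class SymbolLoader
{
    // Parses XML of the form
    //   <RussianSymbols><symbol>...</symbol>...</RussianSymbols>
    // and returns the text of every <symbol> element in document order.
    public static string[] ParseSymbols(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        List<string> keys = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/RussianSymbols/symbol"))
        {
            keys.Add(node.InnerText);
        }
        return keys.ToArray();
    }
}
```

In a Unity script this could be called from Awake(), for example as rawKeys = SymbolLoader.ParseSymbols(symbolsXml.text), where symbolsXml is a hypothetical TextAsset field assigned in the Inspector. The XML file is read as data at run time, so the encoding of the C# source file no longer matters for these strings.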

Comment by Marzoa · Jan 12, 2015 at 09:54 PM

I thought about doing something like that, since I use some JSON data with Cyrillic characters in another part of the game and it works flawlessly, but this looks more like a workaround than a definitive solution.

I am porting a "native" Android game (I mean one written in Java) to Unity, and I have these same strings hardcoded in the Java code without any problems. I simply don't understand why I cannot do the same in a C# script that is actually saved as UTF-8 too.

Comment by zharik86 · Jan 13, 2015 at 07:45 AM

@Marzoa First, try saving your C# script as UTF-16. I checked Russian text in Unity scripts saved as UTF-16 from MonoDevelop. Everything works, on both PC and Android.

Projects do not always port one-to-one from Java (from Eclipse, I assume) to Unity. For example, a Java project may have a language XML file in which you can write "A\nB", and Java renders it with a line break. In Unity the same string is shown on screen literally as A\nB. There are many such differences.

A second way, from the Unity forum:

  1. File > Save As...

  2. Character Coding: "Western ISO-8859-15" or any other encoding that does not support the characters in your strings

  3. Click the "Save" button

  4. Click the "Overwrite File" button

  5. Click the "Save as Unicode" button

  6. Your file is now correctly saved as UTF-8 for Unity! Even if you save the file again with the Save command (Ctrl+S), the script will show the correct characters in the editor.

Comment by Marzoa · Jan 13, 2015 at 03:27 PM

@zharik86 I had found that solution in the forums myself, but the trick didn't work for me. It didn't even ask me to save the file as Unicode. I bet there is a checkbox for not asking again and, since I always save everything as Unicode, I probably checked it months ago.

Comment by Graham-Dunnett · Jan 13, 2015 at 10:08 PM

Unicode isn't a file format. It's a way of representing characters. A BOM is useful to indicate that a text file is stored in UTF-8 or UTF-16.
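
To illustrate how a BOM signals the encoding, here is a minimal sketch that classifies a file by its first bytes. It is not how Unity itself reads scripts, which the thread does not show. The class name BomSniffer and the returned labels are hypothetical, and UTF-32 BOMs are deliberately ignored to keep it short.

```csharp
public static class BomSniffer
{
    // Returns a label for the BOM found at the start of the buffer,
    // or "none" when the buffer starts with no recognised BOM.
    public static string Detect(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        {
            return "UTF-8";
        }
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        {
            return "UTF-16LE";
        }
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        {
            return "UTF-16BE";
        }
        return "none";
    }
}
```

A UTF-8 file saved without a BOM yields "none" here, which is the case where a reader has to guess the encoding.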


Answer by code_warrior · Jan 12, 2015 at 08:27 PM

Hi,

here is a simple example of encoding Cyrillic characters.

 string cyrillicText = "Ж";
 System.Text.UTF8Encoding encodingUnicode = new System.Text.UTF8Encoding();
 byte[] cyrillicTextByte = encodingUnicode.GetBytes(cyrillicText);
 Debug.Log(encodingUnicode.GetString(cyrillicTextByte));

What this does: I specify the text to encode and create a variable for the encoding type (in this case we need UTF-8). Then I store the text as a byte array, decode it back into a string, and print it to the console.

In your particular case, for index 24 of the rawKeys array:

 System.Text.UTF8Encoding encodingUnicode = new System.Text.UTF8Encoding();
 byte[] cyrillicTextByte = encodingUnicode.GetBytes(rawKeys[24]);
 Debug.Log(encodingUnicode.GetString(cyrillicTextByte));

Returns:

[Image: untitled.jpg]

code_warrior
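
To make the round trip in the snippets above concrete, here is a small sketch under a few assumptions. RoundTripDemo and its methods are hypothetical names, and the literal is written with the \u0416 escape (the code point of "Ж") so that the example does not depend on how the source file itself is encoded. It shows the bytes UTF-8 produces, and that encoding followed by decoding returns the string that went in.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Encodes the text as UTF-8 and formats the bytes as hex, e.g. "D0-96".
    public static string ToUtf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }

    // Encodes the text to UTF-8 bytes and decodes those bytes again.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

RoundTripDemo.ToUtf8Hex("\u0416") gives "D0-96", the two-byte UTF-8 form of "Ж". The round trip only returns what it was given, so a literal that the compiler has already read as "??" comes back as "??". That is consistent with the accepted answer, where the fix is applied to the source file rather than at run time.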