WWW.text string not correct?
Hi there, I've been hitting my head against the wall for a few weeks now because of some weird behaviour I can't quite figure out.
My webplayer has to read a JSON data file from a folder (inside the same server, in a subfolder, so the security sandbox doesn't complain) getting the file via WWW class and using the JSONObject class to parse the data. I've coded it so in the case of the file not being available, it reads some default data from a JSON file from the resources folder. The thing is that when I use the WWW.text property to access the data from the downloaded file, the JSONObject class can't read it, while whenever it reads the (very same) data from the file in the resources folder (using TextAsset.text property) there's no problem at all. Both files are UTF-8 encoded.
I've narrowed the problem down to the point of WWW.text property vs TextAsset.text property reading files UTF-8 encoded and returning differently encoded strings (or something similar, because I'm lost). The docs on WWW.text say that the contents of the web page must be in UTF-8 or ASCII character set, which they are, but still, the string that I get using WWW.text can't be read in the parser. The manual on TextAsset says that it can read .json files (which it does), so I really think that the problem may come from reading a UTF-8 encoded file with the WWW class.
The (updated) code:
WWW myWWW = new WWW(Application.dataPath + JSONurl);
yield return myWWW;
string jsonData = "";
if (myWWW.error == null) {
jsonData = myWWW.text;
} else {
TextAsset myData = (TextAsset)Resources.Load(JSONFile, typeof(TextAsset));
jsonData = myData.text;
}
JSONObject json = new JSONObject(jsonData);
Any ideas on why WWW.text could be returning a wrong string while reading from a UTF-8 encoded file?
Thanks in advance.
[UPDATE] I've updated the code and done some more tests. The string that receives the data from WWW.text displays the contents of the file correctly, no matter the encoding of the file (ANSI or UTF-8). It looks like the strings returned by WWW.text and TextAsset.text might be in different encodings, and JSONObject only accepts the encoding of the TextAsset.text string. I'll keep working on it.
Also note that I use JSONObject and if the file downloaded is ANSI encoded, it is read properly (but JSON files must be UTF-8 encoded and the file I read is UTF-8 encoded).
Hi Andres... I would like to know which JSON parser you are using. I have exactly the same code as yours, with SimpleJSON as my JSON parser, and it simply works without any hassle.
So, I tried changing the encoding of the WWW response, since I know it'll be UTF-8 encoded:
WWW myWWW = new WWW(Application.dataPath + JSONurl);
yield return myWWW;
string jsonData = "";
if (myWWW.error == null) {
Encoding utf8 = Encoding.UTF8;
Encoding ansi = Encoding.Default;
byte[] utf8bytes = utf8.GetBytes(myWWW.text);
byte[] ansibytes = Encoding.Convert(utf8, ansi, utf8bytes);
char[] ansichars = new char[ansi.GetCharCount(ansibytes, 0, ansibytes.Length)];
ansi.GetChars(ansibytes, 0, ansibytes.Length, ansichars, 0);
jsonData = new string(ansichars);
} else {
TextAsset myData = (TextAsset)Resources.Load(JSONFile, typeof(TextAsset));
jsonData = myData.text;
}
JSONObject json = new JSONObject(jsonData);
Still no luck. The string reads the contents perfectly, but the parser doesn't like the string it receives if it comes from a UTF-8 encoded file.
I'm having a similar problem. I went through all the possible encoders (though utf8 should have done the trick) without any luck. Didn't even load a JSON string in the end, but a simple "Hello World". Decoding the byte array in bulk results in a single character (e.g. "<" or "É" - mostly �). Grabbing every byte on its own and printing it to the console shows a lot of fancy characters, none of which resemble the source remotely.
Since I "solved" the problem by rolling back to Unity 4.3, I can only guess the problem lies hidden somewhere in the WWW class.
I still hope I am missing something - I'd hate to miss out on the new UI =)
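For anyone debugging this the same way, printing the first few bytes as hex makes an invisible prefix much easier to spot than printing them as characters. This is only a minimal sketch: `ByteDump` and `HexPrefix` are hypothetical names made up for illustration, not part of Unity or any library mentioned in this thread.

```csharp
using System;

// Hypothetical helper (not a Unity API): formats the leading bytes of a
// payload as hex so that non-printable prefixes become visible in a log.
public static class ByteDump
{
    public static string HexPrefix(byte[] data, int maxBytes)
    {
        if (data == null || data.Length == 0 || maxBytes <= 0) return "";
        int count = Math.Min(maxBytes, data.Length);
        // BitConverter.ToString returns upper-case hex pairs joined by '-'.
        return BitConverter.ToString(data, 0, count);
    }
}
```

Inside the coroutine you could then call something like `Debug.Log(ByteDump.HexPrefix(myWWW.bytes, 8));` right after the download finishes and compare the output against what you expect the file to start with.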
Answer by Andres-Fernandez · Dec 03, 2014 at 08:22 AM
Finally the solution came from a colleague of mine, who told me that while debugging he noticed that the first three bytes of the WWW.text property were not readable characters. And yes, they were EF BB BF, i.e. the UTF-8 BOM. His workaround (that works at least on webplayer, I haven't checked other platforms) is as simple as not reading the first three bytes, since it seems that the WWW class includes the BOM in the WWW.text string:
WWW myWWW = new WWW(Application.dataPath + JSONurl); // UTF-8 encoded json file on the server
yield return myWWW;
string jsonData = "";
if (string.IsNullOrEmpty(myWWW.error)) {
jsonData = System.Text.Encoding.UTF8.GetString(myWWW.bytes, 3, myWWW.bytes.Length - 3); // Skip the first 3 bytes (i.e. the UTF-8 BOM)
JSONObject json = new JSONObject(jsonData); // JSONObject works now
}
I haven't checked with other parsers or in any other situations, but if any of you are experiencing problems with WWW.text, you may want to look at the BOM inside your strings.
Obviously, if the file is saved without a BOM, you can read WWW.text directly. In my case, I check the first 3 bytes of WWW.bytes. If they match EF BB BF, I skip them (since a JSON file can't start with those bytes); if they don't match, I use the whole WWW.text directly.
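The conditional check described above could look roughly like the sketch below. `BomAwareText`, `HasUtf8Bom` and `DecodeUtf8` are hypothetical names chosen for illustration. Note one deliberate difference from the description: this version always decodes the raw bytes as UTF-8 (with or without a BOM) instead of falling back to WWW.text, which assumes the server file really is UTF-8.

```csharp
using System.Text;

// Hypothetical helper (not a Unity API): decodes a UTF-8 payload and drops
// the BOM only when it is actually present.
public static class BomAwareText
{
    // True when the data starts with the UTF-8 BOM bytes EF BB BF.
    public static bool HasUtf8Bom(byte[] data)
    {
        return data != null && data.Length >= 3
            && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
    }

    // Assumes the payload is UTF-8; skips the three BOM bytes if found.
    public static string DecodeUtf8(byte[] data)
    {
        if (data == null) return "";
        int offset = HasUtf8Bom(data) ? 3 : 0;
        return Encoding.UTF8.GetString(data, offset, data.Length - offset);
    }
}
```

In the coroutine, the assignment would then become something like `jsonData = BomAwareText.DecodeUtf8(myWWW.bytes);`. Reading `myWWW.bytes` once into a local variable first is a reasonable precaution if the payload is large.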
This solution saved my day! I was wondering why my decryption wasn't working, and before that my JSON object conversion was giving an error. Now both work, thanks so much!
YES!
This solution totally worked. Thank you Andres! I've had like two dozen browser windows open for the last four days hunting for a solution to this problem, and this is the only correct solution out there. Totally strange, as this just seems like a very basic bug devs would be running across all the time.
Anyway, thank you for posting this, Andres! I've shared this link both in another Unity forums post and on Stack Overflow.
BTW, I'm not using any additional plug-ins to work with the JSON (as I'm seeing pretty consistently referenced). I'm just parsing and working with the object in native Unity code via JsonUtility.FromJson.
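If you already have the text as a string rather than as bytes, the same idea can be applied at the string level. This sketch assumes the BOM survived decoding and shows up as the single character U+FEFF at index 0 (that is what the bytes EF BB BF decode to in UTF-8). `BomString` and `StripLeadingBom` are hypothetical names, not part of Unity.

```csharp
// Hypothetical helper (not a Unity API): removes one leading U+FEFF
// character from a string, leaving all other content untouched.
public static class BomString
{
    // U+FEFF is what the UTF-8 BOM bytes EF BB BF decode to.
    private const char Bom = '\uFEFF';

    public static string StripLeadingBom(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;
        return text[0] == Bom ? text.Substring(1) : text;
    }
}
```

The cleaned string can then be handed to whatever string-based parser you use, for example `JsonUtility.FromJson<T>(BomString.StripLeadingBom(myWWW.text))`. Whether WWW.text exposes the BOM exactly this way on every platform and Unity version is an assumption worth checking with a quick log.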
I have solved this problem.
Please use "JSONObject" from the Asset Store, which is available for free.
Download from here : https://assetstore.unity.com/packages/tools/input-management/json-object-710
WWW myWWW = new WWW(Application.dataPath + JSONurl); // UTF-8 encoded json file on the server
yield return myWWW;
string jsonData = "";
if (string.IsNullOrEmpty(myWWW.error)) {
jsonData = System.Text.Encoding.UTF8.GetString(myWWW.bytes, 3, myWWW.bytes.Length - 3); // Skip the first 3 bytes (i.e. the UTF-8 BOM)
JSONObject json = new JSONObject(jsonData); // JSONObject works now
Debug.Log(json.ToString());
}
// Now you can use json.ToString() in your JSON deserialization, and it no longer shows a null reference error.
Answer by unimechanic · Dec 02, 2014 at 01:42 PM
I'm using JSONObject from the asset store.
but parser doesn't like the string it receives if it comes from an UTF-8 encoded file
You might get a faster answer by getting in contact with the author. Probably they know whether this is a limitation of the plugin, or could provide a solution.