Split a string every 'n' characters?
I know how to use String.Split(""[0]);
but I need to split a string every 3 characters rather than by a character such as "_". I need the split string to be returned as an array and using Substring would be way too slow (I'm handling lots of data).
How would I do this?
It's easy to do with Substring... Not sure how you'd do it without.
You could maybe convert it to a byte array and hope it was a little faster....
Would it become more difficult to process for me as a byte array?
Answer by Bunny83 · Sep 23, 2011 at 12:03 AM
I doubt that Substring would be any slower, but here's a way to split the string without it:
// (I'll assume you use UnityScript? I prefer C# but that's up to you)
function Split(text : String, charCount : int) : String[]
{
    if (text.length == 0)
        return new String[0];
    var arrayLength = (text.length-1) / charCount +1;
    var result = new String[arrayLength];
    for(var i = 0; i < arrayLength; i++)
    {
        var tmp = "";
        for (var n = 0; n < charCount; n++)
        {
            var index = i*charCount+n;
            if (index >= text.length) //important if last "part" is smaller
                break;
            tmp += text[index];
        }
        result[i] = tmp;
    }
    return result;
}
I'm not sure why you think Substring is slow. Of course you need a lot of function calls, which might be the biggest problem. I guess the version above isn't much slower, since Substring does something similar (probably even more efficiently :D). Here's the Substring version:
function Split(text : String, charCount : int) : String[]
{
    if (text.length == 0)
        return new String[0];
    var arrayLength = (text.length-1) / charCount +1;
    var result = new String[arrayLength];
    // all parts except the last one are exactly charCount characters long
    for(var i = 0; i < arrayLength-1; i++)
    {
        result[i] = text.Substring(i*charCount, charCount);
    }
    // the last part takes whatever is left (it may be shorter)
    result[arrayLength-1] = text.Substring((arrayLength-1)*charCount);
    return result;
}
Edit: I forgot to handle the special case of calling the method with an empty string. In that case it would return one empty element because of my arrayLength calculation. I've added a check at the start that returns an empty array instead.
Thanks for the reply Bunny83. I agree, both your methods would work fine. I'm yet to test them exactly but I'm trying to avoid using a for loop because I would have around 32000 loops to run through quite frequently. String.Split() is very fast but I fear that building my own function would be slow in this case.
However I will test both your methods and see if they come out at the same speed. If they do, you will have solved my problem :)
OK, tested both and compared them against a straight 'String.Split'... unfortunately the first one was 10x slower and the second 2x slower... the second method might work... but with the amount I need to do I can't afford the +0.02 seconds it added to the processing time of the operation... If only Unity Javascript supported String.Match...
Actually, how do you think String.Split does its job? There's no way around looping through the whole string. Since String.Split comes with Mono / .NET it's very optimised, but it still iterates through all of your 99,614,720 characters (95MB).
I guess the reason your own function will never reach the speed of the built-in functions is that you don't have direct memory access. I'm not sure, but I guess the string class also uses an indexer for myString[5], which is also a function call. If you do it "manually" you won't get much faster than that.
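One way to sidestep the per-character indexer is to copy the string into a char[] once and build each piece from that array. The following is only a minimal C# sketch under that assumption; the names ChunkSplitter and SplitEvery are made up for illustration, and whether it actually beats Substring would have to be measured.

```csharp
using System;

public static class ChunkSplitter
{
    // Splits text into pieces of chunkSize characters; the last piece may be shorter.
    public static string[] SplitEvery(string text, int chunkSize)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");
        if (text.Length == 0) return new string[0];

        // Copy once (assumption: the extra copy fits in memory).
        char[] chars = text.ToCharArray();

        // Same length formula as in the answer above.
        int count = (chars.Length - 1) / chunkSize + 1;
        string[] result = new string[count];

        for (int i = 0; i < count; i++)
        {
            int start = i * chunkSize;
            int length = Math.Min(chunkSize, chars.Length - start);
            result[i] = new string(chars, start, length);
        }
        return result;
    }
}
```

Calling ChunkSplitter.SplitEvery(data, 3) on a hypothetical string data gives the same kind of array as the Split functions above.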
I'm just a bit confused: why do you need it so fast? If you read the file from disk, I guess that's the bottleneck. Usually you do such an operation once at start, so I don't understand the need for such high speed...
Actually I'm going to have to beg for your forgiveness. I did several retests of the function that you have produced comparing its speed to String.Split(). The two functions randomly varied in efficiency around the 0.02secs mark. Which means I would do better to use yours! Thank you very much you've solved the problem. :)
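Since single timings around 0.02 seconds vary a lot from run to run, it may help to repeat each measurement and compare the best time. This is a rough C# sketch using System.Diagnostics.Stopwatch; the names SplitTiming and BestOfMs are hypothetical and it is not meant as a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;

public static class SplitTiming
{
    // Runs the action several times and returns the fastest run in milliseconds.
    public static double BestOfMs(Action action, int runs)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        action(); // warm-up run so one-time costs are not counted

        double best = double.MaxValue;
        Stopwatch watch = new Stopwatch();
        for (int i = 0; i < runs; i++)
        {
            watch.Reset();
            watch.Start();
            action();
            watch.Stop();
            best = Math.Min(best, watch.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}
```

For example, SplitTiming.BestOfMs(() => data.Split('|'), 10) could be compared against the same call wrapped around a custom split function, where data is a placeholder for the loaded text.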
I didn't see your comment above at first, so here's a bit of an explanation:
I'm building a voxel engine (http://en.wikipedia.org/wiki/Voxel). The concept is to build a planet before runtime and then to only load the portion of the planet the player can see. The planet is too big to have all its data moved through RAM when you move around it, so I've split it into chunks. These chunks are then loaded from memory when needed; hence the need for speed.
Answer by Sigil · Sep 23, 2011 at 12:24 AM
How about handling this at the time you load it? If you can get a stream you can do something like:
BinaryReader tReader = new BinaryReader(tMyStream);
string[] tMyStrings = new string[tMyStream.Length / 3];
for (int i = 0; i < tMyStrings.Length; i++)
    tMyStrings[i] = new string(tReader.ReadChars(3));
This assumes a few things, but you can probably make it work.
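To spell out some of those assumptions, here is a hedged C# sketch of the same idea. It assumes the data is plain ASCII (one byte per character, so the stream length equals the character count), that the stream is seekable and positioned at its start, and that a shorter last piece is acceptable. The names StreamChunkReader and ReadChunks are invented for the example.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamChunkReader
{
    // Reads the whole stream as ASCII text in pieces of chunkSize characters.
    // Note: disposing the BinaryReader also closes the stream.
    public static string[] ReadChunks(Stream stream, int chunkSize)
    {
        if (stream == null) throw new ArgumentNullException("stream");
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        // Assumption: one byte per character, so Length is the character count.
        long total = stream.Length;
        int count = (int)((total + chunkSize - 1) / chunkSize);
        string[] result = new string[count];

        using (BinaryReader reader = new BinaryReader(stream, Encoding.ASCII))
        {
            for (int i = 0; i < count; i++)
            {
                // ReadChars may return fewer characters at the end of the stream.
                result[i] = new string(reader.ReadChars(chunkSize));
            }
        }
        return result;
    }
}
```

With a multi-byte encoding the byte length and character count no longer match, so the array size would have to be worked out differently.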
This looks like a promising method, unfortunately I don't know enough about Streams to feel confident with this. How do you get a stream? And what is it?
There are a few ways to get a stream, if you're loading your file as a resource this would probably work:
TextAsset tResource = (TextAsset)Resources.Load(tPathToFile, typeof(TextAsset));
BinaryReader tReader = new BinaryReader(new MemoryStream(tResource.bytes));
If you're loading a file this would probably work:
BinaryReader tReader = new BinaryReader(new BufferedStream(File.OpenRead(tPathToFile)));
The first case will work in memory, the second case will read from disk but buffer to improve speed.
Answer by yeoldesnake 1 · Sep 22, 2011 at 02:39 PM
If the data you are trying to handle is not entered by a user, you could again use Split and split the data each time a special character is seen. This could also be applied if the user enters the data: just instruct them to separate the data with spaces, and then you can replace each space with said character. You could use something like "&", or if the data you are mentioning already uses such characters, you can split by "&split". I apologise for the simplistic solution but I am unaware of anything else.
There might be something else, but I'm not sure if it will work. You can try
String.Split(""[3]);
Thanks for the reply, but I intend to specifically avoid this method. I don't want to add a character for splitting because it increases the size of the file on disk. I have a 95MB file but I need to read it in small 3-character portions. To solve this for now I have added a "|" and I am splitting at this character. However, with the "|" the file increases to 125MB! Way too much of an increase to be efficient. If I could split by number of characters and not by the character itself, it would save a lot of space on the hard drive.
I'm afraid the Split(""[3]) just produces an error too... nice try though.
Answer by Tommynator · Sep 23, 2011 at 11:19 AM
This will read out your file piece by piece and you don't have to load all the 95mb into memory.
string path = "path to file";
System.IO.StreamReader reader = new System.IO.StreamReader(path);
int index = 0;
while (!reader.EndOfStream)
{
    char[] buf = new char[3];
    reader.ReadBlock(buf, index, 3);
    index += 3;
}
Hey, I see what you're attempting to do here... but I can't get my head around the char[3] and ReadBlock bit. It's producing an error saying that index + count > buffer.length. So I tried increasing the buffer length and it didn't work...
I got it to work, but unfortunately this method takes 15 seconds... as opposed to my current 0.01 seconds... Unless I'm doing something greatly wrong... this is just too slow.
;) That's because file operations are very slow. It's much faster to read the whole file into memory and work with the in-memory copy.
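The first error appears because the second argument of ReadBlock is an offset into the buffer, not a position in the file: once index reaches 3, index + count is larger than the 3-element buffer. Pass 0 there instead. Also note that ReadBlock does not throw when only 1 or 2 characters are left; it returns the number of characters it actually read, so the last block can be shorter than count.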
Good point, it's much slower to read from disk piecewise instead of from memory.
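For reference, here is a small C# sketch of block-wise reading with ReadBlock that avoids the "index + count > buffer.length" error. The second argument of ReadBlock is the offset into the buffer, so it stays 0, and the return value says how many characters were actually read. The names BlockReader and ReadInBlocks are hypothetical; the sketch takes any TextReader so it works with a StreamReader or a StringReader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BlockReader
{
    // Reads everything from the reader in blocks of blockSize characters.
    public static List<string> ReadInBlocks(TextReader reader, int blockSize)
    {
        if (reader == null) throw new ArgumentNullException("reader");
        if (blockSize <= 0) throw new ArgumentOutOfRangeException("blockSize");

        List<string> blocks = new List<string>();
        char[] buffer = new char[blockSize]; // allocated once and reused

        int read;
        // Offset 0 = start of the buffer; a return value of 0 means end of input.
        while ((read = reader.ReadBlock(buffer, 0, blockSize)) > 0)
        {
            // The last block may contain fewer than blockSize characters.
            blocks.Add(new string(buffer, 0, read));
        }
        return blocks;
    }
}
```

It could be called as BlockReader.ReadInBlocks(new StreamReader(path), 3), although reading the whole file into memory first is likely still faster, as noted above.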
Answer by paulo_renan · Feb 01, 2021 at 01:40 PM
Well, that's how I solved it for my purpose.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using UnityEngine;
public static class StringExtensions
{
    public static List<string> SplitEveryN(this string str, int n = 1024)
    {
        // a chunk size of 0 or less would never advance and loop forever
        if (n <= 0)
            throw new ArgumentOutOfRangeException("n");
        List<string> ret = new List<string>();
        int chunkIterator = 0;
        while (chunkIterator < str.Length)
        {
            // the last chunk may be shorter than n
            int currentChunkSize = Mathf.Min(n, str.Length - chunkIterator);
            ret.Add(str.Substring(chunkIterator, currentChunkSize));
            // Debug.Log(str.Substring(chunkIterator, currentChunkSize));
            chunkIterator += currentChunkSize;
        }
        return ret;
    }
}
To use it, just call:
// default length of 1024
foreach (var stringChunk in myLongString.SplitEveryN())
{
    Debug.Log(stringChunk);
}
// custom length
foreach (var stringChunk in myLongString.SplitEveryN(32))
{
    Debug.Log(stringChunk);
}