Split a string by length, breaking only at the nearest space
B

6

5

I have text like this:

var data = "âô¢¬ôè÷¢ : ªîø¢è¤ô¢ - ã¿ñ¬ô ñèù¢ ªð¼ñ£÷¢ ï¤ôñ¢,«ñø¢è¤ô¢ - ªð¼ñ£÷¢ ñèù¢ ÝÁºèñ¢ ï¤ô袰ñ¢ ñ¤ì¢ì£ Üò¢òñ¢ ªð¼ñ£ñ¢ð좮 è¤ó£ñ âô¢¬ô袰ñ¢,õìè¢è¤ô¢ - ÝÁºèñ¢ ï¤ôñ¢,è¤öè¢è¤ô¢ - ô좲ñ¤ ï¤ôñ¢ ñø¢Áñ¢ 1,22 ªê ï¤ôñ¢ ð£î¢î¤òñ¢";

and I have these extension methods to split a string:

public static IEnumerable<string> EnumByLength(this string s, int length)
{
    for (int i = 0; i < s.Length; i += length)
    {
        if (i + length <= s.Length)
        {
            yield return s.Substring(i, length);
        }
        else
        {
            yield return s.Substring(i);
        }
    }
}
public static string[] SplitByLength(this string s, int maxLen)
{
    var v = EnumByLength(s, maxLen);
    if (v == null)
        return new string[] { s };
    else
        return s.EnumByLength(maxLen).ToArray();
}

Now my question is:

How can I split this string into chunks with a maximum length of 150, splitting only at the nearest space (either just before or just after position 150) and never in the middle of a word?

Blinders answered 23/8, 2013 at 6:26 Comment(4)
So you want to .Split(' ') a string based on spaces? (It would help to clarify where the spaces are in your text.) - Salpingotomy
Moreover, the split has to be performed only after string index 150. Did I ask that correctly? - Blinders
This problem should be solved with a traditional while or for loop. Why LINQ? - Licketysplit
Oh, I thought of using extension methods, that's why. So how do I solve it? Please help, @KingKing. - Blinders
E
5

My version:

// Enumerate by nearest space
// Split String value by closest to length spaces
// e.g. for length = 3 
// "abcd efghihjkl m n p qrstsf" -> "abcd", "efghihjkl", "m n", "p", "qrstsf" 
public static IEnumerable<String> EnumByNearestSpace(this String value, int length) {
  if (String.IsNullOrEmpty(value))
    yield break;

  int bestDelta = int.MaxValue;
  int bestSplit = -1;

  int from = 0;

  for (int i = 0; i < value.Length; ++i) {
    var Ch = value[i];

    if (Ch != ' ')
      continue;

    int size = (i - from);
    int delta = (size - length > 0) ? size - length : length - size;

    if ((bestSplit < 0) || (delta < bestDelta)) {
      bestSplit = i;
      bestDelta = delta;
    }
    else {
      yield return value.Substring(from, bestSplit - from);

      i = bestSplit;

      from = i + 1;
      bestSplit = -1;
      bestDelta = int.MaxValue;
    }
  }

  // String's tail
  if (from < value.Length) {
    if (bestSplit >= 0) {
      if (bestDelta < value.Length - from)
        yield return value.Substring(from, bestSplit - from);

      from = bestSplit + 1;
    }

    if (from < value.Length)
      yield return value.Substring(from);
  }
}

...

var list = data.EnumByNearestSpace(150).ToList();
Ebbarta answered 23/8, 2013 at 6:54 Comment(2)
I found a problem in the "String's tail" block: the line from = bestSplit + 1; should be inside the if block above it. For example, Console.WriteLine(string.Join("#", EnumByNearestSpace("Thank you for shopping with us! We really appreciate you!", 40))); produces output with "appreciate" missing. - Flaunch
I had the same issue and resolved it by deleting the if (bestSplit >= 0) section in the string's tail handling. - Inveigle
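The tail fix described in the first comment can be sketched as follows. This is a minimal sketch, not the answerer's own revision: the class name StringSplitExtensions is made up for illustration, Math.Abs replaces the hand-written absolute value, and the only behavioral change is that from = bestSplit + 1 now runs only when the tail is actually split.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container class; extension methods must live in a static class.
public static class StringSplitExtensions
{
    // Splits at the space whose distance from the chunk start is closest to
    // `length`. Chunks may therefore be slightly longer than `length`.
    public static IEnumerable<string> EnumByNearestSpace(this string value, int length)
    {
        if (string.IsNullOrEmpty(value))
            yield break;

        int bestDelta = int.MaxValue;
        int bestSplit = -1;
        int from = 0;

        for (int i = 0; i < value.Length; ++i)
        {
            if (value[i] != ' ')
                continue;

            // Distance between this candidate chunk size and the target length.
            int delta = Math.Abs((i - from) - length);

            if (bestSplit < 0 || delta < bestDelta)
            {
                bestSplit = i;
                bestDelta = delta;
            }
            else
            {
                // The previous space was the closest one: cut there and rescan.
                yield return value.Substring(from, bestSplit - from);
                i = bestSplit;
                from = i + 1;
                bestSplit = -1;
                bestDelta = int.MaxValue;
            }
        }

        // String's tail
        if (from < value.Length)
        {
            if (bestSplit >= 0 && bestDelta < value.Length - from)
            {
                yield return value.Substring(from, bestSplit - from);
                from = bestSplit + 1; // moved inside the if, per the comment above
            }

            if (from < value.Length)
                yield return value.Substring(from);
        }
    }
}
```

With this change, string.Join("#", "Thank you for shopping with us! We really appreciate you!".EnumByNearestSpace(40)) should give "Thank you for shopping with us! We really#appreciate you!". The first chunk is 41 characters long because the method picks the nearest space, which can fall just after the limit.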
P
2

Old topic, but I just had the same issue and tried to solve it myself. Here is my approach. It also reports an error (and returns null) if any single word exceeds the limit.

static void Main(string[] args)
{
    string veryLongText = @"Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.";

    var result = SplitString(veryLongText, 20);
    if (result != null)
        foreach (var t in result)
            Console.WriteLine($"{t.Length, 3} : '{t}'");

    Console.ReadLine();
}

private static List<string> SplitString(string data, int length)
{
    List<string> result = new List<string>();

    if (data.Split(' ').Any(x => x.Length > length))
    {
        Console.WriteLine("ERROR, SINGLE WORD EXCEED THE CURRENT LIMIT!");
        return null;
    }

    int lastSpace = 0;
    int currentSpace = 0;
    int newLinePos = 0;

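    for (int i = 0; i < data.Length; i++)
    {
        if (data.Length - newLinePos <= length)
        {
            // The rest fits on one line: emit it and stop.
            result.Add(data.Substring(newLinePos, data.Length - newLinePos));
            return result;
        }
        if (data[i] == ' ')
        {
            lastSpace = currentSpace;
            currentSpace = i;
            if (currentSpace - newLinePos > length)
            {
                result.Add(data.Substring(newLinePos, lastSpace - newLinePos));
                newLinePos = lastSpace + 1;
            }
        }
    }

    // Reached only when the loop ended without emitting the tail, e.g. when
    // the final word pushes the last line over the limit: break once more at
    // the last space, then emit whatever remains.
    if (data.Length - newLinePos > length)
    {
        result.Add(data.Substring(newLinePos, currentSpace - newLinePos));
        newLinePos = currentSpace + 1;
    }
    if (newLinePos < data.Length)
        result.Add(data.Substring(newLinePos));

    return result;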
}
Paeon answered 29/4, 2021 at 8:53 Comment(0)
S
1

My version

var data = "âô¢¬ôè÷¢ : ªîø¢è¤ô¢ - ã¿ñ¬ô ñèù¢ ªð¼ñ£÷¢ ï¤ôñ¢,«ñø¢è¤ô¢ - ªð¼ñ£÷¢ ñèù¢ ÝÁºèñ¢ ï¤ô袰ñ¢ ñ¤ì¢ì£ Üò¢òñ¢ ªð¼ñ£ñ¢ð좮 è¤ó£ñ âô¢¬ô袰ñ¢,õìè¢è¤ô¢ - ÝÁºèñ¢ ï¤ôñ¢,è¤öè¢è¤ô¢ - ô좲ñ¤ ï¤ôñ¢ ñø¢Áñ¢ 1,22 ªê ï¤ôñ¢ ð£î¢î¤òñ¢";

var indexes = new List<int>();
var lastFoundIndex = 0;
while((lastFoundIndex = data.IndexOf(' ', lastFoundIndex + 1)) != -1)
{
    indexes.Add(lastFoundIndex);
}

int intNum = 150;
int index;
var newList = new List<string>();
while ((index = indexes.Where(x => x > intNum - 150 &&  x <= intNum).LastOrDefault()) != 0)
{
    var firstIndex = newList.Count == 0 ? 0 : index;
    var lastIndex = firstIndex + 150 >= data.Length ? data.Length - 150 : intNum;
    newList.Add(data.Substring(intNum - 150, lastIndex));
    intNum += 150;
}

newList then contains the split strings.

Salpingotomy answered 23/8, 2013 at 7:6 Comment(1)
I tested this with "Thank you for shopping with us! We really appreciate you!", splitting on 40 characters. It split in the middle of "really". - Flaunch
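A variant of the same idea (precompute the space indexes, then cut each chunk at the last space within the limit) that handles the test case from the comment above might look like the sketch below. It is an illustration, not the answerer's code: the names SpaceIndexSplitter and SplitAtLastSpace are made up, and the fallback for a word longer than maxLen (cut at the next space instead) is an assumption about the desired behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named for illustration only.
public static class SpaceIndexSplitter
{
    public static List<string> SplitAtLastSpace(string data, int maxLen)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (maxLen < 1) throw new ArgumentOutOfRangeException(nameof(maxLen));

        // Collect the index of every space, starting the search at index 0.
        var indexes = new List<int>();
        int found = -1;
        while ((found = data.IndexOf(' ', found + 1)) != -1)
            indexes.Add(found);

        var chunks = new List<string>();
        int start = 0;

        while (data.Length - start > maxLen)
        {
            // Last space that keeps the chunk within maxLen characters.
            int split = indexes.Where(x => x > start && x <= start + maxLen)
                               .DefaultIfEmpty(-1)
                               .Last();

            // Assumed fallback for a word longer than maxLen:
            // cut at the next space, or at the end of the text.
            if (split == -1)
                split = indexes.Where(x => x > start + maxLen)
                               .DefaultIfEmpty(data.Length)
                               .First();

            chunks.Add(data.Substring(start, split - start));
            start = split + 1; // skip the space itself
        }

        // Whatever is left fits within maxLen.
        if (start < data.Length)
            chunks.Add(data.Substring(start));

        return chunks;
    }
}
```

For the 40-character test above, this should return "Thank you for shopping with us! We" and "really appreciate you!". Unlike the nearest-space approach, it never exceeds maxLen unless a single word is longer than the limit.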
F
0

There you go:

 for (int i = 0; i < s.Length; i += length)
    {
        int index = s.IndexOf(" ", i, s.Length - i);

        if (index!=-1 && index + length <= s.Length)
        {
            i =index;           
            yield return s.Substring(index, length);
        }
        else
        {
            index= s.LastIndexOf(" ", 0, i);
            if(index==-1)
                yield return s.Substring(i);
            else
            {
                i = index;
                yield return s.Substring(i);
            }
        }
    }
Forestation answered 23/8, 2013 at 6:40 Comment(3)
Ah, unfortunately this didn't work. Words from the end of one line get repeated at the start of the next line. Sorry. - Blinders
@Gokul try now, fixed it. - Forestation
This causes an ArgumentOutOfRangeException. - Multiphase
I
0

My string extension:

public static string TrimAtNearestWhiteSpace(this string src, int pos)
{
    string retval = src;
    if (!string.IsNullOrEmpty(src) && src.Length > pos)
    {
        //get a sorted list of white space indexes
        var whiteSpaceIndexes = new List<int>();
        for (int i = 0; i < src.Length; i++)
            if (src[i] == ' ') whiteSpaceIndexes.Add(i);

        // let the whole source be an option if close to target position
        whiteSpaceIndexes.Add(src.Length); 

        //compare nearest white space positions
        var nextSpace = whiteSpaceIndexes.FirstOrDefault(x => x >= pos);
        whiteSpaceIndexes.Reverse();
        var prevSpace = whiteSpaceIndexes.FirstOrDefault(x => x < pos);
        var bestSplit = nextSpace - pos < pos - prevSpace ? nextSpace : prevSpace;

        //add ellipsis if return value is trimmed
        if (bestSplit < src.Length)
            retval = src.Substring(0, bestSplit) + "...";
    }
    return retval;
}

Usage:

var source = "Lorem ipsum dolor sit amet, consectetur adipiscing elit";
var readmore = source.TrimAtNearestWhiteSpace(6);
Individualism answered 21/2, 2023 at 17:27 Comment(0)
T
-1

Try this. This code splits a long sentence into a list of lines, packing whole words into each line so that the line length stays less than or equal to chunkSize:

    private List<string> splitIntoChunks(string toSplit, int chunkSize)
    {
        List<string> splittedLines = new List<string>();

        string [] toSplitAr = toSplit.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < toSplitAr.Length; )
        {
            string line = "";

            while (i < toSplitAr.Length)
            {
                string prefix = (line == "" ? "" : " "); //prefix with space if not first word in line
                //line size, including the separating space, should not exceed chunkSize
                if (line != "" && line.Length + prefix.Length + toSplitAr[i].Length > chunkSize) break;
                //the first word is always taken, so a word longer than chunkSize gets a line of its own
                line += (prefix + toSplitAr[i]);
                i++;
            }

            splittedLines.Add(line);
        }

        return splittedLines;
    }
Tnt answered 6/5, 2018 at 17:27 Comment(0)
