Regex: Match the text before the end of line

I have a file that looks like this:

J6      INT-00113G  227.905    5.994  180  ~!@#$%&^)
J3      INT-00113G  227.905 -203.244  180  12341341312315
U13     EXCLUDES    -42.210  181.294  180  QFP128
U3      IC-00276G     5.135  198.644  90   B%GA!@-48
U12     IC-00270G  -123.610 -201.594  0    SOP8_000
J1      INT-00112G  269.665  179.894  180  SOIC16_1
J2      INT-00112G  269.665  198.144  180  SOIC16-_2
..      ..........  .......  .......  ...  ................

I would like to match the value in the 6th (last) column so that I can remove it from each line. The length of that value is not fixed, and it can contain any character. So what I would like to do is match the last value on the line, that is, everything after the final run of spaces up to the end of the line.


CODE:

        // Reads the lines in the file to format.
        var fileReader = File.OpenText(filePath + "\\Remove Package 1 Endings.txt");

        // Creates a list for the lines to be stored in.
        var fileList = new List<string>();

        // Adds each line in the file to the list.
        while (true)
        {
            var line = fileReader.ReadLine();
            if (line == null)
                break;

            fileList.Add(line);
        }

        var mainResult = new List<string>();
        var theResult = new List<string>();

        foreach (var mainLine in fileList)
            mainResult.Add(string.Join(" ", mainLine));

        foreach (var theLine in mainResult)
        {
            // PLACEMENT ONE Regex
            Match theRegex = Regex.Match(theLine, @"insert the regex here!");

            if (theRegex.Success)
                theResult.Add(string.Join(" ", theLine));
        }

        // Removes the matched values from both of the Regex used above.
        List<string> userResult = mainResult.Except(theResult).ToList();

        // Prints the proper values into the assigned RichTextBoxes.
        foreach (var line in userResult)
            richTextBox2.AppendText(line + "\n");

What I am trying to do is get the file to look like this:

J6      INT-00113G  227.905    5.994  180
J3      INT-00113G  227.905 -203.244  180
U13     EXCLUDES    -42.210  181.294  180
U3      IC-00276G     5.135  198.644  90
U12     IC-00270G  -123.610 -201.594  0
J1      INT-00112G  269.665  179.894  180
J2      INT-00112G  269.665  198.144  180

QUESTION:

  • Can anyone help come up with a regex for this?

EDIT:

ADDED CODE:

        var lines = new List<string>(File.ReadAllLines(filePath + "\\Remove Package 1 Endings.txt"));
        for (int i = 0; i < lines.Count; i++)
        {
            var idx = lines[i].LastIndexOf(" ");

            if (idx != -1)
                lines[i] = lines[i].Remove(idx);

            richTextBox1.AppendText(lines[i] + Environment.NewLine);
        }
Epigenesis answered 25/7, 2011 at 21:3 Comment(4)
You have a tabular format and want to remove the last column? Why would you want to use a regex for this? I don't know C# (which I believe this is), but surely it must have a string split method. - Disgrace
@mhyfritz: could I do something like this: foreach (var theLine in mainResult) {theLine.Split(' '); theResult.Add(string.Join(theLine[0] + theLine[1] + theLine[2] + theLine[3] + theLine[4]));}? - Epigenesis
As mentioned, I don't know C#, so you're on your own with syntax, but yeah, just split every line by space ' ' and discard the last field. - Disgrace
"You have a tabular format and want to remove the last column? Why would you want to use a split for this? I don't know C# (which I believe this is), but surely it must have indexof and substr methods." - Reversioner

\S+$ should do it, with multiline mode enabled. (I'm not sure exactly how you enable regex flags in C#, if that is what this is, but prepending (?m) to the pattern works with some regex engines, though it's not the only way to do it.)

\S - matches any non-whitespace character
+ - indicates that the preceding regex element should be matched one or more times
$ - indicates matching to the end of the string, or end of a line if multiline is enabled.

EDIT: You're checking each line individually, so no need to worry about multiline stuff.

(Though as stated by others, going with regex for this is probably making things more complicated than necessary.)
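
A minimal sketch of how a pattern like this could be applied per line in C#. The class and method names are made up for illustration, and the pattern is widened to \s+\S+\s*$ (my assumption, not part of the answer above) so that the separator before the last column and any trailing whitespace are removed as well:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastColumnRegex
{
    // Hypothetical helper: drops the final whitespace-separated token of a line.
    //   \s+   the whitespace separating the last column from the one before it
    //   \S+   the last column itself (one or more non-whitespace characters)
    //   \s*$  optional trailing whitespace, then the end of the string
    public static string StripLastColumn(string line)
    {
        return Regex.Replace(line, @"\s+\S+\s*$", "");
    }

    public static void Main()
    {
        // Sample rows taken from the question; the second has trailing spaces added.
        Console.WriteLine(StripLastColumn("U3      IC-00276G     5.135  198.644  90   B%GA!@-48"));
        Console.WriteLine(StripLastColumn("J1      INT-00112G  269.665  179.894  180  SOIC16_1   "));
    }
}
```

Note that this always removes the last token, whatever it is, so a row that already lacks the 6th column would lose its 5th one. A line with a single token is left unchanged because the pattern requires leading whitespace.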

Justitia answered 25/7, 2011 at 21:8 Comment(1)
Thanks. Yeah, I think I realize that now. :) - Epigenesis

I think you're making this more complex than it needs to be. For instance, the following should remove the last part of each line if the data is formatted as in your example. With a little tweaking, such as trimming (and, obviously, error handling), I'm sure this would suit:

var lines = new List<string>(File.ReadAllLines(path));
for (int i = 0; i < lines.Count; i++) 
{
    var idx = lines[i].LastIndexOf(" ");   
    if (idx != -1)
    {     
        lines[i] = lines[i].Remove(idx);
    }
}

Note that it is possible to read all the lines of a file in one go. This isn't always desirable, depending on the size of the file, but I see you're loading every line before processing anyway, in which case we can make the whole thing more concise.
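
The trimming mentioned above could look roughly like the sketch below. It assumes the columns are separated by spaces (not tabs) and that lines may end with stray whitespace; the class and method names are hypothetical. The detail that is easy to miss is that strings are immutable in C#, so calling TrimEnd() or Trim() without using the returned value changes nothing:

```csharp
using System;
using System.Collections.Generic;

public static class LastColumnIndexOf
{
    // Hypothetical helper: removes the last space-separated column of a line.
    public static string RemoveLastColumn(string line)
    {
        // TrimEnd returns a NEW string; the result must be kept.
        var trimmed = line.TrimEnd();

        // Find the space that precedes the last column.
        var idx = trimmed.LastIndexOf(' ');
        if (idx == -1)
            return trimmed; // only one column (or an empty line): nothing to cut

        // Cut from that space onwards, then drop the rest of the column padding.
        return trimmed.Remove(idx).TrimEnd();
    }

    public static void Main()
    {
        // Sample rows from the question, with trailing spaces added on purpose.
        var lines = new List<string>
        {
            "J6      INT-00113G  227.905    5.994  180  ~!@#$%&^)   ",
            "U12     IC-00270G  -123.610 -201.594  0    SOP8_000 "
        };

        for (int i = 0; i < lines.Count; i++)
            lines[i] = RemoveLastColumn(lines[i]);

        foreach (var line in lines)
            Console.WriteLine(line);
    }
}
```

Without the first TrimEnd, LastIndexOf finds one of the trailing spaces, so Remove only cuts that padding and the last column stays in place.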

Cedar answered 25/7, 2011 at 21:10 Comment(7)
I get an error stating: "StartIndex cannot be less than zero. Parameter name: startIndex" on the line lines[i] = lines[i].Remove(idx);. I also added the line richTextBox1.AppendText(lines + "\n"); to your code above. - Epigenesis
@Colton Yes, proper checking would be required to prevent the error; in this case, it would be checking that idx != -1. - Cedar
@Colton Furthermore, lines + "\n" won't really work: since lines is a List<string>, you should use lines[i]. Better yet, use lines[i] + Environment.NewLine. - Cedar
I updated my code above using an if statement. However, now richTextBox1 reads "System.Collections.Generic.List`1[System.String]". Do you know how to fix this? - Epigenesis
Your lines[i] + Environment.NewLine works. However, the endings that were supposed to be removed are still showing up. - Epigenesis
@Colton Do the loaded lines end with an empty space? If so, or maybe just to be safe, use lines[i].Trim() before removal; this should remove leading and trailing spaces. - Cedar
Yes, the lines end with empty space(s). I tried lines[i].Trim(); before the removal and it still does not seem to be working properly. - Epigenesis

Relying only on the fact that the columns are separated by whitespace, you could use:

\s+([\S]*)$
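
As a rough sketch of what the capture group gives you in C#: group 1 holds the last column's value, and the match index marks where it (and its leading separator) starts. The class and method names are hypothetical, and the TrimEnd() call is my own addition, assuming lines may carry trailing spaces; without it the pattern would match only that trailing whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastColumnCapture
{
    private const string Pattern = @"\s+([\S]*)$";

    // Hypothetical helper: returns the last column's value,
    // or an empty string when the line has no whitespace separator.
    public static string GetLastColumn(string line)
    {
        Match m = Regex.Match(line.TrimEnd(), Pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    // Hypothetical helper: returns the line with the last column
    // and the whitespace in front of it cut off.
    public static string WithoutLastColumn(string line)
    {
        string trimmed = line.TrimEnd();
        Match m = Regex.Match(trimmed, Pattern);
        return m.Success ? trimmed.Substring(0, m.Index) : trimmed;
    }

    public static void Main()
    {
        // Sample row from the question.
        string row = "U13     EXCLUDES    -42.210  181.294  180  QFP128";
        Console.WriteLine(GetLastColumn(row));
        Console.WriteLine(WithoutLastColumn(row));
    }
}
```
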
Carthusian answered 25/7, 2011 at 21:8 Comment(1)
Thanks, but I think I am going to try a simpler way than regex. - Epigenesis
