Regex to match a path in C#

I'm new to regular expressions. I need to extract the path from each of the following lines:

XXXX       c:\mypath1\test
YYYYYYY             c:\this is other path\longer
ZZ        c:\mypath3\file.txt

I need to implement a method that returns the path from a given line. The first column is a word of one or more characters and is never empty; the second column is the path. The separator can be one or more spaces, one or more tabs, or a mix of both.

Denson answered 18/10, 2011 at 9:7 Comment(2)
Is the input a file or individual lines? - Crumpton
Yes. The treatment for a line and for a file is different, unless you read it line by line from the text file, and then you'll also need to take care of line-break characters, etc. - Crumpton

It sounds to me like you just want

string[] bits = line.Split(new char[] { '\t', ' ' }, 2,
                           StringSplitOptions.RemoveEmptyEntries);
// TODO: Check that bits really has two entries
string path = bits[1];

(This is assuming that the first column never contains spaces or tabs.)
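For runtimes where the StringSplitOptions overload is not available, the same idea can be sketched without Split or Regex. This is only an illustration under the same assumption (the first column contains no spaces or tabs); the class name ColumnSplitter and the choice to return null for a malformed line are my own, not part of the original answer. It relies only on String.IndexOfAny, Substring and TrimStart.

```csharp
using System;

class ColumnSplitter
{
    // Hypothetical helper: the separators allowed between the two columns.
    static readonly char[] Separators = { ' ', '\t' };

    // Returns everything after the first run of spaces/tabs,
    // or null when the line has no usable second column.
    public static string ExtractPath(string line)
    {
        int firstSeparator = line.IndexOfAny(Separators);
        if (firstSeparator <= 0)
        {
            // No separator at all, or the line starts with one (empty first column).
            return null;
        }

        // Skip the whole run of spaces/tabs; spaces inside the path are kept.
        string rest = line.Substring(firstSeparator).TrimStart(Separators);
        return rest.Length == 0 ? null : rest;
    }
}

class Program
{
    static void Main()
    {
        // Mixed spaces and a tab as separator, spaces inside the path.
        Console.WriteLine(ColumnSplitter.ExtractPath("YYYYYYY \t  c:\\this is other path\\longer"));
    }
}
```

Because only the first run of separators is skipped, a path containing spaces comes back intact.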

EDIT: As a regular expression you can probably just do:

Regex regex = new Regex(@"^[^ \t]+[ \t]+(.*)$");

Sample code:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string[] lines = 
        {
            @"XXXX       c:\mypath1\test",
            @"YYYYYYY             c:\this is other path\longer",
            @"ZZ        c:\mypath3\file.txt"
        };

        foreach (string line in lines)
        {
            Console.WriteLine(ExtractPathFromLine(line));
        }
    }

    static readonly Regex PathRegex = new Regex(@"^[^ \t]+[ \t]+(.*)$");

    static string ExtractPathFromLine(string line)
    {
        Match match = PathRegex.Match(line);
        if (!match.Success)
        {
            throw new ArgumentException("Invalid line");
        }
        return match.Groups[1].Value;
    }    
}
Arpeggio answered 18/10, 2011 at 9:12 Comment(3)
Paths can have spaces, so the second one is quite bad. - Senegambia
@Jon: Sorry, I need a regular expression since I'm using .NET 1.1 and I have no access to the StringSplitOptions.RemoveEmptyEntries overload. Thanks anyway! - Mesomorphic
@DanielPeñalba: It would have been useful to say so to start with - requiring .NET 1.1 is very rare these days. Will edit. - Arpeggio
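On the file-versus-line point raised in the comments on the question: if the whole file content is matched as one string, note that in .NET the dot matches every character except \n, so with Windows line endings a trailing \r can end up inside the captured group. A minimal sketch of one way around that, assuming the input is available as a TextReader, is to let ReadLine strip the line terminators before applying the same regex. The class name LineByLine and the decision to silently skip non-matching lines are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

class LineByLine
{
    // Same pattern as in the answer above: first column, separator run, rest of line.
    static readonly Regex PathRegex = new Regex(@"^[^ \t]+[ \t]+(.*)$");

    // Reads the input one line at a time; ReadLine removes "\r\n", "\n" or "\r",
    // so the captured path never carries a line terminator.
    public static List<string> ExtractPaths(TextReader reader)
    {
        List<string> paths = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match match = PathRegex.Match(line);
            if (match.Success)
            {
                paths.Add(match.Groups[1].Value);
            }
            // Lines that do not match (for example blank lines) are skipped here.
        }
        return paths;
    }

    static void Main()
    {
        string text = "XXXX       c:\\mypath1\\test\r\nZZ\tc:\\mypath3\\file.txt\r\n";
        foreach (string path in ExtractPaths(new StringReader(text)))
        {
            // Brackets make any stray whitespace visible.
            Console.WriteLine("[" + path + "]");
        }
    }
}
```

For a real file, a StreamReader could be passed in place of the StringReader used in Main.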
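// Requires: using System.Collections.Specialized; using System.Text.RegularExpressions;
// subjectString is the text to search. Note that [a-z] is case-sensitive here;
// pass RegexOptions.IgnoreCase to also match upper-case drive letters such as "C:".
StringCollection resultList = new StringCollection();
try {
    Regex regexObj = new Regex(@"(([a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)?(\\?(?:[^\\/:*?""<>|\r\n]+\\)+)[^\\/:*?""<>|\r\n]+)");
    Match matchResult = regexObj.Match(subjectString);
    while (matchResult.Success) {
        // Group 1 captures the whole path
        resultList.Add(matchResult.Groups[1].Value);
        matchResult = matchResult.NextMatch();
    }
} catch (ArgumentException) {
    // Syntax error in the regular expression
}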

Breakdown:

@"
(                             # Match the regular expression below and capture its match into backreference number 1
   (                             # Match the regular expression below and capture its match into backreference number 2
                                    # Match either the regular expression below (attempting the next alternative only if this one fails)
         [a-z]                         # Match a single character in the range between “a” and “z”
         :                             # Match the character “:” literally
      |                             # Or match regular expression number 2 below (the entire group fails if this one fails to match)
         \\                            # Match the character “\” literally
         \\                            # Match the character “\” literally
         [a-z0-9_.$]                   # Match a single character present in the list below
                                          # A character in the range between “a” and “z”
                                          # A character in the range between “0” and “9”
                                          # One of the characters “_.$”
            +                             # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
         \\                            # Match the character “\” literally
         [a-z0-9_.$]                   # Match a single character present in the list below
                                          # A character in the range between “a” and “z”
                                          # A character in the range between “0” and “9”
                                          # One of the characters “_.$”
            +                             # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   )?                            # Between zero and one times, as many times as possible, giving back as needed (greedy)
   (                             # Match the regular expression below and capture its match into backreference number 3
      \\                            # Match the character “\” literally
         ?                             # Between zero and one times, as many times as possible, giving back as needed (greedy)
      (?:                           # Match the regular expression below
         [^\\/:*?""<>|\r\n]             # Match a single character NOT present in the list below
                                          # A \ character
                                          # One of the characters “/:*?""<>|”
                                          # A carriage return character
                                          # A line feed character
            +                             # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
         \\                            # Match the character “\” literally
      )+                            # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   )
   [^\\/:*?""<>|\r\n]             # Match a single character NOT present in the list below
                                    # A \ character
                                    # One of the characters “/:*?""<>|”
                                    # A carriage return character
                                    # A line feed character
      +                             # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
"
Flaxman answered 18/10, 2011 at 9:12 Comment(3)
This looks very complicated to basically get everything after the first set of spaces/tabs. - Arpeggio
@JonSkeet I agree. That's a more general regex for Windows paths. - Flaxman
@Flaxman It doesn't work, for example, for "k:\test\test". If I pass a path like \\test\t><*st, it is reported as valid. I found this regex: ^(?:[c-zC-Z]\:|\\)(\\[a-zA-Z_\-\s0-9\.]+)+ which, in my opinion, validates paths correctly. Found it here. - Subedit
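One possible reason a string like \\test\t><*st is reported as valid: the general pattern is unanchored, so Regex.Match or IsMatch succeeds as soon as some substring looks like a path. A rough sketch of the difference, assuming you want to validate the whole string rather than search inside it, is to wrap the same pattern in ^ and $. The class and method names are hypothetical, and RegexOptions.IgnoreCase is my addition so that upper-case drive letters are accepted too.

```csharp
using System;
using System.Text.RegularExpressions;

class PathValidation
{
    // The general Windows-path pattern from the answer above, unchanged.
    const string PathPattern =
        @"(([a-z]:|\\\\[a-z0-9_.$]+\\[a-z0-9_.$]+)?(\\?(?:[^\\/:*?""<>|\r\n]+\\)+)[^\\/:*?""<>|\r\n]+)";

    // Unanchored: succeeds if any substring of the input looks like a path.
    static readonly Regex Unanchored = new Regex(PathPattern, RegexOptions.IgnoreCase);

    // Anchored: succeeds only if the entire input looks like a path.
    static readonly Regex Anchored = new Regex("^" + PathPattern + "$", RegexOptions.IgnoreCase);

    public static bool ContainsPath(string input)
    {
        return Unanchored.IsMatch(input);
    }

    public static bool IsWholePath(string input)
    {
        return Anchored.IsMatch(input);
    }

    static void Main()
    {
        string[] candidates = { @"k:\test\test", @"\\test\t><*st" };
        foreach (string candidate in candidates)
        {
            Console.WriteLine("{0}: contains={1}, whole={2}",
                candidate, ContainsPath(candidate), IsWholePath(candidate));
        }
    }
}
```

This only checks the shape of the string; it says nothing about whether the path exists or is acceptable to the file system.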

Regex Tester is a good website for quickly testing a regex.

Regex.Matches(input, @"([a-zA-Z]:[\\a-zA-Z0-9 .]*)");
Mixup answered 18/10, 2011 at 9:24 Comment(0)
