Regular expression to extract part of a file path using the logstash grok filter

Asked 23/11, 2012 at 5:20 Answered 21/1, 2016 at 6:28

I am new to regular expressions, but I think people here may be able to give me valuable input. I am using the logstash grok filter, in which I can supply only regular expressions.

I have a string like this:

/app/webpf04/sns882A/snsdomain/logs/access.log

I want to use a regular expression to get the sns882A part of the string, which is the substring after the third "/". How can I do that?

I am restricted to regex because grok only accepts regex. Is it possible to do this with a regex?

Fortuitous asked 23/11, 2012 at 5:20

Yes, you can use a regular expression to get what you want via grok:

/[^/]+/[^/]+/(?<field1>[^/]+)/
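A quick way to sanity-check this pattern outside Logstash is a short Python sketch. Python's `re` module spells named groups `(?P<name>...)` instead of grok's `(?<name>...)`, so the group syntax is translated here; `field1` is just the name used in the answer, the helper `third_segment` is hypothetical, and the sample path is the one from the question.

```python
import re

# Same pattern as the grok answer, with the named group written in Python syntax.
PATTERN = re.compile(r"/[^/]+/[^/]+/(?P<field1>[^/]+)/")


def third_segment(path):
    """Return the third path segment, or None if the pattern does not match."""
    m = PATTERN.search(path)
    return m.group("field1") if m else None


print(third_segment("/app/webpf04/sns882A/snsdomain/logs/access.log"))  # sns882A
```

In the grok filter itself, the original `(?<field1>...)` form is what goes into the configuration; the Python version is only for checking the logic.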

Photomural answered 22/3, 2014 at 2:42

I know this answer is way too late, but +1 anyway for being the first correct answer. That is, a standalone regex (no other code and no delimiters) that uses named capture for the parts it's supposed to extract. – Marijane 22/3, 2014 at 5:12

For your case, the regex is:

    /\w*\/\w*\/(\w*)\/
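A minimal Python sketch of this regex, using `re.search` the way an unanchored online tester would. It finds sns882A on the sample path, but because `\w` only covers letters, digits and underscore, a directory name containing a hyphen changes the result. The hyphenated path below is a made-up example, not one from the question, and `capture` is a hypothetical helper.

```python
import re

# The answer's regex; "\/" is simply an escaped "/".
PATTERN = re.compile(r"/\w*\/\w*\/(\w*)\/")


def capture(path):
    """Return capture group 1, or None when there is no match."""
    m = PATTERN.search(path)
    return m.group(1) if m else None


print(capture("/app/webpf04/sns882A/snsdomain/logs/access.log"))  # sns882A
# Hypothetical path with a hyphen: the match starts later and captures "logs".
print(capture("/app/web-pf04/sns882A/snsdomain/logs/access.log"))  # logs
```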

You can also test with: http://www.regextester.com/

Searching for "regex tester" will turn up other online testers with different interfaces.

Kaiserslautern answered 23/11, 2012 at 5:27

regextester.com gives me no match, and I also tried gskinner.com/RegExr with no result there either... – Fortuitous 23/11, 2012 at 5:34

This solution relies on directory and file names always consisting of alphanumeric characters or underscores. In particular, there may be no spaces anywhere in the path. – Fortunetelling 23/11, 2012 at 5:39

The match list is 0-based (entry 0 is the whole match). You can also see 1: (sns882A), which means it's the first capture group. – Kaiserslautern 23/11, 2012 at 5:53

When I use /\w*\/\w*\/(\w*)\/ in the grok filter, I get a grok parse failure, maybe because no match was found. – Fortuitous 23/11, 2012 at 6:19

This is how I would do it in Perl:

my ($name) = ($fullname =~ m{^(?:/.*?){2}/(.*?)/});

EDIT: If your framework does not support Perl-style non-capturing groups (?:xyz), this regex should work instead:

^/.*?/.*?/(.*?)/

If you are concerned about the performance of .*?, this works as well:

^/[^/]+/[^/]+/([^/]+)/

One more note: all of the regexes above match the string /app/webpf04/sns882A/.

But the matched string is completely different from the first capture group, which is sns882A in all three cases.
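The difference between the whole match and group 1 is easy to see by running all three patterns from this answer against the sample path. This is a Python sketch (Python's `re` treats `(?:...)` and lazy `.*?` the same way Perl does for these patterns); `whole_and_group` is a hypothetical helper.

```python
import re

PATH = "/app/webpf04/sns882A/snsdomain/logs/access.log"

# The three patterns from the answer, in the order they appear.
PATTERNS = [
    r"^(?:/.*?){2}/(.*?)/",
    r"^/.*?/.*?/(.*?)/",
    r"^/[^/]+/[^/]+/([^/]+)/",
]


def whole_and_group(pattern, text):
    """Return (whole matched text, capture group 1), or None if no match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(0), m.group(1)


for p in PATTERNS:
    print(whole_and_group(p, PATH))
# Each line prints ('/app/webpf04/sns882A/', 'sns882A')
```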

Otherworld answered 23/11, 2012 at 5:29

When I try the ^(?:/.*?){2}/(.*?)/ part on gskinner.com/RegExr, it matches /app/webpf04/sns882A/ – Fortuitous 23/11, 2012 at 5:39

You should use (?:/[^/]*). Otherwise your regex may take a long time to decide that it doesn't match. – Fortunetelling 23/11, 2012 at 5:44

This is exactly why I used .*?: to avoid a greedy match, which can be very slow. – Otherworld 23/11, 2012 at 5:46

Confirmed: when I give ^(?:/.*?){2}/(.*?)/ to the grok filter, I get the /app/webpf04/sns882A/ part of the string. – Fortuitous 23/11, 2012 at 6:15

Note that the matched string is not the same as the first capture group. See my amended answer. – Otherworld 23/11, 2012 at 6:36

The OP didn't ask for Perl. – Kalina 17/4, 2017 at 16:25

If you are indeed using Perl, then you should use the File::Spec module like this:

use strict;
use warnings;

use File::Spec;

my $path = '/app/webpf04/sns882A/snsdomain/logs/access.log';
my @path = File::Spec->splitdir($path);

# splitdir yields a leading empty element for an absolute path, so index 3 is the third directory.
print $path[3], "\n";

Output:

sns882A
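For comparison, the same split-then-index idea can be sketched in Python with the standard library's `pathlib.PurePosixPath`, used here in place of File::Spec (the answer itself only shows Perl). `parts` keeps the root "/" as element 0, so the third directory again lands at index 3.

```python
from pathlib import PurePosixPath

path = "/app/webpf04/sns882A/snsdomain/logs/access.log"

# parts -> ('/', 'app', 'webpf04', 'sns882A', 'snsdomain', 'logs', 'access.log')
parts = PurePosixPath(path).parts

print(parts[3])  # sns882A
```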

Fortunetelling answered 23/11, 2012 at 5:35

I cannot use any programming language; this is part of the logstash grok configuration, in which I can only supply expressions. – Fortuitous 23/11, 2012 at 5:44

This is the same answer with a small bug fix. If you don't put ^ at the start, the regex can match further along the string (try longer paths with more / in the input). To fix it, just add ^ at the start, like this (^ means the start of the input line). Group 1 is then your answer.

^/[^/]+/[^/]+/([^/]+)/
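To see what the leading ^ changes, here is a Python sketch comparing the unanchored and anchored forms. The input without a leading slash is a hypothetical example chosen to expose the difference: the unanchored pattern silently captures a later segment, while the anchored one reports no match. `group1` is a hypothetical helper.

```python
import re

UNANCHORED = re.compile(r"/[^/]+/[^/]+/([^/]+)/")
ANCHORED = re.compile(r"^/[^/]+/[^/]+/([^/]+)/")


def group1(pattern, text):
    """Return capture group 1, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


sample = "/app/webpf04/sns882A/snsdomain/logs/access.log"
relative = "app/webpf04/sns882A/snsdomain/logs/access.log"  # hypothetical input

print(group1(UNANCHORED, sample), group1(ANCHORED, sample))      # sns882A sns882A
print(group1(UNANCHORED, relative), group1(ANCHORED, relative))  # snsdomain None
```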

If you are working with URI paths, use the pattern below (it handles plain paths as well as URIs):

^.*?/[^/]+/[^/]+/([^/]+)/
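The lazy `^.*?` prefix lets the pattern skip whatever precedes the first slash. A Python sketch, assuming a hypothetical log-style line in which the path follows other text (not an input from the question); `group1` is a hypothetical helper.

```python
import re

STRICT = re.compile(r"^/[^/]+/[^/]+/([^/]+)/")
PREFIXED = re.compile(r"^.*?/[^/]+/[^/]+/([^/]+)/")

line = "GET /app/webpf04/sns882A/snsdomain/logs/access.log"  # hypothetical


def group1(pattern, text):
    """Return capture group 1, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(group1(STRICT, line))    # None: the line does not start with "/"
print(group1(PREFIXED, line))  # sns882A
```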

Fere answered 21/1, 2016 at 6:28
