Regular expression to extract part of a file path using the logstash grok filter

5

5

I am new to regular expressions, but I hope people here can give me some valuable input. I am using the Logstash grok filter, in which I can supply only regular expressions.

I have a string like this:

/app/webpf04/sns882A/snsdomain/logs/access.log

I want to use a regular expression to get the sns882A part of the string, which is the substring after the third "/". How can I do that?

I am restricted to regex because grok accepts only regex. Is it possible to do this with a regex?

Fortuitous answered 23/11, 2012 at 5:20 Comment(0)
6

Yes, you can use a regular expression to get what you want via grok:

/[^/]+/[^/]+/(?<field1>[^/]+)/
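To check the pattern outside Logstash, here is a minimal Python sketch. It assumes Python's `re` module, which spells a named group `(?P<name>...)`, whereas grok accepts the `(?<name>...)` form shown above. The function name `third_component` is made up for illustration.

```python
import re

# Same pattern as above, using Python's (?P<name>...) spelling of a named group.
PATTERN = re.compile(r"/[^/]+/[^/]+/(?P<field1>[^/]+)/")

def third_component(path):
    """Return the third path component, or None if the path is too short."""
    match = PATTERN.search(path)
    return match.group("field1") if match else None

print(third_component("/app/webpf04/sns882A/snsdomain/logs/access.log"))
```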
Photomural answered 22/3, 2014 at 2:42 Comment(1)
I know this answer is way too late, but +1 anyway for being the first correct answer. That is, a standalone regex (no other code and no delimiters) that uses named capture for the parts it's supposed to extract.Marijane
2

A regex for your case:

    /\w*\/\w*\/(\w*)\/

You can also test it at http://www.regextester.com/

Searching for "regex tester" will turn up other tools with different interfaces.
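One caveat with \w is that it matches only letters, digits and underscores, so a directory name containing a hyphen or a space breaks the pattern. The Python sketch below compares it with a `[^/]+` form, using a made-up path that contains a hyphen. The helper name `capture` is hypothetical, and the regex is written without the escaped slashes, which Python does not need.

```python
import re

# Hypothetical path: the second directory name contains a hyphen.
PATH = "/app/web-pf04/sns882A/snsdomain/logs/access.log"

def capture(pattern, path):
    """Return group 1 of the first match, or None when nothing matches."""
    match = re.search(pattern, path)
    return match.group(1) if match else None

# \w* cannot cross the hyphen, so the third component is not what gets captured.
print(capture(r"/\w*/\w*/(\w*)/", PATH))
# [^/]+ accepts any character except the path separator.
print(capture(r"^/[^/]+/[^/]+/([^/]+)/", PATH))
```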

Kaiserslautern answered 23/11, 2012 at 5:27 Comment(4)
regextester.com gives me no match. I also tried gskinner.com/RegExr, with no result there either...Fortuitous
This solution relies on directory and file names always consisting of alphanumeric characters or underscores. In particular, there may be no spaces anywhere in the path.Fortunetelling
Match indexes are 0-based. You can also see 1: (sns882A), which means it's the first capture group.Kaiserslautern
When I use /\w*\/\w*\/(\w*)\/ in the grok filter, I get a grok parse failure, maybe because no match was found.Fortuitous
0

This is how I would do it in Perl:

my ($name) = ($fullname =~ m{^(?:/.*?){2}/(.*?)/});

EDIT: If your framework does not support Perl-style non-capturing groups (?:xyz), this regex should work instead:

^/.*?/.*?/(.*?)/

If you are concerned about performance of .*?, this works as well:

^/[^/]+/[^/]+/([^/]+)/

One more note: all of the regexes above will match the string /app/webpf04/sns882A/.

But the matched string is not the same thing as the first capture group, which is sns882A in all three cases.
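The difference between the whole match and the first group can be seen with a short Python sketch. Python is used here only as a convenient test bed, not as what grok runs, and the helper name `whole_and_group` is hypothetical.

```python
import re

PATH = "/app/webpf04/sns882A/snsdomain/logs/access.log"

# The three patterns from this answer, unchanged.
PATTERNS = [
    r"^(?:/.*?){2}/(.*?)/",
    r"^/.*?/.*?/(.*?)/",
    r"^/[^/]+/[^/]+/([^/]+)/",
]

def whole_and_group(pattern, path):
    """Return (entire matched text, first capture group) for one pattern."""
    match = re.search(pattern, path)
    return match.group(0), match.group(1)

for pattern in PATTERNS:
    whole, group1 = whole_and_group(pattern, PATH)
    print(f"{pattern!r}: whole={whole!r} group1={group1!r}")
```

In each case group 0 is the whole match, /app/webpf04/sns882A/, while group 1 holds only sns882A.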

Otherworld answered 23/11, 2012 at 5:29 Comment(6)
When I try the ^(?:/.*?){2}/(.*?)/ part on gskinner.com/RegExr, it matches /app/webpf04/sns882A/Fortuitous
You should use (?:/[^/]*). Otherwise your regex may take a long time to decide that it doesn't match.Fortunetelling
This is exactly why I used .*? - to avoid a greedy match, which can be very slow.Otherworld
Confirmed: when I give ^(?:/.*?){2}/(.*?)/ to the grok filter, I get the /app/webpf04/sns882A/ part of the string.Fortuitous
Note that the matched string is not the same as the first capture group. See my amended answer.Otherworld
OP didn't ask for PerlKalina
0

If you are indeed using Perl, then you should use the File::Spec module, like this:

use strict;
use warnings;

use File::Spec;

my $path = '/app/webpf04/sns882A/snsdomain/logs/access.log';
my @path = File::Spec->splitdir($path);

print $path[3], "\n";

Output:

sns882A
Fortunetelling answered 23/11, 2012 at 5:35 Comment(1)
I cannot use any programming language; this is part of the logstash-grok configuration, in which I can only supply expressions.Fortuitous
0

Same answer, but with a small bug fix. If you don't put ^ at the start, the regex can go on to a later match (try longer paths with more / in the input). To fix this, just add ^ at the start, like this. ^ means the start of the input line. Group 1 is then your answer.

^/[^/]+/[^/]+/([^/]+)/

If you are working with URI paths, use the pattern below, which allows leading text before the first / (it handles plain paths as well as URIs).

^.*?/[^/]+/[^/]+/([^/]+)/
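The effect of the ^ anchor can be sketched in Python by collecting every match on a longer path. This is only an illustration under the assumption that all matches are collected; grok itself reports a single match. The path and the helper name `all_captures` are made up for the example.

```python
import re

# Hypothetical longer path with seven components.
LONG_PATH = "/a/b/c/d/e/f/g/"

def all_captures(pattern, path):
    """Return group 1 of every non-overlapping match in the path."""
    return re.findall(pattern, path)

# Without ^, matching can resume further along the path.
print(all_captures(r"/[^/]+/[^/]+/([^/]+)/", LONG_PATH))
# With ^, only a match starting at the beginning of the input is allowed.
print(all_captures(r"^/[^/]+/[^/]+/([^/]+)/", LONG_PATH))
```

The unanchored pattern yields two captures here (c and g), while the anchored one yields only c.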
Fere answered 21/1, 2016 at 6:28 Comment(0)
