Indexing and mapping log data using Solr 6
I am currently using Solr 6 and I want to index log data like the line shown below:

2016-06-22T03:00:04Z|INFO|ip-10-11-0-241|1301|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter|Invalid UserAgent=%E3%83%94%E3%82%B3/1.07.41149 CFNetwork/758.2.8 Darwin/15.0.0, PlayerId=player_a2a7d1a4-0a31-4c4d-b5bf-10be67dc85d6|

I am unsure how to split the data on the pipe character. The NLog layout I use is this:

${date:universalTime=True:format=yyyy-MM-ddTHH\:mm\:ssZ}|${level:uppercase=true}|${machinename}|${processid}|${logger}|${callsite:className=true:methodName=true}|${message}|${exception:format=tostring}${newline}
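For context, the line splits cleanly on the pipe into the eight layout fields. A minimal Python sketch of the idea (the field names here are my own invention, not part of the NLog layout, and it assumes the message itself contains no literal pipes):

```python
# Split one pipe-delimited NLog line into named fields.
# Field names mirror the order of the layout above; the trailing
# exception field is empty when no exception was logged.
FIELDS = ["timestamp", "level", "machine", "pid",
          "logger", "callsite", "message", "exception"]

def parse_log_line(line: str) -> dict:
    parts = line.rstrip("\n").split("|")
    # Pad so a line without an exception still yields every field.
    parts += [""] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, parts))

line = ("2016-06-22T03:00:04Z|INFO|ip-10-11-0-241|1301|"
        "DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|"
        "DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider"
        ".CheckValidGameDataRequestFilter|"
        "Invalid UserAgent=..., PlayerId=player_...|")
doc = parse_log_line(line)
```

Each resulting dict could then be posted to Solr as one document with explicit fields, rather than pushing the raw line through the CSV handler.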

I tried the CSV upload handler, but Solr returns the JSON below, which is not conducive to querying. Please help.

  "responseHeader":{
    "status":0,
    "QTime":77,
    "params":{
      "q":"*:*",
      "indent":"on",
      "wt":"json",
      "_":"1466745065000"}},
  "response":{"numFound":8,"start":0,"docs":[
      {
        "id":"b28049bb-d49e-4b4d-80db-d7d77351527b",
        "2016-06-23T02_37_18Z_INFO_web.chubi.development1_6326_DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider_DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter_Invalid_UserAgent_PIKO_0.00.41269_CFNetwork_711.5.6_Darwin_14.0.0":["2016-06-23T02:37:28Z|INFO|web.chubi.development1|6326|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider|DreamRocket.Game.ServiceInterface.GameCredentialsAuthProvider.CheckValidGameDataRequestFilter|Invalid UserAgent=PIKO/0.00.41269 CFNetwork/711.5.6 Darwin/14.0.0"],
        "_PlayerId_player_407defcf-7032-4ef4-81a6-91bb62b9150b_":[" PlayerId=player_905266b2-9ce3-4fa1-b0a7-4663b9509731|"],
        "_version_":1537919142165741568}]}
Elsieelsinore answered 22/6, 2016 at 3:3 Comment(0)
It looks like you want to extract clean data from the logs that can be indexed and searched without ambiguity. Why not define a custom analyzer that uses a regex to split out the fields? I would suggest solr.PatternTokenizerFactory to tokenize on the pipe character in your text. You can also use the Analysis tab in the Solr admin UI to see in detail how your log data is treated by the analyzer. For the encoded text, as in the Invalid UserAgent field, an ASCIIFoldingFilterFactory can help when indexing encoded characters. You may also need to tokenize on dots; I don't know whether that is a requirement for you. In your data the PatternTokenizer does the trick, and if you still need further refinement you can add a WordDelimiterFilterFactory to tune your index. Maybe I'll edit this answer with some analyzer settings for you :)
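For example, a field type along these lines could go in schema.xml. This is only a sketch: the field-type name is a placeholder, and the filter chain is an assumption about what you want, not a tested configuration:

```xml
<!-- Sketch: tokenize log lines on the pipe character.
     "log_line" is a placeholder name, not from the question. -->
<fieldType name="log_line" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- One token per pipe-delimited field -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|"/>
    <!-- Fold accented/encoded characters to their ASCII equivalents where possible -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

You can paste a sample log line into the Analysis tab against this field type to verify how each stage transforms it before reindexing anything.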

Blend answered 27/6, 2016 at 19:37 Comment(2)
I think I found a better solution: I get NLog to send logs in JSON format. Better this way. (Elsieelsinore)
You mean format it before sending and put it into JSON, right? Yeah, that would be great. (Blend)
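As the comments suggest, NLog can emit JSON directly via its JsonLayout. A rough sketch of such a target, with attribute names chosen to mirror the pipe-delimited layout from the question (the target name and file path are placeholders):

```xml
<!-- Sketch: NLog File target using JsonLayout instead of the pipe layout. -->
<target name="jsonFile" xsi:type="File" fileName="game.log">
  <layout xsi:type="JsonLayout">
    <attribute name="time" layout="${date:universalTime=True:format=yyyy-MM-ddTHH\:mm\:ssZ}" />
    <attribute name="level" layout="${level:uppercase=true}" />
    <attribute name="machine" layout="${machinename}" />
    <attribute name="pid" layout="${processid}" />
    <attribute name="logger" layout="${logger}" />
    <attribute name="callsite" layout="${callsite:className=true:methodName=true}" />
    <attribute name="message" layout="${message}" />
    <attribute name="exception" layout="${exception:format=tostring}" />
  </layout>
</target>
```

Each log event then arrives as one JSON object with named fields, which Solr's JSON update handler can index directly without any custom tokenization.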
