Regex to validate JSON
E

12

117

I am looking for a regex that allows me to validate JSON.

I am very new to regexes, and I know enough to know that parsing with regex is bad, but can it be used to validate?

Emancipation answered 6/4, 2010 at 8:17 Comment(10)
Why bother with a separate validation step? Most languages have JSON-libraries that can parse JSON, and if it can parse it, it was valid. If not, the library will tell you.Deka
You need to parse text in order to validate it...Aerometer
@mario - What's the point of the bounty here? Are you looking for more answers, or just some attention to your cause? :)Hydrated
@Kobi: It's primarily normal bounty attention whoring :> I hope to outcompete the invalid accepted answer at least. Also less nefarious: getting some community review without needing a separate question. And maybe someone can simplify it further, or convert it into a compacter (?R) version.Spire
@Spire - I don't know... I'm all for abusing regex, and extremely sympathetic to your objection to the "regex must match regular" fallacy - but not on practical, work related questions. The best answer here is really Epcylon's comment... (maybe this discussion belongs in the chat?)Hydrated
@Kobi. Well, my answer is just a by-product of a benchmarking craze (lost my bet). And in this question context it's more of a can-it-be-done? topic. I have one actual use case nevertheless. I'm going to prepend the verification on PHPs json_decode, which despite the simplicity of JSON had around a dozen exploitabilities. Old PHP versions are still awfully widespread, so I'm using it as security addon.Spire
Another practical use case is finding JSON expressions within a larger string. If you simply want to ask "is this string here a JSON object", then yes, a JSON parsing library is probably a better tool. But it can't find JSON objects within a larger structure for you.Vizier
@Deka That is sadly not true: most JSON parsers silently eliminate duplicated nodes while parsing, which yields valid JSON but doesn't tell you whether the input was valid in the first place.Truncated
This isn't an answer, but you can use this part of Crockford's JSON-js library. It uses 4 regexes and combines them in a clever way.Huntsman
It does not match "\/" as a valid JSON string, although it is a valid JSON string value. Can you fix this? For example, an escaped URL such as "https:\/\/websit.com" will not be matched by your string group.Trovillion
S
210

Yes, a complete regex validation is possible.

Some modern regex implementations allow for recursive regular expressions, which can verify a complete JSON serialized structure. The json.org specification makes it quite straightforward.

$pcre_regex = '/
    (?(DEFINE)
        (?<ws>      [\t\n\r ]* )
        (?<number>  -? (?: 0|[1-9]\d*) (?: \.\d+)? (?: [Ee] [+-]? \d++)? )    
        (?<boolean> true | false | null )
        (?<string>  " (?: [^\\\\"\x00-\x1f] | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9A-Fa-f]{4} )* " )
        (?<pair>    (?&ws) (?&string) (?&ws) : (?&value) )
        (?<array>   \[ (?: (?&value) (?: , (?&value) )* )? (?&ws) \] )
        (?<object>  \{ (?: (?&pair) (?: , (?&pair) )* )? (?&ws) \} )
        (?<value>   (?&ws) (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) (?&ws) )
    )
    \A (?&value) \Z
    /sx';

The example above uses the Perl 5.10/PCRE2 subroutine call syntax to simplify the expression and improve readability. It works quite well in PHP with the PCRE functions. It should work almost unmodified in Perl (provided one replaces the 4-backslash sequences '\\\\' with 2-backslash sequences '\\' in the <string> subroutine), and it can be adapted for other languages (e.g. Ruby, or those for which PCRE bindings are available).

This regex passes all tests from the JSON.org test suite (see link at the end of the page) as well as those from Nicolas Seriot's JSON Parser test suite (with the exceptions given in footnote 1).

Simpler RFC4627 verification

A simpler approach is the minimal consistency check specified in RFC 4627, section 6. However, it is intended only as a security test and a basic precaution against invalid input:

var jsonCode = /* untrusted input */;

var jsonObject = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
    jsonCode.replace(/"(\\.|[^"\\])*"/g, '')))
    && eval('(' + jsonCode + ')');
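
For readers outside JavaScript, the same idea (strip string literals, then reject any character outside JSON's token alphabet) can be sketched in Python. This is only an illustration: the names looks_like_json and is_valid_json are hypothetical, and using json.loads in place of eval is an assumption of this sketch, not part of the RFC text.

```python
import json
import re

# String literals, including backslash escapes, as in the RFC 4627 snippet.
STRING_LITERAL = re.compile(r'"(\\.|[^"\\])*"')
# Any character that cannot appear in JSON outside of a string literal.
FORBIDDEN = re.compile(r'[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]')


def looks_like_json(text):
    """Cheap pre-check: True only means 'no obviously foreign characters'."""
    stripped = STRING_LITERAL.sub("", text)
    return FORBIDDEN.search(stripped) is None


def is_valid_json(text):
    """Run the pre-check first, then let a real parser decide."""
    if not looks_like_json(text):
        return False
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True
```

The pre-check alone accepts truncated input such as {"a": tru, because every remaining character is in the allowed set; only the parser step rejects it.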

1 With the exception of two cases whose input is very large, causing the regex to time out. More generally, this approach is bound to fail on inputs large enough to hit the resource limits of the matching engine (either in time or space).

Spire answered 2/10, 2010 at 13:4 Comment(23)
+1 There is so much bad in the world from people who just don't get the regex syntax and misuse that as a reason to hate them :(Jordanna
@mario, not sure if you think I am in the the-naysayers-department, but I'm not. Note that your statement "Most modern regex implementations allow for recursive regexpressions" is highly debatable. AFAIK, only Perl, PHP and .NET have the capability to define recursive patterns. I wouldn't call that "most".Soak
@Bart: Yes, that's rightly debatable. Most ironically the Javascript regex engines cannot use such a recursive regex to verify JSON (or only with elaborate workarounds). So if regex == posix regex, it's not an option. It's nevertheless interesting that it's doable with the contemporary implementations; even with few practical use cases. (But true, libpcre is not the prevalent engine everywhere.) -- Also for the record: I was hoping for a synthetic reversal badge, but your not getting a few bandwagon upvotes impedes that. :/Spire
Java, Python, JavaScript, Ruby all do not support recursive patterns, to name a few popular languages. So your "Most modern regex implementations" isn't just debatable, it's wrong. And mimicking a fixed number of nesting with look-arounds isn't really recursive, if that's what you meant by "elaborate workarounds". But now I get it, by attaching a bounty you're hoping my answer gets enough down-votes and yours enough up-votes just for a badge? I'm sorry to say, I pity you. I recommend you down-vote my answer as well in order to get your precious badge (if you haven't done so already).Soak
Nope. I was after the Populist badge, for which I require 20 votes but still 10 votes on your answer. So on the contrary the downvotes on your question are not to my benefit for that.Spire
Using \d is dangerous. In many regexp implementations \d matches the Unicode definition of a digit that is not just [0-9] but instead includes alternates scripts.Reminisce
Well, looking further, this regexp has many other issues. It matches JSON data, but some non-JSON data matches too. For example, the single literal false matches while the top level JSON value must be either an array or an object. It has also many issues in character set allowed in strings or in spaces.Reminisce
@dolmen: True. The JSON RFC makes only array and objects explicit for the outer shell. I was looking at this from a PHP json_decode standpoint, where the three literal tokens, strings or numbers are also accepted. And obviously I did not care about the string validity; that would require at least the /u flag and some further constraints in [^"\\\\]*. As for \d that depends on the locale and PCRE version obviously.Spire
Related for the thematic, also mostly theoretical but regex feature comparison value: JSON parser as a single Perl Regex demonstrates how Perls regex code callbacks (?{..}?) can build an actual JSON parse tree, not just validate it.Spire
Is there a C# version of this?Sealy
This regex actually does not pass 3 test cases from the test suite of invalid files at json.org/JSON_checker (fail1.json, fail25.json, fail27.json). Originally fail18.json was not passed either, but there was an error there.Dwell
@GinoPane That's what →dolmen already noted. This regex was modeled after PHPs implementation - which accepts atoms like true and false or a "plain string" instead of an object/array as outer shell. Moreover it's a bit more JSOL than JSON, as it allows unescaped linebreaks/tabs.Spire
@mario, not exactly by now , cause according to RFC-7159 it would be valid JSON strings. Real problem was only with fail25.json, fail27.json, but I've fixed them.Dwell
The regex also accepts JSON with duplicated nodes on the same level, which is wrong in JSON: there cannot be two "Head" nodes at the top level, for example.Truncated
The suggested regex fails when the JSON includes escape sequences, e.g. {"libelle":"Cin\u00e9ma Gaumont Amiens"}. regex101.com/r/kkMbN4/1Toothwort
@Gajus: It fails because you copied the literal 4 backslashes in \\\\ u [0-9a-f]+ over. For regex-only context, it's just 2 backslashes however.Spire
To use it in PHP, apply trim() to the pattern, or you will get an "unknown modifier" error: preg_match(trim($pcre_regex), 'json string here');.Halmahera
this doesn't seem reliable to me: 3v4l.org/DpiAdArciniega
["FABRICATION",[], This input causes a catastrophic backtracking error. Snippet: regex101.com/r/Jj0bRX/1. There is a problem with the array part.Trovillion
@DominikLemberger Duplicated property names are perfectly legal in JSON. From the spec: "The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange."Theatricalize
@Trovillion the problem lies not with the array part, but with the string part. Removing the repetition at the end of the first alternative in the <string> subroutine (or making it possessive) fixes it.Graduate
@Arciniega Good catch! I think the problem may be due to the regex engine either timing out or going out of memory because of the large input. The regex can be optimized by making every repetition possessive and using atomic groups where appropriate: it then passes your test consistently for all PHP versions >=5.3.29 3v4l.org/nbgFA – Please note that when inputs are large enough even the optimized expression fails (try changing the size of the array from 1000 to 100000).Graduate
@Graduate good job, and yeah preg_last_error() report a PREG_JIT_STACKLIMIT_ERROR: 3v4l.org/Kf8TMArciniega
H
36

Yes. It's a common misconception that regular expressions can match only regular languages. In fact, the PCRE functions can match much more than regular languages; they can even match some non-context-free languages! Wikipedia's article on regular expressions has a special section about it.

JSON can be recognized using PCRE in several ways! @mario showed one great solution using named subpatterns and subroutine calls. He then noted that there should also be a solution using recursive patterns (?R). Here is an example of such a regexp, written in PHP:

$regexString = '"([^"\\\\]*|\\\\["\\\\bfnrt\/]|\\\\u[0-9a-f]{4})*"';
$regexNumber = '-?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?';
$regexBoolean= 'true|false|null'; // these are actually copied from Mario's answer
$regex = '/\A('.$regexString.'|'.$regexNumber.'|'.$regexBoolean.'|';    //string, number, boolean
$regex.= '\[(?:(?1)(?:,(?1))*)?\s*\]|'; //arrays
$regex.= '\{(?:\s*'.$regexString.'\s*:(?1)(?:,\s*'.$regexString.'\s*:(?1))*)?\s*\}';    //objects
$regex.= ')\Z/is';

I'm using (?1) instead of (?R) because the latter references the entire pattern, but we have \A and \Z sequences that should not be used inside subpatterns. (?1) refers to the regexp enclosed by the outermost parentheses (this is why the outermost ( ) does not start with ?:). As a result, the regexp is 268 characters long :)

/\A("([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"|-?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?|true|false|null|\[(?:(?1)(?:,(?1))*)?\s*\]|\{(?:\s*"([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"\s*:(?1)(?:,\s*"([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"\s*:(?1))*)?\s*\})\Z/is

Anyway, this should be treated as a "technology demonstration", not as a practical solution. In PHP I would validate the JSON string by calling the json_decode() function (just as @Epcylon noted). If I am going to use that JSON once it is validated, then this is the best method.
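
The same "let the parser decide" approach can be sketched in Python with the standard json module. The function names are hypothetical. One assumption worth flagging: Python's parser accepts NaN, Infinity and -Infinity by default, so this sketch rejects them through the parse_constant hook to stay closer to the JSON grammar.

```python
import json


def _reject_constant(name):
    # json.loads calls this hook for 'NaN', 'Infinity' and '-Infinity',
    # which Python accepts by default although they are not JSON.
    raise ValueError("non-standard JSON constant: " + name)


def is_valid_json(text):
    """Return True if text parses as a single JSON value."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True
```

Like PHP's json_decode, this accepts bare strings, numbers and literals at the top level. Very deeply nested input may raise RecursionError, which this sketch deliberately does not catch.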

Handmaid answered 6/6, 2011 at 8:25 Comment(6)
Using \d is dangerous. In many regexp implementations \d matches the Unicode definition of a digit that is not just [0-9] but instead includes alternates scripts.Reminisce
@dolmen: you may be right, but you shouldn't edit that yourself into the question. Just adding it as a comment should suffice.Piercy
I think \d does not match unicode numbers in PHP's implementation of PCRE. For example ٩ symbol (0x669 arabic-indic digit nine) will be matched using pattern #\p{Nd}#u but not #\d#uHandmaid
@hrant-khachatrian: it does not because you did not use the /u flag. JSON is encoded in UTF-8. For a proper regexp you should use that flag.Reminisce
Besides that, as this implementation is based on @mario's, it repeats the same flaws: at the top level only arrays and object are allowed. Not string, number, boolean or null. Fixing this requires a major refactoring.Reminisce
@Reminisce I did use the u modifier, please look again at the patterns in my previous comment :) Strings, numbers and booleans ARE correctly matched at the top level. You can paste the long regexp here quanetic.com/Regex and try yourselfHandmaid
S
16

Because of the recursive nature of JSON (nested {...}-s), regex is not suited to validating it. Sure, some regex flavours can recursively match patterns* (and can therefore match JSON), but the resulting patterns are horrible to look at, and should never ever be used in production code, IMO!

* Beware though, many regex implementations do not support recursive patterns. Of the popular programming languages, these support recursive patterns: Perl, .NET, PHP and Ruby 1.9.2
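
To make the nesting point concrete: matching brackets to arbitrary depth needs some form of memory, which is what recursion gives a regex engine. Without recursion, a few lines of ordinary code with an explicit stack do the same job. The Python sketch below (function name hypothetical) only checks that brackets are balanced while skipping string literals; it is not a JSON validator.

```python
def brackets_balanced(text):
    """Check that {} and [] nest correctly, ignoring brackets inside strings."""
    closers = {"}": "{", "]": "["}
    stack = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is skipped
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False    # closing quote
        elif ch == '"':
            in_string = True         # opening quote
        elif ch in "{[":
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack and not in_string
```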

Soak answered 6/4, 2010 at 8:21 Comment(3)
Humorously relevant related question...Blanca
@all down voters: "regex is not suited to validate it" does not mean certain regex engines can't do it (at least, that is what I meant). Sure, some regex implementations can, but anyone in their right mind would simply use a JSON parser. Just like if someone asks how to build a complete house with only a hammer, I'd answer that a hammer isn't suited for the job, you'd need a complete toolkit and machinery. Sure, someone with enough endurance can do it with just the hammer.Soak
This may be a valid warning, but it does not answer the question. Regex may not be the correct tool, but some people don't have a choice. We're locked into a vendor product that evaluates the output of a service to check its health, and the only option the vendor provides for custom health checking is a web form that accepts a regex. The vendor product that evaluates the service status is not under my team's control. For us, evaluating JSON with regex is now a requirement, therefore, an answer of "unsuitable" is not viable. (I still didn't downvote you.)Accelerando
E
14

Looking at the documentation for JSON, it seems that the regex can simply be three parts if the goal is just to check for fitness:

  • [First] The string starts and ends with either [] or {}

    • [{\[]{1}...[}\]]{1}
  • AND EITHER

    • [Second] The character is an allowed JSON control character (just one)

      • ...[,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]...
    • [Third] The set of characters contained in a ""

      • ...".*?"...

All together: [{\[]{1}([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]{1}

If the JSON string contains newline characters, you should use the singleline switch of your regex flavor so that . matches newlines. Please note that this will not reject all bad JSON, but it will fail if the basic JSON structure is invalid, which makes it a straightforward basic sanity check before passing the input to a parser.
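
As a rough illustration of what this check does and does not catch, here is the combined pattern transcribed to Python's re module (the redundant {1} quantifiers are dropped, and re.DOTALL plays the role of the singleline switch). The names SANITY_RE and passes_sanity_check are hypothetical.

```python
import re

# Opening bracket, then one or more "allowed character or quoted run",
# then a closing bracket. Nothing here counts or pairs the brackets.
SANITY_RE = re.compile(
    r'[{\[]([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]',
    re.DOTALL,  # the "singleline" switch: '.' also matches newlines
)


def passes_sanity_check(text):
    """True if the whole text fits the three-part pattern."""
    return SANITY_RE.fullmatch(text) is not None
```

Two consequences follow from the pattern as written: input with unbalanced brackets such as {{"a": "b"} passes, because nesting is never tracked, and the empty containers {} and [] are rejected, because the + requires at least one character between the outer brackets.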

Enyedy answered 18/7, 2017 at 13:2 Comment(5)
The suggested regex has awful backtracking behavior on certain testcases. If you try running it on '{"a":false, "b":true,"c":100,"' this incomplete json, it halts. Example: regex101.com/r/Zzc6sz. A simple fix would be: [{[]{1}([,:{}[]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}]]{1}Strung
@Strung I've updated to reflect your comment. Thanks!Enyedy
This slightly modified version of @Enyedy works perfect for my use case of finding all JSON like structures in text (globally applied to a HTML file in my case): [{\[]{1}([,:{}\[\]0-9.\-+A-zr-u \n\r\t]|".*:?")+[}\]]{1}Waggish
In my environment, and at regexr, this is matching against {{"parentRelationField": "Project_Name__c", "employeeIdField": "Employee_Name__c"} - did you find a way to prevent it matching when the open and close braces are not matching in count?Coniine
@ShaneK, for something like that, you're better off with one of the other more complex solutions or using a simple function to count {}.Enyedy
D
13

I tried @mario's answer, but it didn't work for me: I downloaded the test suite from JSON.org (archive), and 4 tests failed (fail1.json, fail18.json, fail25.json, fail27.json).

I investigated the failures and found that fail1.json is actually correct (according to the manual's note and RFC 7159, a bare string is also valid JSON). fail18.json should not fail either, because it contains perfectly valid, deeply nested JSON:

[[[[[[[[[[[[[[[[[[[["Too deep"]]]]]]]]]]]]]]]]]]]]

That leaves two files, fail25.json and fail27.json:

["  tab character   in  string  "]

and

["line
break"]

Both contain invalid characters. So I've updated the pattern like this (string subpattern updated):

$pcreRegex = '/
          (?(DEFINE)
             (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
             (?<boolean>   true | false | null )
             (?<string>    " ([^"\n\r\t\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
             (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
             (?<pair>      \s* (?&string) \s* : (?&json)  )
             (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
             (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
          )
          \A (?&json) \Z
          /six';

With this change, the pattern passes all the legitimate tests from json.org.
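
The string rule is the part that decides fail25.json and fail27.json, so it can be useful to test it in isolation. The Python sketch below is a stricter variant and not the subpattern from the PHP code above: it assumes that every control character below U+0020 must be escaped (as the JSON grammar requires) and it avoids nesting one repetition inside another. The names are hypothetical.

```python
import re

# One JSON string literal: each item is either a plain character (no quote,
# no backslash, no control character), a simple escape, or a \uXXXX escape.
JSON_STRING_RE = re.compile(
    r'"(?:[^"\\\x00-\x1f]|\\["\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"'
)


def is_json_string(text):
    """True if text is exactly one well-formed JSON string literal."""
    return JSON_STRING_RE.fullmatch(text) is not None
```

With this rule an escaped slash such as "https:\/\/websit.com" is accepted, while a raw tab or line break inside the quotes is rejected.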

Dwell answered 25/7, 2016 at 11:34 Comment(2)
This will also match bare JSON values (strings, booleans, and numbers), which are not JSON objects or arrays.Phosphate
It does not match "\/" as a valid JSON string, although it is a valid JSON string value. Can you fix this? For example, an escaped URL such as "https:\/\/websit.com" will not be matched by your string group.Trovillion
U
3

I created a Ruby implementation of Mario's solution, which does work:

# encoding: utf-8

module Constants
  JSON_VALIDATOR_RE = /(
         # define subtypes and build up the json syntax, BNF-grammar-style
         # The {0} is a hack to simply define them as named groups here but not match on them yet
         # I added some atomic grouping to prevent catastrophic backtracking on invalid inputs
         (?<number>  -?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?){0}
         (?<boolean> true | false | null ){0}
         (?<string>  " (?>[^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " ){0}
         (?<array>   \[ (?> \g<json> (?: , \g<json> )* )? \s* \] ){0}
         (?<pair>    \s* \g<string> \s* : \g<json> ){0}
         (?<object>  \{ (?> \g<pair> (?: , \g<pair> )* )? \s* \} ){0}
         (?<json>    \s* (?> \g<number> | \g<boolean> | \g<string> | \g<array> | \g<object> ) \s* ){0}
       )
    \A \g<json> \Z
    /uix
end

########## inline test running
if __FILE__==$PROGRAM_NAME

  # support
  class String
    def unindent
      gsub(/^#{scan(/^(?!\n)\s*/).min_by{|l|l.length}}/u, "")
    end
  end

  require 'test/unit' unless defined? Test::Unit
  class JsonValidationTest < Test::Unit::TestCase
    include Constants

    def setup

    end

    def test_json_validator_simple_string
      assert_not_nil %s[ {"somedata": 5 }].match(JSON_VALIDATOR_RE)
    end

    def test_json_validator_deep_string
      long_json = <<-JSON.unindent
      {
          "glossary": {
              "title": "example glossary",
              "GlossDiv": {
                  "id": 1918723,
                  "boolean": true,
                  "title": "S",
                  "GlossList": {
                      "GlossEntry": {
                          "ID": "SGML",
                          "SortAs": "SGML",
                          "GlossTerm": "Standard Generalized Markup Language",
                          "Acronym": "SGML",
                          "Abbrev": "ISO 8879:1986",
                          "GlossDef": {
                              "para": "A meta-markup language, used to create markup languages such as DocBook.",
                              "GlossSeeAlso": ["GML", "XML"]
                          },
                          "GlossSee": "markup"
                      }
                  }
              }
          }
      }
      JSON

      assert_not_nil long_json.match(JSON_VALIDATOR_RE)
    end

  end
end
Ulcerative answered 23/5, 2012 at 20:20 Comment(4)
Using \d is dangerous. In many regexp implementations \d matches the Unicode definition of a digit that is not just [0-9] but instead includes alternates scripts. So unless Unicode support in Ruby is still broken, you have to fix the regexp in your code.Reminisce
As far as I know, Ruby uses PCRE in which \d does not match ALL unicode definitions of "digit." Or are you saying that it should?Ulcerative
Except that it does not. False positive: "\x00", [True]. False negative: "\u0000", "\n". Hangs on: "[{"":[{"":[{"":" (repeated 1000x).Petrie
Not too hard to add as test cases and then tweak the code to pass. How to get it not to blow the stack with a depth of 1000+ is an entirely different matter, though...Ulcerative
F
1

For "strings and numbers", I think that the partial regular expression for numbers:

-?(?:0|[1-9]\d*)(?:\.\d+)(?:[eE][+-]\d+)?

should be instead:

-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?

since the decimal part of the number is optional (as is the sign of the exponent). It is also probably safer to escape the - symbol in [+-], since it has a special meaning between brackets.
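
A quick way to check the corrected expression is to run it against a handful of numbers. In this Python sketch \d is replaced by [0-9], which is an assumption beyond the answer's pattern: in Python 3, \d in a str pattern also matches non-ASCII digits. The names are hypothetical.

```python
import re

# Optional minus, integer part without leading zeros, optional fraction,
# optional exponent with an optional sign.
JSON_NUMBER_RE = re.compile(
    r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+\-]?[0-9]+)?'
)


def is_json_number(text):
    """True if text is exactly one JSON number."""
    return JSON_NUMBER_RE.fullmatch(text) is not None
```

With this pattern 0, -0, 12, 1.5, 1e10 and 1.5E-3 are accepted, while 01, 1., .5, +1 and 1e are rejected.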

Factotum answered 3/11, 2010 at 14:46 Comment(2)
Using \d is dangerous. In many regexp implementations \d matches the Unicode definition of a digit that is not just [0-9] but instead includes alternates scripts.Reminisce
It looks a bit strange, that -0 is a valid number but RFC 4627 allows it and your regular expression conforms to it.Cannelloni
S
1

A trailing comma in a JSON array caused my Perl 5.16 to hang, possibly because it kept backtracking. I had to add a backtrack-terminating directive:

(?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) )(*PRUNE) \s* )
                                                                                   ^^^^^^^^

This way, once it identifies a construct that is not 'optional' (* or ?), it shouldn't try backtracking over it to try to identify it as something else.

Sadye answered 14/9, 2012 at 2:17 Comment(0)
M
1

A regex that validates a simple JSON object (not a JSON array).

It validates key (string) : value (string, integer, [{key:value},{key:value}], {key:value})

^\{(\s|\n\s)*(("\w*"):(\s)*("\w*"|\d*|(\{(\s|\n\s)*(("\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))((,(\s|\n\s)*"\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))*(\s|\n\s)*\}){1}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d*|(\{(\s|\n\s)*(("\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))((,(\s|\n\s)*"\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))*(\s|\n\s)*\}){1}))*(\s|\n)*\}$

Sample data that this regex validates:

{
"key":"string",
"key": 56,
"key":{
        "attr":"integer",
        "attr": 12
        },
"key":{
        "key":[
            {
                "attr": 4,
                "attr": "string"
            }
        ]
     }
}
Mak answered 1/5, 2020 at 5:59 Comment(0)
E
0

As was written above, if the language you use has a JSON-library coming with it, use it to try decoding the string and catch the exception/error if it fails! If the language does not (just had such a case with FreeMarker) the following regex could at least provide some very basic validation (it's written for PHP/PCRE to be testable/usable for more users). It's not as foolproof as the accepted solution, but also not that scary =):

~^\{\s*\".*\}$|^\[\n?\{\s*\".*\}\n?\]$~s

short explanation:

// we have two possibilities in case the string is JSON
// 1. the string passed is "just" a JSON object, e.g. {"item": [], "anotheritem": "content"}
// this can be matched by the following regex which makes sure there is at least a {" at the
// beginning of the string and a } at the end of the string, whatever is in between is not checked!

^\{\s*\".*\}$

// OR (character "|" in the regex pattern)
// 2. the string passed is a JSON array, e.g. [{"item": "value"}, {"item": "value"}]
// which would be matched by the second part of the pattern above

^\[\n?\{\s*\".*\}\n?\]$

// the s modifier is used to make "." also match newline characters (can happen in prettified JSON)

If I missed something that would break this unintentionally, I'm grateful for comments!
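
To see how coarse this check is, the pattern can be transcribed to Python, with re.DOTALL standing in for the s modifier and search() standing in for preg_match. The names BASIC_RE and passes_basic_check are hypothetical, and the sample inputs are made up for illustration.

```python
import re

# Either {" ... } or [{" ... }], with nothing in between being inspected.
BASIC_RE = re.compile(
    r'^\{\s*".*\}$|^\[\n?\{\s*".*\}\n?\]$',
    re.DOTALL,  # like the s modifier: '.' also matches newlines
)


def passes_basic_check(text):
    """True if the text has the outer shape the pattern looks for."""
    return BASIC_RE.search(text) is not None
```

Both examples from the explanation pass, but so does {"a": oops}, since the content between the outer delimiters is never examined. Valid documents that do not start with {" or [{" (for instance [1, 2, 3] or {}) are rejected.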

Electromotive answered 7/1, 2015 at 10:12 Comment(0)
I
-2

Here is my regexp for validating a JSON string:

^\"([^\"\\]*|\\(["\\\/bfnrt]{1}|u[a-f0-9]{4}))*\"$

It was written using the original syntax diagram.

Impious answered 23/7, 2013 at 7:7 Comment(1)
It is an invalid regexRadian
P
-4

I realize that this is from over 6 years ago. However, I think there is a solution that nobody here has mentioned that is way easier than using a regex:

function isAJSON(string) {
    try {
        JSON.parse(string);
    } catch (e) {
        // JSON.parse throws a SyntaxError on malformed input; treat any
        // failure to parse as "not valid JSON".
        return false;
    }
    return true;
}
Pauwles answered 27/6, 2016 at 22:18 Comment(1)
The question was not about JavaScript.Epigone
