Regex to validate JSON

Asked 6/4, 2010 at 8:17 Answered 1/5, 2020 at 5:59

117

I am looking for a regex that allows me to validate JSON.

I am very new to regexes, and I know enough to know that parsing with regex is bad, but can it be used to validate?

Emancipation answered 6/4, 2010 at 8:17 Comment(10)

Why bother with a separate validation step? Most languages have JSON-libraries that can parse JSON, and if it can parse it, it was valid. If not, the library will tell you. – Deka 2/10, 2010 at 13:18
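Here is a minimal Ruby sketch of that parse-to-validate approach. Ruby is used for all added examples here because the thread already contains Ruby code. The sketch assumes the `json` library bundled with Ruby, and the helper name `valid_json?` is hypothetical.

```ruby
require "json"

# Hypothetical helper: true if `text` parses as JSON, false otherwise.
# JSON.parse raises JSON::ParserError (or a subclass such as
# JSON::NestingError) when the input is malformed.
def valid_json?(text)
  JSON.parse(text)
  true
rescue JSON::ParserError
  false
end
```

For example, `valid_json?('{"a": [1, 2.5, true, null]}')` is true, and `valid_json?('[1, 2')` is false.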

You need to parse text in order to validate it... – Aerometer 2/1, 2011 at 6:1

@mario - What's the point of the bounty here? Are you looking for more answers, or just some attention to your cause? :) – Hydrated 4/6, 2011 at 8:29

@Kobi: It's primarily normal bounty attention whoring :> I hope to outcompete the invalid accepted answer at least. Also less nefarious: getting some community review without needing a separate question. And maybe someone can simplify it further, or convert it into a more compact (?R) version. – Spire 4/6, 2011 at 10:51

@Spire - I don't know... I'm all for abusing regex, and extremely sympathetic to your objection to the "regex must match regular" fallacy - but not on practical, work related questions. The best answer here is really Epcylon's comment... (maybe this discussion belongs in the chat?) – Hydrated 4/6, 2011 at 13:14

@Kobi. Well, my answer is just a by-product of a benchmarking craze (lost my bet). And in this question's context it's more of a can-it-be-done? topic. I have one actual use case nevertheless: I'm going to run the verification before PHP's json_decode, which, despite the simplicity of JSON, has had around a dozen exploitable bugs. Old PHP versions are still awfully widespread, so I'm using it as a security add-on. – Spire 4/6, 2011 at 13:46

Another practical use case is finding JSON expressions within a larger string. If you simply want to ask "is this string here a JSON object", then yes, a JSON parsing library is probably a better tool. But it can't find JSON objects within a larger structure for you. – Vizier 23/12, 2014 at 17:32
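For that use case, here is one possible Ruby sketch. A recursive pattern (in Ruby, `\g<0>` re-enters the whole expression) proposes brace-balanced candidates, and `JSON.parse` then confirms each one. The constant and method names are hypothetical. The candidate pattern only looks for objects, not arrays.

```ruby
require "json"

# Brace-balanced candidate: "{", then any mix of runs without braces or
# quotes, complete double-quoted strings (so braces inside strings are
# ignored), or a nested candidate (\g<0> recurses into the whole pattern),
# then "}".
OBJECT_CANDIDATE = /\{(?:[^{}"]++|"(?:[^"\\]|\\.)*+"|\g<0>)*\}/

# Hypothetical helper: returns the embedded substrings that are valid JSON objects.
def extract_json_objects(text)
  text.scan(OBJECT_CANDIDATE).select do |candidate|
    begin
      JSON.parse(candidate)
      true
    rescue JSON::ParserError
      false
    end
  end
end
```

With `'log: {"a": {"b": [1, 2]}} and {"c": "}"} and {broken}'`, the pattern proposes three candidates, and the parser discards `{broken}`.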

@Deka that is sadly not true, because most JSON parsers parse strings and eliminate duplicated nodes. The result is valid JSON, but that doesn't tell you whether the input was valid in the first place. – Truncated 23/3, 2017 at 8:55

This isn't an answer, but you can use this part of Crockford's JSON-js library. It uses 4 regexes and combines them in a clever way. – Huntsman 10/10, 2019 at 7:3

It does not match "\/" as a valid JSON string, but it is a valid JSON string value. Can you fix this? For example, an escaped URL such as "https:\/\/websit.com" will not be matched by your string group. – Trovillion 4/2, 2022 at 11:23

210

Yes, a complete regex validation is possible.

Some modern regex implementations allow for recursive regular expressions, which can verify a complete JSON serialized structure. The json.org specification makes it quite straightforward.

$pcre_regex = '/
    (?(DEFINE)
        (?<ws>      [\t\n\r ]* )
        (?<number>  -? (?: 0|[1-9]\d*) (?: \.\d+)? (?: [Ee] [+-]? \d++)? )    
        (?<boolean> true | false | null )
        (?<string>  " (?: [^\\\\"\x00-\x1f] | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9A-Fa-f]{4} )* " )
        (?<pair>    (?&ws) (?&string) (?&ws) : (?&value) )
        (?<array>   \[ (?: (?&value) (?: , (?&value) )* )? (?&ws) \] )
        (?<object>  \{ (?: (?&pair) (?: , (?&pair) )* )? (?&ws) \} )
        (?<value>   (?&ws) (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) (?&ws) )
    )
    \A (?&value) \Z
    /sx';

The example above uses the Perl 5.10/PCRE2 subroutine call syntax to simplify the expression and improve readability. It works quite well in PHP with the PCRE functions. It should work almost unmodified in Perl, provided one replaces the 4-backslash sequences '\\\\' with 2-backslash sequences '\\' in the <string> subroutine. It can also be adapted for other languages (e.g. Ruby, or those for which PCRE bindings are available).

This regex passes all tests from the JSON.org test suite (see link at the end of the page) as well as those from Nicolas Seriot's JSON Parser test suite.¹
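The same grammar can be sketched in Ruby, whose engine defines a group without matching it via `(?<name>...){0}` and calls it with `\g<name>`. This is an adaptation under stated assumptions, not @mario's code. It differs from the PCRE version in several ways:

- It uses single backslashes, because a Ruby regex literal needs no PHP string escaping.
- It uses `[0-9]` instead of `\d` and `\x20` for the space.
- It uses possessive quantifiers to limit backtracking.
- It uses `\z` instead of `\Z`.

Like the PCRE version, it still accepts a bare scalar at the top level.

```ruby
# (?<name>...){0} only defines a subpattern; \g<name> calls it (recursion allowed).
# Possessive quantifiers (*+, ++, ?+) never give back what they matched.
JSON_VALUE = /
  (?<ws>      [\t\n\r\x20]*+ ){0}
  (?<number>  -?+ (?: 0 | [1-9][0-9]*+ ) (?: \.[0-9]++ )?+ (?: [eE] [+\-]?+ [0-9]++ )?+ ){0}
  (?<literal> true | false | null ){0}
  (?<string>  " (?: [^"\\\x00-\x1f]++ | \\ ["\\\/bfnrt] | \\ u \h{4} )*+ " ){0}
  (?<pair>    \g<ws> \g<string> \g<ws> : \g<value> ){0}
  (?<array>   \[ (?: \g<value> (?: , \g<value> )*+ )?+ \g<ws> \] ){0}
  (?<object>  \{ (?: \g<pair> (?: , \g<pair> )*+ )?+ \g<ws> \} ){0}
  (?<value>   \g<ws> (?: \g<number> | \g<literal> | \g<string> | \g<array> | \g<object> ) \g<ws> ){0}
  \A \g<value> \z
/x
```

As the footnote below notes for the PCRE version, very large or deeply nested inputs can still exhaust the engine, so a real parser remains the safer choice.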

Simpler RFC4627 verification

A simpler approach is the minimal consistency check specified in RFC 4627, section 6. However, it is intended only as a security test and a basic precaution against invalid input, not as full validation:

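var jsonCode = '{"example": [1, 2.5, true, null]}'; // untrusted input goes here

// Remove all string literals, then reject the text if any character outside
// the JSON punctuation/number/literal set remains; only then eval it.
var jsonObject = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
    jsonCode.replace(/"(\\.|[^"\\])*"/g, '')))
    && eval('(' + jsonCode + ')');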
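Here is a Ruby sketch of the same RFC 4627 check, with one deliberate change: instead of `eval`, text that passes the check goes to `JSON.parse`. The names `passes_rfc4627_check?` and `parse_untrusted` are hypothetical. The character check is only a filter. Input such as `}{` passes it, so `parse_untrusted` can still raise `JSON::ParserError`.

```ruby
require "json"

# Double-quoted string literal, possibly containing backslash escapes.
STRING_LITERAL = /"(?:\\.|[^"\\])*"/
# Any character that may not appear in JSON outside of strings.
FORBIDDEN_OUTSIDE_STRINGS = /[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/

# Hypothetical helper: the RFC 4627 filter (strip strings, look for forbidden characters).
def passes_rfc4627_check?(text)
  !FORBIDDEN_OUTSIDE_STRINGS.match?(text.gsub(STRING_LITERAL, ""))
end

# Hypothetical helper: parse only text that passes the filter; nil otherwise.
def parse_untrusted(text)
  passes_rfc4627_check?(text) ? JSON.parse(text) : nil
end
```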

¹ With the exception of two cases whose input is very large, causing the regex to time out. More generally, this approach is bound to fail on inputs large enough to hit the resource limits of the matching engine (either in time or space).

Spire answered 2/10, 2010 at 13:4 Comment(23)

+1 There is so much bad in the world from people who just don't get the regex syntax and misuse that as a reason to hate them :( – Jordanna 5/6, 2011 at 15:43

@mario, not sure if you think I am in the naysayers department, but I'm not. Note that your statement "Most modern regex implementations allow for recursive regexpressions" is highly debatable. AFAIK, only Perl, PHP and .NET have the capability to define recursive patterns. I wouldn't call that "most". – Soak 6/6, 2011 at 20:49

@Bart: Yes, that's rightly debatable. Most ironically the Javascript regex engines cannot use such a recursive regex to verify JSON (or only with elaborate workarounds). So if regex == posix regex, it's not an option. It's nevertheless interesting that it's doable with the contemporary implementations; even with few practical use cases. (But true, libpcre is not the prevalent engine everywhere.) -- Also for the record: I was hoping for a synthetic reversal badge, but your not getting a few bandwagon upvotes impedes that. :/ – Spire 6/6, 2011 at 21:2

Java, Python, JavaScript, Ruby all do not support recursive patterns, to name a few popular languages. So your "Most modern regex implementations" isn't just debatable, it's wrong. And mimicking a fixed number of nesting with look-arounds isn't really recursive, if that's what you meant by "elaborate workarounds". But now I get it, by attaching a bounty you're hoping my answer gets enough down-votes and yours enough up-votes just for a badge? I'm sorry to say, I pity you. I recommend you down-vote my answer as well in order to get your precious badge (if you haven't done so already). – Soak 6/6, 2011 at 21:13

Nope. I was after the Populist badge, for which I require 20 votes but still 10 votes on your answer. So on the contrary the downvotes on your question are not to my benefit for that. – Spire 7/6, 2011 at 14:21

Using \d is dangerous. In many regexp implementations \d matches the Unicode definition of a digit, which is not just [0-9] but also includes alternate scripts. – Reminisce 10/1, 2013 at 8:51

Well, looking further, this regexp has many other issues. It matches JSON data, but some non-JSON data matches too. For example, the single literal false matches, while the top-level JSON value must be either an array or an object. It also has many issues with the character set allowed in strings and in whitespace. – Reminisce 10/1, 2013 at 11:3

@dolmen: True. The JSON RFC makes only arrays and objects explicit for the outer shell. I was looking at this from a PHP json_decode standpoint, where the three literal tokens, strings or numbers are also accepted. And obviously I did not care about string validity; that would require at least the /u flag and some further constraints in [^"\\\\]*. As for \d, that obviously depends on the locale and PCRE version. – Spire 10/1, 2013 at 18:29

Related to the topic, and also mostly theoretical but useful as a regex feature comparison: JSON parser as a single Perl Regex demonstrates how Perl's regex code callbacks (?{..}?) can build an actual JSON parse tree, not just validate it. – Spire 19/10, 2013 at 0:38

Is there a C# version of this? – Sealy 12/1, 2016 at 11:44

This regex actually does not pass 3 test cases from the suite of invalid files at json.org/JSON_checker (fail1.json, fail25.json, fail27.json). Originally fail18.json did not pass either, but there was an error in it. – Dwell 25/7, 2016 at 9:48

@GinoPane That's what →dolmen already noted. This regex was modeled after PHP's implementation, which accepts atoms like true and false or a "plain string" instead of an object/array as the outer shell. Moreover, it's a bit more JSOL than JSON, as it allows unescaped linebreaks/tabs. – Spire 25/7, 2016 at 12:59

@mario, not exactly, because according to RFC 7159 those would now be valid JSON texts. The real problem was only with fail25.json and fail27.json, but I've fixed them. – Dwell 25/7, 2016 at 13:25

The regex also accepts JSON with duplicated nodes on the same level, which is wrong in JSON: there cannot be two "Head" nodes at the top level, for example. – Truncated 23/3, 2017 at 8:57

The suggested regex fails when the JSON includes escape sequences, e.g. {"libelle":"Cin\u00e9ma Gaumont Amiens"}. regex101.com/r/kkMbN4/1 – Toothwort 9/7, 2018 at 11:32

@Gajus: It fails because you copied the literal 4 backslashes in \\\\ u [0-9a-f]+ over. For regex-only context, it's just 2 backslashes however. – Spire 9/7, 2018 at 13:45

To use it in PHP, apply trim() to the pattern, or it will fail with an "Unknown modifier" error: preg_match(trim($pcre_regex), 'json string here');. – Halmahera 4/12, 2020 at 14:51

this doesn't seem reliable to me: 3v4l.org/DpiAd – Arciniega 11/3, 2022 at 22:3

The input ["FABRICATION",[], will cause a catastrophic backtracking error. Snippet: regex101.com/r/Jj0bRX/1 There is a problem with the array part. – Trovillion 15/3, 2022 at 16:13

@DominikLemberger Duplicated property names are perfectly legal in JSON. From the spec: "The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange." – Theatricalize 25/5, 2023 at 14:17

@Trovillion the problem lies not with the array part, but with the string part. Removing the repetition at the end of the first alternative in the <string> subroutine (or making it possessive) fixes it. – Graduate 19/11, 2023 at 0:30

@Arciniega Good catch! I think the problem may be due to the regex engine either timing out or going out of memory because of the large input. The regex can be optimized by making every repetition possessive and using atomic groups where appropriate: it then passes your test consistently for all PHP versions >=5.3.29 3v4l.org/nbgFA – Please note that when inputs are large enough even the optimized expression fails (try changing the size of the array from 1000 to 100000). – Graduate 19/11, 2023 at 0:53

@Graduate good job, and yeah, preg_last_error() reports a PREG_JIT_STACKLIMIT_ERROR: 3v4l.org/Kf8TM – Arciniega 20/11, 2023 at 16:3

Yes, it's a common misconception that Regular Expressions can match only regular languages. In fact, the PCRE functions can match much more than regular languages, they can match even some non-context-free languages! Wikipedia's article on RegExps has a special section about it.

JSON can be recognized using PCRE in several ways! @mario showed one great solution using named subpatterns and subroutine calls. Then he noted that there should be a solution using recursive patterns (?R). Here is an example of such a regexp written in PHP:

$regexString = '"([^"\\\\]*|\\\\["\\\\bfnrt\/]|\\\\u[0-9a-f]{4})*"';
$regexNumber = '-?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?';
$regexBoolean= 'true|false|null'; // these are actually copied from Mario's answer
$regex = '/\A('.$regexString.'|'.$regexNumber.'|'.$regexBoolean.'|';    //string, number, boolean
$regex.= '\[(?:(?1)(?:,(?1))*)?\s*\]|'; //arrays
$regex.= '\{(?:\s*'.$regexString.'\s*:(?1)(?:,\s*'.$regexString.'\s*:(?1))*)?\s*\}';    //objects
$regex.= ')\Z/is';

I'm using (?1) instead of (?R) because the latter references the entire pattern, but we have \A and \Z sequences that should not be used inside subpatterns. (?1) references the subpattern enclosed by the outermost parentheses (this is why the outermost ( ) does not start with ?:). So, the RegExp becomes 268 characters long :)

/\A("([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"|-?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?|true|false|null|\[(?:(?1)(?:,(?1))*)?\s*\]|\{(?:\s*"([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"\s*:(?1)(?:,\s*"([^"\\]*|\\["\\bfnrt\/]|\\u[0-9a-f]{4})*"\s*:(?1))*)?\s*\})\Z/is
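Ruby's engine supports the same distinction: `\g<1>` calls group 1 and `\g<0>` calls the whole pattern. The tiny patterns below are hypothetical and only match nested empty arrays, but they show why the anchors must stay outside the recursed group.

```ruby
# \g<1> re-enters only group 1, so \A and \z are checked once, at the top level.
NESTED_ARRAYS = /\A(\[(?:\g<1>(?:,\g<1>)*)?\])\z/

# \g<0> re-enters the whole pattern, including \A, which cannot match at
# position 1; every nested call fails, so only "[]" itself matches.
WHOLE_PATTERN_RECURSION = /\A\[(?:\g<0>(?:,\g<0>)*)?\]\z/
```

For example, `NESTED_ARRAYS.match?("[[],[[]]]")` is true, while `WHOLE_PATTERN_RECURSION.match?("[[]]")` is false.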

Anyway, this should be treated as a "technology demonstration", not as a practical solution. In PHP I'll validate the JSON string by calling the json_decode() function (just as @Epcylon noted). If I'm going to use that JSON anyway (once it's validated), then this is the best method.

Handmaid answered 6/6, 2011 at 8:25 Comment(6)

@dolmen: you may be right, but you shouldn't edit that yourself into the question. Just adding it as a comment should suffice. – Piercy 10/1, 2013 at 9:2

I think \d does not match unicode numbers in PHP's implementation of PCRE. For example ٩ symbol (0x669 arabic-indic digit nine) will be matched using pattern #\p{Nd}#u but not #\d#u – Handmaid 10/1, 2013 at 10:2

@hrant-khachatrian: it does not because you did not use the /u flag. JSON is encoded in UTF-8. For a proper regexp you should use that flag. – Reminisce 10/1, 2013 at 14:13

Besides that, as this implementation is based on @mario's, it repeats the same flaws: at the top level, JSON allows only arrays and objects, not strings, numbers, booleans or null, yet the pattern accepts them. Fixing this requires a major refactoring. – Reminisce 10/1, 2013 at 14:14

@Reminisce I did use the u modifier, please look again at the patterns in my previous comment :) Strings, numbers and booleans ARE correctly matched at the top level. You can paste the long regexp here quanetic.com/Regex and try yourself – Handmaid 12/1, 2013 at 13:46

Because of the recursive nature of JSON (nested {...}-s), regex is not suited to validate it. Sure, some regex flavours can recursively match patterns* (and can therefore match JSON), but the resulting patterns are horrible to look at, and should never ever be used in production code IMO!

* Beware though, many regex implementations do not support recursive patterns. Of the popular programming languages, these support recursive patterns: Perl, .NET, PHP and Ruby 1.9.2.

Soak answered 6/4, 2010 at 8:21 Comment(3)

Humorously relevant related question... – Blanca 31/5, 2011 at 22:59

@all down voters: "regex is not suited to validate it" does not mean certain regex engines can't do it (at least, that is what I meant). Sure, some regex implementations can, but anyone in their right mind would simply use a JSON parser. Just like if someone asks how to build a complete house with only a hammer, I'd answer that a hammer isn't suited for the job, you'd need a complete toolkit and machinery. Sure, someone with enough endurance can do it with just the hammer. – Soak 6/6, 2011 at 20:56

This may be a valid warning, but it does not answer the question. Regex may not be the correct tool, but some people don't have a choice. We're locked into a vendor product that evaluates the output of a service to check its health, and the only option the vendor provides for custom health checking is a web form that accepts a regex. The vendor product that evaluates the service status is not under my team's control. For us, evaluating JSON with regex is now a requirement, therefore, an answer of "unsuitable" is not viable. (I still didn't downvote you.) – Accelerando 28/1, 2019 at 21:34

Looking at the documentation for JSON, it seems that the regex can simply be three parts if the goal is just to check for fitness:

[First] The string starts and ends with either [] or {}
- [{\[]{1}...[}\]]{1}
AND EITHER
- [Second] The character is an allowed JSON control character (just one)
  - ...[,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]...
- [Third] The set of characters contained in a ""
  - ...".*?"...

All together: [{\[]{1}([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]{1}

If the JSON string contains newline characters, then you should use the singleline switch on your regex flavor so that . matches newlines. Please note that this will not fail on all bad JSON, but it will fail if the basic JSON structure is invalid. That makes it a straightforward way to do a basic sanity validation before passing the string to a parser.
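Here is a Ruby rendering of that sanity check for experimentation. It assumes the whole string should be tested, so it adds `\A`/`\z`. It uses `/m`, Ruby's equivalent of the singleline switch, and it drops the redundant `{1}` quantifiers. As the comments below point out, it does not check that braces balance.

```ruby
# Starts with { or [, ends with } or ], and in between contains only JSON
# punctuation, digits, letters used by true/false/null/exponents, whitespace,
# or double-quoted runs. /m lets . inside ".*?" match newlines.
JSON_SHAPE = /\A[{\[]([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]\z/m
```

For example, `JSON_SHAPE.match?('{"a": undefined}')` is false, because `d` and `i` are not allowed outside strings. However, `JSON_SHAPE.match?('{{"a": 1}')` is still true.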

Enyedy answered 18/7, 2017 at 13:2 Comment(5)

The suggested regex has awful backtracking behavior on certain testcases. If you try running it on '{"a":false, "b":true,"c":100,"' this incomplete json, it halts. Example: regex101.com/r/Zzc6sz. A simple fix would be: [{\[]{1}([,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]|".*?")+[}\]]{1} – Strung 2/8, 2017 at 7:56

@Strung I've updated to reflect your comment. Thanks! – Enyedy 3/8, 2017 at 19:21

This slightly modified version of @Enyedy's regex works perfectly for my use case of finding all JSON-like structures in text (applied globally to an HTML file in my case): [{\[]{1}([,:{}\[\]0-9.\-+A-zr-u \n\r\t]|".*:?")+[}\]]{1} – Waggish 13/12, 2020 at 17:30

In my environment, and at regexr, this is matching against {{"parentRelationField": "Project_Name__c", "employeeIdField": "Employee_Name__c"} - did you find a way to prevent it matching when the open and close braces are not matching in count? – Coniine 25/3, 2022 at 21:3

@ShaneK, for something like that, you're better off with one of the other more complex solutions or using a simple function to count {}. – Enyedy 1/4, 2022 at 15:10

I tried @mario's answer, but it didn't work for me: I downloaded the test suite from JSON.org (archive), and 4 tests failed (fail1.json, fail18.json, fail25.json, fail27.json).

I've investigated the errors and found that fail1.json is actually correct (according to the manual's note and RFC 7159, a valid string is also valid JSON). File fail18.json was not a real failure either, because it actually contains correct, deeply nested JSON:

[[[[[[[[[[[[[[[[[[[["Too deep"]]]]]]]]]]]]]]]]]]]]

So two files left: fail25.json and fail27.json:

["  tab character   in  string  "]

and

["line
break"]

Both contain invalid characters (a literal tab and a literal line break inside a string). So I've updated the pattern like this (only the string subpattern changed):

$pcreRegex = '/
          (?(DEFINE)
             (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
             (?<boolean>   true | false | null )
             (?<string>    " ([^"\n\r\t\\\\]++ | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
             (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
             (?<pair>      \s* (?&string) \s* : (?&json)  )
             (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
             (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
          )
          \A (?&json) \Z
          /six';

So now all the legitimate tests from json.org pass.
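Here is a sketch of the same control-character idea as a standalone Ruby string matcher. It does not list \n, \r and \t individually. Instead, it excludes the whole U+0000 to U+001F range that JSON forbids unescaped, and it accepts the escaped slash `\/`. The constant name is hypothetical.

```ruby
# A complete JSON string: a quote, then unescaped characters other than the
# quote, the backslash and control characters, or one of the allowed escapes
# (\" \\ \/ \b \f \n \r \t, or \u followed by 4 hex digits), then a quote.
JSON_STRING = /\A"(?:[^"\\\x00-\x1f]|\\["\\\/bfnrt]|\\u\h{4})*"\z/
```

A raw tab (as in fail25.json) or a raw line break (as in fail27.json) makes the match fail. The escaped forms `\t` and `\n` are accepted.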

Dwell answered 25/7, 2016 at 11:34 Comment(2)

This will also match bare JSON values (strings, booleans and numbers), which are not JSON objects or arrays. – Phosphate 7/2, 2020 at 14:21

I created a Ruby implementation of Mario's solution, which does work:

# encoding: utf-8

module Constants
  JSON_VALIDATOR_RE = /(
         # define subtypes and build up the json syntax, BNF-grammar-style
         # The {0} is a hack to simply define them as named groups here but not match on them yet
         # I added some atomic grouping to prevent catastrophic backtracking on invalid inputs
         (?<number>  -?(?=[1-9]|0(?!\d))\d+(\.\d+)?([eE][+-]?\d+)?){0}
         (?<boolean> true | false | null ){0}
         # escapes use one escaped backslash in a Ruby regex literal; the doubled form is PHP string escaping
         (?<string>  " (?>[^"\\]* | \\ ["\\bfnrt\/] | \\ u [0-9a-f]{4} )* " ){0}
         (?<array>   \[ (?> \g<json> (?: , \g<json> )* )? \s* \] ){0}
         (?<pair>    \s* \g<string> \s* : \g<json> ){0}
         (?<object>  \{ (?> \g<pair> (?: , \g<pair> )* )? \s* \} ){0}
         (?<json>    \s* (?> \g<number> | \g<boolean> | \g<string> | \g<array> | \g<object> ) \s* ){0}
       )
    \A \g<json> \Z
    /uix
end

########## inline test running
if __FILE__==$PROGRAM_NAME

  # support
  class String
    def unindent
      gsub(/^#{scan(/^(?!\n)\s*/).min_by{|l|l.length}}/u, "")
    end
  end

  require 'test/unit' unless defined? Test::Unit
  class JsonValidationTest < Test::Unit::TestCase
    include Constants

    def setup

    end

    def test_json_validator_simple_string
      assert_not_nil %q[ {"somedata": 5 }].match(JSON_VALIDATOR_RE)
    end

    def test_json_validator_deep_string
      long_json = <<-JSON.unindent
      {
          "glossary": {
              "title": "example glossary",
          "GlossDiv": {
                  "id": 1918723,
                  "boolean": true,
                  "title": "S",
            "GlossList": {
                      "GlossEntry": {
                          "ID": "SGML",
                "SortAs": "SGML",
                "GlossTerm": "Standard Generalized Markup Language",
                "Acronym": "SGML",
                "Abbrev": "ISO 8879:1986",
                "GlossDef": {
                              "para": "A meta-markup language, used to create markup languages such as DocBook.",
                  "GlossSeeAlso": ["GML", "XML"]
                          },
                "GlossSee": "markup"
                      }
                  }
              }
          }
      }
      JSON

      assert_not_nil long_json.match(JSON_VALIDATOR_RE)
    end

  end
end

Ulcerative answered 23/5, 2012 at 20:20 Comment(4)

Using \d is dangerous. In many regexp implementations \d matches the Unicode definition of a digit, which is not just [0-9] but also includes alternate scripts. So unless Unicode support in Ruby is still broken, you have to fix the regexp in your code. – Reminisce 10/1, 2013 at 9:4

As far as I know, Ruby uses PCRE in which \d does not match ALL unicode definitions of "digit." Or are you saying that it should? – Ulcerative 6/2, 2015 at 20:26
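For the record, Ruby 2.0 and later use the Onigmo engine rather than PCRE. In Onigmo, `\d` is ASCII-only, while `\p{Nd}` and the POSIX class `[[:digit:]]` also match non-ASCII decimal digits. A small Ruby check using the Arabic-Indic digit nine mentioned in an earlier comment:

```ruby
ARABIC_INDIC_NINE = "\u0669" # U+0669 ARABIC-INDIC DIGIT NINE

p ARABIC_INDIC_NINE.match?(/\d/)          # prints false: \d is [0-9] only
p ARABIC_INDIC_NINE.match?(/\p{Nd}/)      # prints true: any Unicode decimal digit
p ARABIC_INDIC_NINE.match?(/[[:digit:]]/) # prints true: POSIX classes are Unicode-aware
```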

Except that it does not. False positive: "\x00", [True]. False negative: "\u0000", "\n". Hangs on: "[{"":[{"":[{"":" (repeated 1000x). – Petrie 29/8, 2016 at 21:2

Not too hard to add as test cases and then tweak the code to pass. How to get it not to blow the stack with a depth of 1000+ is an entirely different matter, though... – Ulcerative 8/6, 2017 at 21:36

For "strings and numbers", I think that the partial regular expression for numbers:

-?(?:0|[1-9]\d*)(?:\.\d+)(?:[eE][+-]\d+)?

should be instead:

-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?

since the decimal part of the number is optional, and so is the sign of the exponent. It is also probably safer to escape the - symbol in [+-], since it has a special meaning between brackets.
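Here is a quick Ruby check of the corrected number pattern. It is anchored with `\A`/`\z` and written with `[0-9]` in place of `\d`, a choice made here following the `\d` discussion in the comments.

```ruby
# Optional minus, integer part without leading zeros, optional fraction,
# optional exponent with an optional sign.
JSON_NUMBER = /\A-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+\-]?[0-9]+)?\z/
```

Inputs like `0`, `-0`, `3.14` and `1E+5` match. Inputs like `01`, `1.`, `.5` and `+1` do not.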

Factotum answered 3/11, 2010 at 14:46 Comment(2)

It looks a bit strange that -0 is a valid number, but RFC 4627 allows it and your regular expression conforms to it. – Cannelloni 3/5, 2013 at 11:28

A trailing comma in a JSON array caused my Perl 5.16 to hang, possibly because it kept backtracking. I had to add a backtrack-terminating directive:

(?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) )(*PRUNE) \s* )
                                                                                   ^^^^^^^^

This way, once it identifies a construct that is not 'optional' (* or ?), it shouldn't try backtracking over it to try to identify it as something else.
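Backtracking verbs such as `(*PRUNE)` are a Perl/PCRE feature and are not available in every engine. A related and more portable tool is the atomic group `(?>...)`, which the Ruby answer above also uses. It is not identical to `(*PRUNE)`, but it likewise stops the engine from backtracking into a construct once it has matched. A minimal Ruby illustration:

```ruby
# Ordinary group: a+ can give back one "a" so the final "a" can match.
BACKTRACKING = /\Aa+a\z/
# Atomic group: once (?>a+) has taken every "a", it never gives one back,
# so the final "a" has nothing left to match.
ATOMIC = /\A(?>a+)a\z/
```

`BACKTRACKING.match?("aaa")` is true, while `ATOMIC.match?("aaa")` is false.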

Sadye answered 14/9, 2012 at 2:17 Comment(0)

Regex that validates simple JSON objects, not JSON arrays

It validates key(string):value pairs, where the value is a string, an integer, an array of objects ([{key:value},{key:value}]) or an object ({key:value}).

^\{(\s|\n\s)*(("\w*"):(\s)*("\w*"|\d*|(\{(\s|\n\s)*(("\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))((,(\s|\n\s)*"\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))*(\s|\n\s)*\}){1}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d*|(\{(\s|\n\s)*(("\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))((,(\s|\n\s)*"\w*"):(\s)*("\w*(,\w+)*"|\d{1,}|\[(\s|\n\s)*(\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):(\s)*("\w*"|\d{1,}))*(\s|\n)*\})){1}(\s|\n\s)*(,(\s|\n\s)*\{(\s|\n\s)*(("\w*"):(\s)*(("\w*"|\d{1,}))((,(\s|\n\s)*"\w*"):("\w*"|\d{1,}))*(\s|\n)*\})?)*(\s|\n\s)*\]))*(\s|\n\s)*\}){1}))*(\s|\n)*\}$

Sample data validated by this regex:

{
"key":"string",
"key": 56,
"key":{
        "attr":"integer",
        "attr": 12
        },
"key":{
        "key":[
            {
                "attr": 4,
                "attr": "string"
            }
        ]
     }
}

Mak answered 1/5, 2020 at 5:59 Comment(0)

As was written above, if the language you use comes with a JSON library, use it to try decoding the string and catch the exception/error if it fails! If it does not (I just had such a case with FreeMarker), the following regex could at least provide some very basic validation. It's written for PHP/PCRE so that more users can test and use it. It's not as foolproof as the accepted solution, but also not that scary =):

~^\{\s*\".*\}$|^\[\n?\{\s*\".*\}\n?\]$~s

short explanation:

// we have two possibilities in case the string is JSON
// 1. the string passed is "just" a JSON object, e.g. {"item": [], "anotheritem": "content"}
// this can be matched by the following regex which makes sure there is at least a {" at the
// beginning of the string and a } at the end of the string; whatever is in between is not checked!

^\{\s*\".*\}$

// OR (character "|" in the regex pattern)
// 2. the string passed is a JSON array, e.g. [{"item": "value"}, {"item": "value"}]
// which would be matched by the second part of the pattern above

^\[\n?\{\s*\".*\}\n?\]$

// the s modifier is used to make "." also match newline characters (which can occur in prettified JSON)

If I missed something that would unintentionally break this, I'd be grateful for comments!
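Here is a Ruby port of that pattern for experimentation. In PHP without the /m modifier, ^ and $ anchor at the start and end of the string, but Ruby's ^ and $ always anchor at line boundaries. This sketch therefore uses `\A` and `\z` (strict end of string) instead. Ruby's `/m` plays the role of PHP's `/s` by letting `.` match newlines.

```ruby
# Either {"...} or [{"...}] as the outer shape; the middle is not checked.
LOOSE_JSON = /\A\{\s*".*\}\z|\A\[\n?\{\s*".*\}\n?\]\z/m
```

Only the outer shape is checked, so `LOOSE_JSON.match?('{"a": nonsense}')` is true. A valid array of numbers such as `[1, 2]` is not matched.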

Electromotive answered 7/1, 2015 at 10:12 Comment(0)

-2

Here is my regexp for validating a string:

^\"([^\"\\]*|\\(["\\\/bfnrt]{1}|u[a-f0-9]{4}))*\"$

It was written using the original syntax diagram.

Impious answered 23/7, 2013 at 7:7 Comment(1)

It is an invalid regex – Radian 20/10, 2022 at 6:42

-4

I realize that this question is over 6 years old. However, I think there is a solution that nobody here has mentioned that is way easier than regexing:

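function isAJSON(string) {
    try {
        JSON.parse(string);
    } catch (e) {
        // JSON.parse reports malformed input with a SyntaxError; any other
        // error is unexpected, so rethrow it instead of reporting success.
        if (e instanceof SyntaxError) return false;
        throw e;
    }
    return true;
}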

Pauwles answered 27/6, 2016 at 22:18 Comment(1)

The question was not about JavaScript. – Epigone 28/9, 2023 at 13:12
