How to save/serialize compiled regular expression (std::regex) to a file?

I'm using <regex> from Visual Studio 2010. As I understand it, a regex object is compiled when it is constructed. There is no separate compile method as in other languages and libraries, but I think that's how it works. Am I right?

I need to store a large number of these compiled regexes in a file, so that later I can just load a block of memory and get my compiled regex back.

I can't figure out how to do this. I found that it is possible in PCRE, but that's a Linux library. There is a Windows version, but it's 3 years old and I would like to use a more high-level approach (there is no C++ wrapper in the Windows version).

So is it possible to save a std::regex or boost::regex (they're the same, right?) as a chunk of memory and then simply reuse it later?

Or is there another simple library for Windows that allows this?

EDIT: Thanks for the great answers. I'll first check whether simply storing each regex as a string is sufficient. If that is still too slow, I'll test it against this old PCRE library and compare.

Tinney answered 21/12, 2010 at 13:35 Comment(3)
I would imagine that you can't just dump the bitwise contents of the object to a file, as it will probably contain pointers to dynamically allocated memory, etc., that will make no sense when you reload it! - Conchita
I imagine the same ;) That's why I asked this question. It is possible in PCRE, so why isn't it in std::regex? Is it possible in any other C++ library, or in one that is not 3 years old? - Tinney
Boost has a POSIX API. I suspect this means that it uses the 'virtual machine' method I talked about in my answer. - Texture
T
1

I don't think it can be done without modifying the boost library to support it.

I don't know specifically how the boost regex library is implemented, but most regex libraries compile things to a binary blob that's then interpreted later as a series of instructions for a sort of limited virtual machine.

If boost's regex library is implemented in this way, serializing it would be relatively easy. Just get at the binary blob somehow and dump it to disk. The existence of the POSIX regex API for the boost library tells me that this is probably how it's implemented.

OTOH, another way to implement it (and a not so common way) is by generating something like an abstract syntax tree for the regex. This means that the individual pieces of the regex would be represented by their own objects and those objects would be linked together into some larger structure that represented the whole regex.

If boost does it this way then serialization will be very complex.

This is not possible with C++, but what I really wish happened is that boost could compile constant string regular expressions at compile time with template meta-programming. The reason this is not possible is that it isn't possible to iterate over the contents of a string (even a constant string) with a template.

Texture answered 21/12, 2010 at 14:40 Comment(0)
C
2

You can use the regex strings themselves as the 'serialized' regex - just save those to a file, then when you want to reconstitute the regex objects, just pass the saved strings to the regex constructor.
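As a rough sketch of that idea (not a tested recipe for any particular compiler), the patterns can be written one per line and recompiled on load. The function names `save_patterns` and `load_patterns` and the one-pattern-per-line file format are assumptions made for illustration; the format only works if no pattern contains a newline.

```cpp
#include <cstddef>
#include <fstream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: write one pattern string per line.
// Assumes no pattern contains a newline character.
bool save_patterns(const std::string& path,
                   const std::vector<std::string>& patterns) {
    std::ofstream out(path.c_str());
    if (!out) return false;
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        out << patterns[i] << '\n';
    }
    return static_cast<bool>(out);
}

// Hypothetical helper: read the pattern strings back and pass each one
// to the std::regex constructor, which compiles it again.
// Throws std::regex_error if a stored pattern is not a valid regex.
std::vector<std::regex> load_patterns(const std::string& path) {
    std::vector<std::regex> compiled;
    std::ifstream in(path.c_str());
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            compiled.push_back(std::regex(line));
        }
    }
    return compiled;
}
```

Calling `save_patterns("patterns.txt", patterns)` once and `load_patterns("patterns.txt")` at startup would give back a vector of ready-to-use regex objects. The same code should work with boost::regex by swapping the header and the type.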

The only drawbacks I can think of:

  • it might take some more time to 'reconstitute' the regex database, but I really don't know how much (I suspect that the time would be dominated by I/O anyway, so I'm not sure if the difference would be significant - I really don't know how much overhead there is in regex compilation by the boost library's implementation)
  • if you want the stored regexes obfuscated, you'll have to do that yourself instead of relying on the compiled-binary state to be unreadable

The advantages to this are:

  • it's 100% supported, so it's not fragile/brittle
  • it's portable across compiler versions and platforms (i.e., again not fragile/brittle)

Is the time to compile the regex database (excluding I/O) really significant enough to warrant trying to save the compiled state?
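One way to answer that question is to measure it. The sketch below times only the construction of the regex objects, with no file I/O involved. The function name `compile_time_ms` is made up for illustration, and the code assumes a standard library that provides <chrono> (C++11, so newer than Visual Studio 2010).

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: compile every pattern once, store the results in
// `out`, and return the elapsed wall-clock time in milliseconds.
double compile_time_ms(const std::vector<std::string>& patterns,
                       std::vector<std::regex>& out) {
    out.clear();
    out.reserve(patterns.size());
    std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        out.push_back(std::regex(patterns[i]));
    }
    std::chrono::steady_clock::time_point stop =
        std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Running this on the real pattern set in a release build, and comparing the result with the time spent reading the file, should show whether compilation is actually the bottleneck.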

Chishima answered 21/12, 2010 at 18:30 Comment(0)
C
0

I'm not sure, but did you take a look at boost::serialization, which can serialize a C++ object?

Cyclopean answered 21/12, 2010 at 14:21 Comment(1)
It can't. I actually want to use boost::serialization, but I don't know how to serialize a regex (as a compiled binary, not as a pattern string that will be compiled eventually). - Tinney
