Suggestion: Regex caching

I’m not sure whether regex caching has been implemented or not, but it would certainly help when running a regex search between each packet in a stream of hundreds of thousands or millions of them.

Most regex engines can compile the regex to some internal representation and reuse it, not having to re-parse the regex each time it’s used.
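To illustrate the compile-once-and-reuse idea, here is a minimal sketch in Python rather than the 010 Editor script language. The packet payloads, the pattern, and the names `METHOD_RE` and `count_matches` are made up for the example; only `re.compile` and `search` are real library calls.

```python
import re

# Hypothetical packet payloads standing in for a real capture.
packets = [b"GET /index.html", b"\x00\x01binary", b"POST /login"]

# Compile once, outside the loop, so the pattern text is parsed a single time.
METHOD_RE = re.compile(rb"^(GET|POST) ")


def count_matches(pattern, payloads):
    """Count the payloads in which the precompiled pattern finds a match."""
    return sum(1 for payload in payloads if pattern.search(payload) is not None)


matched = count_matches(METHOD_RE, packets)
```

The loop only ever touches the compiled object, so the per-packet cost is the match itself and not the parse.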

This would be straightforward (I assume) when the regex is passed as a string literal, which is my use case. I have a long string literal regex (RegExSearch(buf, "a very very long string literal", ...)) that’s used on the order of a million times.

Presumably the script engine keeps all string literals at fixed locations in memory, and the regex function could special case on being fed a literal and use this information to maintain a cache of regexes without having to inspect the contents of the regex string.

Of course, a regex cache typically uses a hashmap keyed on the regex strings, but even if that is already implemented, it would help not to have to compute the hash of a regex that is potentially 10 KB long… Yes, in my case they are this big. It’s not a big deal for a few thousand searches, but multiply the overhead by a million and it all adds up.
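A rough sketch of the identity-keyed cache I have in mind, written in Python only for illustration. Everything here is an assumption: `IdentityRegexCache`, `LONG_PATTERN`, and `payloads` are hypothetical names, and Python's `id()` stands in for the fixed memory address of a string literal in the script engine. I have no idea how the script engine actually stores literals.

```python
import re


class IdentityRegexCache:
    """Cache compiled regexes by the identity of the pattern object.

    The lookup key is id(pattern), so a hit never reads or hashes the
    pattern text. This is only a stand-in for "address of a string literal".
    """

    def __init__(self):
        # id(pattern) -> (pattern, compiled). Keeping the pattern object in
        # the entry keeps it alive, so its id cannot be reused by another
        # object while the entry exists.
        self._by_identity = {}
        self.compilations = 0  # number of times re.compile was called

    def get(self, pattern):
        entry = self._by_identity.get(id(pattern))
        if entry is not None and entry[0] is pattern:
            return entry[1]  # fast path: same object as before
        compiled = re.compile(pattern)
        self.compilations += 1
        self._by_identity[id(pattern)] = (pattern, compiled)
        return compiled


# Hypothetical long pattern: 500 alternatives joined with "|".
LONG_PATTERN = "|".join(f"sig{i:04d}" for i in range(500))
payloads = ["xx sig0042 yy", "nothing here", "sig0499"]

cache = IdentityRegexCache()
hits = [p for p in payloads if cache.get(LONG_PATTERN).search(p)]
```

In CPython the saving is mostly conceptual, since strings remember their own hash and the `re` module keeps its own cache of recent patterns. The point is the shape of the lookup: the key is where the pattern lives, not what it contains.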

If you are using the FindAll/FindFirst/FindNext functions to do regular expression finding, those functions do regular expression caching. However, the RegExSearch function does not do caching and rebuilds the regular expression on each call. We would have to add some sort of functions in the future to allow caching for RegExSearch. If there is some way for you to switch to FindAll/FindFirst/FindNext, that would probably be faster for a large number of searches.

Graeme
SweetScape Software