Tuesday, August 4, 2015

Looking for regex performance tweaks for a large pattern matched against binary data

This is a snippet of a very large regex I am working on. For lack of a better description, it is basically a tree of branches keyed on the starting characters, a layout I found increases performance radically: a million passes dropped from 3.6 s to 0.875 s.

For some reason, using non-capturing groups (?:...) to skip capturing did not accelerate anything, but I discovered that the atomic group (?>...) shaved 200 ms off a million passes without introducing errors.
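
For readers unsure how the two group types differ: `?:` only switches off capturing, while `?>` also discards the backtracking positions inside the group once it has matched. The toy pattern below is not taken from my regex; it is a minimal Python sketch of that difference. Python's `re` only gained native `(?>...)` in 3.11, so the atomic behaviour is emulated here with a lookahead plus a backreference.

```python
import re

# Plain non-capturing group: the engine may backtrack into it and try "ab".
backtracking = re.compile(r"(?:a|ab)c")

# Atomic behaviour emulated with a lookahead plus backreference: once the
# lookahead has settled on "a", the engine cannot go back and try "ab".
atomic_like = re.compile(r"(?=(a|ab))\1c")

subject = "abc"
backtracking_hit = backtracking.match(subject) is not None
atomic_hit = atomic_like.match(subject) is not None
print(backtracking_hit, atomic_hit)  # prints: True False
```

In the regex below nothing follows the atomic group, so there is nothing that could force the engine back into it, which is presumably why `?>` causes no errors there.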

But I am at a dead end for performance tweaks. I can still see patterns in there, but I do not understand enough about advanced regex to reorder it further.

Are there any obvious tweaks that I am missing?

      /\x87(?>                   
              \xa6\xF0\x9F\x87[\xa8-\xac\xae\xb1\xb2\xb4\xb7\xb9\xba\xbc\xbf]
            | \xa7\xF0\x9F\x87[\xa6\xa7\xa9-\xaf\xb2-\xb4\xb7-\xb9\xbc\xbe\xbf]
            | \xa8\xF0\x9F\x87[\xa6\xa9\xab-\xae\xb1-\xb4\xb7\xba\xbb\xbe\xbf]
            | \xa9\xF0\x9F\x87[\xaa\xaf\xb0\xb2\xb4\xbf]
            | \xaa\xF0\x9F\x87[\xa8\xaa\xac\xad\xb7-\xb9]
            | \xab\xF0\x9F\x87[\xae-\xb0\xb2\xb4\xb7]
            | \xac\xF0\x9F\x87[\xa6\xa7\xa9\xaa\xad\xae\xb1-\xb3\xb6\xb7\xb9\xba\xbc\xbe]
            | \xad\xF0\x9F\x87[\xb0\xb3\xb7\xb9\xba]
            | \xae\xF0\x9F\x87[\xa9\xaa\xb1\xb3\xb6-\xb9]
            | \xaf\xF0\x9F\x87[\xaa\xb2\xb4\xb5]
            | \xb0\xF0\x9F\x87[\xaa\xac-\xae\xb2\xb3\xb5\xb7\xbc\xbe\xbf]
            | \xb1\xF0\x9F\x87[\xa6-\xa8\xae\xb0\xb7\xb8\xb9\xba\xbb\xbe]
            | \xb2\xF0\x9F\x87[\xa6\xa8-\xaa\xac\xad\xb0-\xb4\xb7\xb8\xb9\xba-\xbf]
            | \xb3\xF0\x9F\x87[\xa6\xa8\xaa\xac\xae\xb1\xb4\xb5\xb7\xba\xbf]
            | \xb4\xF0\x9F\x87[\xb2]
            | \xb5\xF0\x9F\x87[\xa6\xaa-\xad\xb0\xb1\xb7-\xb9\xbc\xbe]
            | \xb6\xF0\x9F\x87[\xa6]
            | \xb7\xF0\x9F\x87[\xb4\xb8\xba\xbc]
            | \xb8\xF0\x9F\x87[\xa6-\xaa\xac-\xae\xb0-\xb4\xb7\xb9\xbb\xbe\xbf]
            | \xb9\xF0\x9F\x87[\xa9\xac\xad\xaf\xb1-\xb4\xb7\xb9\xbb\xbc\xbf]
            | \xba\xF0\x9F\x87[\xa6\xac\xb8\xbe\xbf]
            | \xbb\xF0\x9F\x87[\xa6\xa8\xaa\xae\xb3\xba]
            | \xbc\xF0\x9F\x87[\xab\xb8]
            | \xbd\xF0\x9F\x87[\xb0]
            | \xbe\xF0\x9F\x87[\xaa]
            | \xbf\xF0\x9F\x87[\xa6\xb2\xbc]
            | [\xa6-\xbf]   
      )/xS
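
The branch-per-leading-byte layout above can be generated rather than written by hand. The following is a minimal Python sketch, not the code that produced the regex. It assumes the bytes are UTF-8 regional indicator symbols (U+1F1E6 to U+1F1FF, one per letter A to Z) and that the full pattern begins with `\xF0\x9F` before this snippet. `build_pattern`, `last_byte`, and `SAMPLE_CODES` are hypothetical names, and the three codes are only a sample. It uses a plain non-capturing group instead of `?>` so that it runs on Python versions older than 3.11.

```python
import re
from collections import defaultdict

# Escaped form of the three bytes every regional indicator shares in UTF-8.
LEAD = b"\\xf0\\x9f\\x87"

def last_byte(letter):
    """Final UTF-8 byte of the regional indicator for an ASCII letter A-Z."""
    return 0xA6 + ord(letter) - ord("A")

def build_pattern(codes):
    """Return a bytes regex with one branch per first letter of the codes."""
    seconds_by_first = defaultdict(set)
    for code in codes:
        seconds_by_first[last_byte(code[0])].add(last_byte(code[1]))

    branches = []
    for first in sorted(seconds_by_first):
        members = b"".join(
            b"\\x%02x" % b for b in sorted(seconds_by_first[first])
        )
        branches.append(b"\\x%02x" % first + LEAD + b"[" + members + b"]")

    # Fallback branch: a lone indicator that does not start a listed pair.
    branches.append(b"[\\xa6-\\xbf]")
    return re.compile(LEAD + b"(?:" + b"|".join(branches) + b")")

SAMPLE_CODES = ["DE", "FI", "FR"]  # hypothetical sample, not the full list
FLAGS = build_pattern(SAMPLE_CODES)

listed = "\U0001F1EB\U0001F1F7".encode("utf-8")    # F + R, in the sample
unlisted = "\U0001F1EB\U0001F1E6".encode("utf-8")  # F + A, not in the sample
print(len(FLAGS.match(listed).group()), len(FLAGS.match(unlisted).group()))
# prints: 8 4
```

A listed pair is consumed as a whole 8-byte unit, while an unlisted pair falls through to the final branch and only its first 4-byte indicator is matched, which mirrors the role of the trailing `[\xa6-\xbf]` alternative in the regex above.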
