JS: Flag more results in js/redos by erik-krogh · Pull Request #4823 · github/codeql

erik-krogh · 2020-12-14T13:07:45Z

Gets a TP/TN for CVE-2020-26289 and CVE-2017-16111

There is a few things happening in this PR

Removing unsound optimization:

The optimization removed around half the state-pairs.
But in some cases, that would cause other parts of the state-pair-graph to become disconnected.

What essentially happened was that E and C was removed from the below graph, which caused FNs.

Support nested stars.

Consider the below two regular expressions:
(they link to their NFA, as generated by the delta predicate).
var tst2 = /(d*)*e/
var tst3 = /(f*)+g/;

I added support for this pattern by detecting the epsilon-transition-loop to the inner repetition.

I have not added support for `/(a?b?)*c/`.

My previous PR at one point included support for detecting /(a?b?)*c/, but that support was way to ad-hoc.
I think I understand why this regexp causes worst-case O(2^n) runtime:

The inner part a?b? can match 3 possible string: a, b, or ab.
When the input string is repetitions of ababab, then all combinations of the above 3 will be attempted.
Just repeating a or b won't work, as that will remove 2 of the possible 3 matches.

Here is the NFA for /(a?b?)*c/ (as generated by the delta predicate).

Here is a simplified version of the above NFA.

I've only found a few TPs that are caused by this pattern, examples:

/^-?\d+((\.\d+)?[A-Za-z%$]?)+$/
/javascript((s)?( )?)*:/
/^([^.]?\.?)*__v$/,

I think the way to go is to directly detect the syntax (sequence of optionals inside a repetition), and then add that as a new case inside isReDoSCandidate.
I'll look at that later, but I would appreciate early feedback.

erik-krogh · 2020-12-16T08:53:08Z

An evaluation came back looking just fine.

No performance degradation, lots of new results, and no lost results.
(The dist-compare thinks some results were lost, but the same regexps are still reported, just slightly differently).

asgerf · 2020-12-16T10:18:35Z

javascript/ql/src/Performance/ReDoS.ql

+    or
+    r1 = r2 and	
+    q1 = q2 and	
+    epsilonSucc+(q) = q and	


There should be a comment here explaining why this is a fork. It's not entirely trivial and it's part of what we discussed in the last PR, so let's continue that discussion here.

I've added my understanding of the problem below, feel free to use that or discuss here if you don't think it's accurate.

If q can reach itself by epsilon transitions, then there are two distinct paths to the q1/q2 state: one that uses the loop and one that doesn't. The engine will separately attempt to match with each path, despite ending in the same state. The "fork" thus arises from the choice of whether to use the loop or not.

To avoid every state in the loop becoming a fork state, we arbitrarily pick the InfiniteRepetitionQuantifier state as the canonical fork state for the loop (every epsilon-loop must contain such a state).

The missing piece here is of course an explanation of why we check for a "mid" state being an infinite repetition quantifier as well. It seems that check is the reason we don't flag (a?b?)*c so I'm immediately a little suspicious of it.

I believe the only correct fork state in (a?b?)*c is immediately after the a since there are two ways to read b afterwards: with or without the loop. The infinite repetition quantifier state is not actually a valid fork due to the restriction that a repetition body can't match the empty string, which means we can't use the epsilon-loop from that starting point.

It's possible that picking all the correct fork states while avoiding redundant fork states is too hard, and in either case, it's fine to be out of scope for this PR. But I'd still like to see some justification for the "mid" criterion, if only to explain that it's a heuristic.

I like your explanation - I'm going to put it in.

The missing piece here is of course an explanation of why we check for a "mid" state being an infinite repetition quantifier as well. It seems that check is the reason we don't flag (a?b?)*c so I'm immediately a little suspicious of it.

That is correct. Without the mid check we would flag (a?b?)*c.
But we would also flag (a?)*, which is a false positive (that only blows up quadraticly).
So I want to keep it.

I believe the only correct fork state in (a?b?)*c is immediately after the a since there are two ways to read b afterwards: with or without the loop. The infinite repetition quantifier state is not actually a valid fork due to the restriction that a repetition body can't match the empty string, which means we can't use the epsilon-loop from that starting point.

I'm not sure what the right fork state is, but your proposal definitely sounds plausible.
The problem is that there are lambda transitions from everything to everything, and it is therefore difficult to recognize the pattern based on the NFA.

As I said in the previous PR. I think the best way to go is a more syntactic special case for the (a?b?)*c pattern.
This would also allow make it easier to get the correct pump string.
(Adding that special case is a thing for a later PR).

A problem is that the fork for (a?b?)*c requires multiple transitions, and our isFork predicate only a single transition.

The fork (as I see it) is between:

a -> [skip b and go back to the start of the loop] -> [skip a] -> b

a -> b

Both of these match the string ab, but they took 2 different routes in the NFA.

I'm not sure it's feasible to expand isFork to support the multiple transition steps required for that fork.

asgerf · 2020-12-16T11:09:40Z

javascript/ql/src/Performance/ReDoS.ql

    or
-    step(_, _, _, q1, q2) and
-    q1.toString() <= q2.toString()
+    step(_, _, _, q1, q2)


Nice catch, that can't have been fun to debug. 😬

Merging symmetric states in a product automaton is a legit optimization, though, so removing that doesn't seem right.

I think the real bug was that we forgot to check for a transition to (q2,q1) which is equal to (q1,q2) after merging symmetric states. Could you try this instead?

(step(_, _, _, q1, q2) or step(_, _, _, q2, q1)) and q1.toString() <= q2.toString()

Nice catch, that can't have been fun to debug. grimacing

Oh yeah! Debugging predicate delta2(StatePair q, StatePair r) was no fun.
I made graphs of state-pairs, those were not fun!

Could you try this instead?

That worked! And it slightly improves performance.

I'll try to it generalizes it to 3-tuples (for super-linear ReDoS).
Edit: The order matters in my 3-tuples, so it didn't generalize.

erik-krogh · 2021-01-06T09:58:18Z

I've rebased on main to resolve conflicts.

esbena

Diff and comments LGTM, and I believe you have addressed Asger's comments sufficiently.
All of my comments are optional.

javascript/ql/src/Performance/ReDoS.ql

erik-krogh added JS Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish labels Dec 14, 2020

erik-krogh requested a review from a team as a code owner December 14, 2020 13:07

erik-krogh removed the Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish label Dec 16, 2020

asgerf reviewed Dec 16, 2020

View reviewed changes

erik-krogh added 6 commits January 6, 2021 10:34

add more tests for js/ReDoS

b42aac1

undo unsound optimization in js/ReDoS

77967c3

support nested stars in js/ReDoS

3d98732

autoformat

4392f02

reintroduce performance improvement - but sound this time

c58f67b

add comment in isFork about /(a*)*/ regular expressions

28cffa1

erik-krogh force-pushed the furtherReDoS branch from d62f957 to 28cffa1 Compare January 6, 2021 09:57

esbena previously approved these changes Jan 7, 2021

View reviewed changes

javascript/ql/src/Performance/ReDoS.ql Show resolved Hide resolved

javascript/ql/src/Performance/ReDoS.ql Show resolved Hide resolved

javascript/ql/src/Performance/ReDoS.ql Outdated Show resolved Hide resolved

erik-krogh dismissed esbena’s stale review via 08a0b78 January 7, 2021 10:59

add comment about regexp detected by js/polynomial-redos

7e21081

erik-krogh force-pushed the furtherReDoS branch from 08a0b78 to 7e21081 Compare January 7, 2021 11:06

esbena approved these changes Jan 7, 2021

View reviewed changes

codeql-ci merged commit c193d9f into github:main Jan 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JS: Flag more results in js/redos#4823

JS: Flag more results in js/redos#4823
codeql-ci merged 7 commits intogithub:mainfrom
erik-krogh:furtherReDoS

erik-krogh commented Dec 14, 2020 •

edited

Loading

Uh oh!

erik-krogh commented Dec 16, 2020

Uh oh!

asgerf Dec 16, 2020

Uh oh!

erik-krogh Dec 16, 2020

Uh oh!

erik-krogh Jan 6, 2021

Uh oh!

asgerf Dec 16, 2020

Uh oh!

erik-krogh Dec 16, 2020 •

edited

Loading

Uh oh!

erik-krogh commented Jan 6, 2021

Uh oh!

esbena left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

erik-krogh commented Dec 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Removing unsound optimization:

Support nested stars.

I have not added support for /(a?b?)*c/.

Uh oh!

erik-krogh commented Dec 16, 2020

Uh oh!

asgerf Dec 16, 2020

Choose a reason for hiding this comment

Uh oh!

erik-krogh Dec 16, 2020

Choose a reason for hiding this comment

Uh oh!

erik-krogh Jan 6, 2021

Choose a reason for hiding this comment

Uh oh!

asgerf Dec 16, 2020

Choose a reason for hiding this comment

Uh oh!

erik-krogh Dec 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erik-krogh commented Jan 6, 2021

Uh oh!

esbena left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

erik-krogh commented Dec 14, 2020 •

edited

Loading

I have not added support for `/(a?b?)*c/`.

erik-krogh Dec 16, 2020 •

edited

Loading