refactor(timers): Refactor timers to use one async op per timer by andreubotella · Pull Request #12862 · denoland/deno

andreubotella · 2021-11-22T22:10:45Z

This change also brings the timers implementation closer to the spec, and sets the stage for implementing AbortSignal.timeout() (whatwg/dom#1032).

Fixes #8965.
Fixes #10974.
Fixes #11398.

ext/timers/lib.rs

runtime/js/40_testing.js

andreubotella · 2021-11-27T22:11:47Z

#12908 and #12913 must be merged before this PR, since they fix the CI failures.

andreubotella · 2021-11-30T11:59:58Z

It seems like the httpConnAutoCloseDelayedOnUpgrade and httpServerDeleteRequestHasBody tests are flaky with this implementation of timers – they occasionally fail with pending ops. Timers used to be clamped to a minimum of 4ms everywhere, and now they're clamped to 0ms when the timer nesting level is 5 or lower, so it might be that we need to set the op sanitizer delay to 4ms.

bartlomieju · 2021-11-30T12:14:28Z

@lucacasonato please advise

lucacasonato · 2021-11-30T13:21:14Z

> it might be that we need to set the op sanitizer delay to 4ms.

I don't think so. That sounds like a hack. The tests are probably incorrect if they fail when the timer delay is actually 0 ms (one event loop turn).

andreubotella · 2021-12-02T18:02:40Z

I've added back the code in runAfterTimeout that waits to run the callback until all previous timers have resolved. Rather than an array from which entries are spliced out, I made it a linked list, but this might make the code less readable.

This should unblock #12963 and perhaps #12953. cc @kt3k
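The ordering behavior described in this comment can be illustrated with a small sketch. It assumes the rule quoted later in the review (a timer must not run before timers that were scheduled earlier with a lesser or equal timeout). `OrderingGate`, `schedule`, and `resolve` are hypothetical names, and a plain array is used in place of the PR's linked list, so this is not the code in `01_timers.js`.

```javascript
// Hypothetical sketch: hold back callbacks whose underlying sleep resolved
// "too early" relative to timers that were scheduled before them.
class OrderingGate {
  constructor() {
    // Entries are kept in the order in which they were scheduled.
    this.scheduled = [];
  }

  // Register a timer; the caller starts the real sleep separately.
  schedule(millis, cb) {
    const entry = { millis, cb, resolved: false };
    this.scheduled.push(entry);
    return entry;
  }

  // Call this when the sleep backing `entry` has resolved.
  resolve(entry) {
    entry.resolved = true;
    let lowestUnresolved = Infinity;
    const remaining = [];
    for (const e of this.scheduled) {
      if (e.resolved && e.millis < lowestUnresolved) {
        // No earlier unresolved timer has a lesser or equal timeout: run it.
        e.cb();
      } else {
        if (!e.resolved) {
          lowestUnresolved = Math.min(lowestUnresolved, e.millis);
        }
        remaining.push(e);
      }
    }
    this.scheduled = remaining;
  }
}
```

In this sketch, if two 10 ms timers are scheduled and the second sleep resolves first, its callback is held until the first one has resolved, while a later 5 ms timer is free to run ahead of both.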

bartlomieju

@andreubotella thank you for the PR. It fixes a lot of long-outstanding bugs, which is great. At the same time, it touches a crucial piece of infrastructure, so special care must be taken when reviewing and landing it.

The implementation mostly looks good to me, but I have a few questions about bits that are not immediately obvious to me.

Before landing I'd like to get more eyes on it.

@bnoordhuis @lucacasonato please review

cli/tests/unit/timers_test.ts

ext/timers/lib.rs

ext/timers/01_timers.js

bnoordhuis

Thanks for the PR, Andreu. I can't really intuit how this affects performance. Do we have timer benchmarks?

cli/tests/unit/timers_test.ts

bnoordhuis · 2021-12-04T11:05:47Z

ext/timers/01_timers.js

+    // 4. If timeout is less than 0, then set timeout to 0.
+    // 5. If nesting level is greater than 5, and timeout is less than 4, then
+    // set timeout to 4.
+    timeout = MathMax(timeout, (timerNestingLevel > 5) ? 4 : 0);


Two comments:

5 is a magic constant. Why not 4, 6 or 42? The comment doesn't explain it.

I find it harder to read than the equivalent if statement (which is shorter to boot):

if (timerNestingLevel > 5 && timeout < 4) timeout = 4;

Note step 4 and the ternary operator. The equivalent would be:

if (timeout < 0) timeout = 0;
if (timerNestingLevel > 5 && timeout < 4) timeout = 4;

> 5 is a magic constant. Why not 4, 6 or 42? The comment doesn't explain it.

It's spec-mandated. I suspect due to compat concerns.

> if (timeout < 0) timeout = 0;
> if (timerNestingLevel > 5 && timeout < 4) timeout = 4;

This one is a bit easier to read, so maybe let's go with it?

> It's spec-mandated. I suspect due to compat concerns.

Right, can you add a comment or reference to the spec here?

> Right, can you add a comment or reference to the spec here?

This comment is the spec text:

https://github.com/andreubotella/deno/blob/timers/ext/timers/01_timers.js#L131-L136
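To make the equivalence discussed in this thread concrete, here is a minimal sketch that places the two-statement form next to the one-line form. The function names `clampTimeout` and `clampTimeoutMax` are hypothetical, and plain `Math.max` stands in for the `MathMax` primordial used in the diff.

```javascript
// Two-statement form, following steps 4 and 5 quoted in the diff above.
// `clampTimeout` is a hypothetical helper name, not a function from the PR.
function clampTimeout(timeout, timerNestingLevel) {
  // 4. If timeout is less than 0, then set timeout to 0.
  if (timeout < 0) timeout = 0;
  // 5. If nesting level is greater than 5, and timeout is less than 4,
  //    then set timeout to 4.
  if (timerNestingLevel > 5 && timeout < 4) timeout = 4;
  return timeout;
}

// One-line form, mirroring the expression in the diff.
function clampTimeoutMax(timeout, timerNestingLevel) {
  return Math.max(timeout, timerNestingLevel > 5 ? 4 : 0);
}
```

For finite numeric inputs both forms should return the same value. The single `if` proposed first in the thread covers only step 5, which is why step 4 needs its own statement.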

ext/timers/01_timers.js

bnoordhuis · 2021-12-04T11:29:51Z

ext/timers/01_timers.js

+  /**
+   * A doubly linked list of timers.
+   * @type { { head: ScheduledTimer | null, tail: ScheduledTimer | null } } */
+  const scheduledTimers = { head: null, tail: null };


You can get rid of all the null checks if you turn this into a list head:

// ideally give the list head and timer objects the same object shape
// to keep property reads and writes monomorphic
const scheduledTimers = { head: null, tail: null };
scheduledTimers.head = scheduledTimers;
scheduledTimers.tail = scheduledTimers;

function isEmpty() {
  return scheduledTimers.tail === scheduledTimers;
}

function append(timer) {
  timer.prev = scheduledTimers.tail;
  timer.next = scheduledTimers;
  scheduledTimers.tail.next = timer;
  scheduledTimers.tail = timer;
}

function remove(timer) {
  timer.prev.next = timer.next;
  timer.next.prev = timer.prev;
  // timer.prev = timer.next = null;
}

@andreubotella do you want to address this bit before landing?

Would be nice to have now and I'd definitely want to see it addressed eventually but I'm okay with landing this as-is.

That seems harder to reason through, especially if we go all the way and make scheduledTimers monomorphic, since scheduledTimers.prev would be the tail and scheduledTimers.next would be the head. And if we keep the property names head and tail, we'd still need checks to append and remove.

If it's good enough for nginx, libuv, linux and the BSDs, then it's certainly good enough for deno. :-)

(Also node - but I'm reasonably certain I was the one who introduced that pattern there.)
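The "list head" pattern debated in this thread can be sketched with a single object shape, where the sentinel's `next` is the first timer and its `prev` is the last one, as noted in the reply above. All names here (`createList`, `createNode`, `ids`) are hypothetical and the sketch is independent of the PR's actual `scheduledTimers` code.

```javascript
// Circular doubly linked list with a sentinel node.
// The sentinel and the timer nodes share the shape { prev, next, id }.
function createList() {
  const sentinel = { prev: null, next: null, id: null };
  sentinel.prev = sentinel; // the tail is the sentinel itself when empty
  sentinel.next = sentinel; // the head is the sentinel itself when empty
  return sentinel;
}

function createNode(id) {
  return { prev: null, next: null, id };
}

function isEmpty(list) {
  return list.next === list;
}

// Append at the tail; no null checks are needed because the sentinel
// always provides a valid prev and next.
function append(list, node) {
  node.prev = list.prev;
  node.next = list;
  list.prev.next = node;
  list.prev = node;
}

function remove(node) {
  node.prev.next = node.next;
  node.next.prev = node.prev;
  node.prev = node.next = null;
}

// Collect ids from head to tail, for inspection.
function ids(list) {
  const out = [];
  for (let n = list.next; n !== list; n = n.next) out.push(n.id);
  return out;
}
```

The trade-off raised in the thread is visible here: `append` and `remove` have no branches, but a reader has to remember that `list.next` means "head" and `list.prev` means "tail".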

bnoordhuis · 2021-12-04T11:36:31Z

ext/timers/01_timers.js

-  }
+    // 1.
+    PromisePrototypeThen(
+      core.opAsync("op_sleep", millis, cancelRid),


This means there's an op call per timer? To keep overhead down, JS timers would ideally all hang off a single tokio::time::sleep().

The trade-off here is either:

a) Having at most one pending timer op at a time, but having to call into Rust to start another op whenever a timer expires.
b) Only calling into Rust when starting or canceling timers, but having multiple pending timer ops.

I haven't measured the difference between these, and due to the very nature of timers, measuring performance isn't easy. Also, if we compare with the current timers implementation, there's the additional confounding factor of #10974. So I'm not sure how to benchmark this.

But note that these async ops do very little work, and I expect that the work of checking the time would be done by tokio's global timer, not when polling the Sleep futures.

And in any case, unrefing timers would be much easier with b) – see #12953.

I agree that we should go with this approach, especially in light of ref/unref for timers.

Node has regular JS timers hanging off a single C++ timer (N->1, common case) and unref'd JS timers backed by individual C++ timers (N->N, uncommon case.)

For Node, that's a worthwhile optimization because programs often have thousands of active timers, usually one or more per TCP connection.

For Deno, I'm not sure. TBH, I don't know how (or if) we deal with read/write timeouts or slowloris style attacks.

At any rate, it's an area worth investigating but it doesn't have to hold up this PR.
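For readers unfamiliar with the N->1 arrangement mentioned in this thread, the following is a rough sketch of many logical timers sharing one backing timer. `TimerMux` and the `armBacking` callback are hypothetical stand-ins. This is neither Node's nor Deno's implementation, and the time is passed in explicitly so that the example stays deterministic.

```javascript
// Hypothetical sketch: many logical timers multiplexed onto one backing timer.
class TimerMux {
  // `armBacking(deadline)` stands in for (re)arming the single native timer.
  constructor(armBacking) {
    this.armBacking = armBacking;
    this.pending = []; // sorted by deadline, ties kept in insertion order
    this.nextId = 1;
  }

  add(deadline, callback) {
    const entry = { id: this.nextId++, deadline, callback };
    // Insert after every entry whose deadline is <= the new deadline.
    let i = this.pending.length;
    while (i > 0 && this.pending[i - 1].deadline > deadline) i--;
    this.pending.splice(i, 0, entry);
    // Only touch the backing timer when the earliest deadline changed.
    if (i === 0) this.rearm();
    return entry.id;
  }

  cancel(id) {
    const i = this.pending.findIndex((e) => e.id === id);
    if (i === -1) return;
    this.pending.splice(i, 1);
    if (i === 0) this.rearm();
  }

  rearm() {
    if (this.pending.length > 0) this.armBacking(this.pending[0].deadline);
  }

  // Call this when the backing timer fires at time `now`.
  onBackingFired(now) {
    while (this.pending.length > 0 && this.pending[0].deadline <= now) {
      this.pending.shift().callback();
    }
    this.rearm();
  }
}
```

This corresponds to option a) in the thread: only one backing timer is pending at a time, at the cost of re-arming it whenever the earliest deadline changes or a batch of timers expires.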

kt3k · 2021-12-05T05:49:52Z

cli/tests/unit/timers_test.ts

+    setTimeout(() => array.push(6));
+  }, 0);
+
+  await delay(100);


This still looks a little flaky from what I've observed. Ref: https://github.com/denoland/deno/runs/4418361252?check_suite_focus=true

How about creating a promise for each setTimeout call and waiting for all of them, instead of using delay(N)?

Seems like a genuine bug. A race would be missing [4,5,6] but it's only missing 6. Merits deeper investigation.

According to the spec, a timer mustn't go off before other timers that were both scheduled before it and have a lesser or equal timeout. So the array must not be out of order, and the delay must expire after 3 is pushed, but the delay doesn't necessarily have to go off after any of the 4/5/6 timers do. And that's what I find when I test this – either [1,2,3], [1,2,3,4], [1,2,3,4,5] or [1,2,3,4,5,6], but not [1,2] or anything out of order.

Note that the spec doesn't say that a timeout of 0 must queue the action in the timer task queue immediately – step 5.3 in "run steps after a timeout" allows for an arbitrary additional wait. And it seems like an op_sleep with a timeout of 0 doesn't necessarily resolve at the next event loop turn. Should we queue the task immediately in that case?

In any case, I'll be taking @kt3k's suggestion.
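The suggestion adopted here, one promise per `setTimeout` call instead of a fixed `delay(N)`, might look roughly like the sketch below. `timeoutPromise` and `collectInOrder` are hypothetical helpers written for illustration; this is not the test that was committed.

```javascript
// Hypothetical helper: the returned promise resolves only after the
// timer's callback has actually run.
function timeoutPromise(callback, ms) {
  return new Promise((resolve) => {
    setTimeout(() => {
      callback();
      resolve();
    }, ms);
  });
}

// Three outer timers each push a value and arm a nested timer.
// Completion is awaited explicitly rather than guessed with a fixed delay.
async function collectInOrder() {
  const array = [];
  const inner = [];
  const outer = [1, 2, 3].map((n) =>
    timeoutPromise(() => {
      array.push(n);
      inner.push(timeoutPromise(() => array.push(n + 3), 0));
    }, 0)
  );
  await Promise.all(outer); // all nested timers are armed by now
  await Promise.all(inner); // all nested callbacks have run
  return array;
}
```

Because the test waits on the timers themselves, it no longer depends on whether the nested timers happen to fire before an unrelated 100 ms delay expires.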

> Should we queue the task immediately in that case?

Come to think of it, no. This would let timers starve other parts of the event loop. Whereas if timers are always backed by ops, the earliest that a task can be pushed to the timers queue is on the next event loop turn.

I don't follow your line of reasoning.

We have timers A, B and C that expire at time T+0, which arm three more timers D, E, and F that expire at time T+1 (or T+4, doesn't matter), and a delay G that expires at time T+100.

Under normal conditions, they expire in order A to G because 0 < 1 < 100.

Under abnormal conditions, say an event loop stall or an OS scheduler fluke that pauses the program until T+200, then either A-G or A-C, G, D-F are reasonable behavior.

But A-E, G, F? No, never.

A timer's expiry time is set to now() + timeout rather than event_loop_tick_start_time() + timeout. Put another way: timers have no shared time base.

Under normal conditions that doesn't matter because computers are fast: now() doesn't change meaningfully in the time it takes to execute an event loop "tick".

But if there is a stall of duration N between B and C, then there is an equivalent gap between E and F: E expires at time T+1 but F at time T+N.

I don't know how tokio's timers are implemented (I have tried to follow the code and can't really make sense of it), or how they interact with FuturesUnordered, which is probably more relevant for how they show up here. The spec requires that timers don't resolve before now() + timeout, which I'm assuming tokio does correctly, and it also requires that a timer doesn't resolve before any timers that were scheduled before it and that have a lower or equal timeout. The code around scheduledTimers guarantees that this condition holds, but it doesn't guarantee anything else. And, purely in terms of the spec requirements, A-E, G, F is valid, because D-F are not required to be added to the timer task queue the instant setTimeout() is called.

For posterity: Bartek and I talked this through OOB.

IMO, it's a quality-of-implementation issue - expiry order should follow the Principle of Least Surprise, regardless of how relaxed the spec is - but it's pre-existing, not a regression, and therefore doesn't need to hold up this PR.
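The A to G scenario from this thread can be worked through numerically. The sketch below is a simplified model, not a description of tokio or of Deno's event loop. It assumes A, B and C run in order, that a stall occurs between B and C, and that each child timer's expiry is computed either from `now()` or from a shared tick start time. `childExpiries` and `firingOrder` are hypothetical names.

```javascript
// Hypothetical model of the A..G scenario discussed above.
// A, B and C run in order starting at `start`, with `stall` ms between B and C.
// Each arms one child timer (D, E, F) with the same `timeout`.
function childExpiries(start, stall, timeout, sharedTimeBase) {
  const runAt = { A: start, B: start, C: start + stall };
  const children = { A: "D", B: "E", C: "F" };
  const expiries = {};
  for (const parent of ["A", "B", "C"]) {
    // now() + timeout, or tick start + timeout with a shared time base.
    const base = sharedTimeBase ? start : runAt[parent];
    expiries[children[parent]] = base + timeout;
  }
  return expiries;
}

// Order timer names by expiry time (Array.prototype.sort is stable).
function firingOrder(expiries) {
  return Object.entries(expiries)
    .sort((x, y) => x[1] - y[1])
    .map(([name]) => name);
}

// Stall of 200 ms, child timeout of 1 ms, G is a 100 ms delay armed at T.
const perTimer = childExpiries(0, 200, 1, false);
perTimer.G = 100;
const shared = childExpiries(0, 200, 1, true);
shared.G = 100;
```

Under this model, `firingOrder(perTimer)` puts G between E and F, while `firingOrder(shared)` keeps D, E and F together ahead of G, which is the difference the "no shared time base" remark points at.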

bnoordhuis

LGTM if the test failure that @kt3k pointed out can be resolved.


bnoordhuis · 2021-12-07T12:40:03Z

Thanks for the PR, Andreu!

andreubotella marked this pull request as draft November 22, 2021 22:11

bartlomieju reviewed Nov 23, 2021

ext/timers/lib.rs

andreubotella commented Nov 26, 2021

runtime/js/40_testing.js

This was referenced Nov 27, 2021

If one test case fails with leaking async ops related to timeout, other tests that should fail with leaking async ops do not #8965

Closed

fix(test): Improve reliability of deno test's op sanitizer with timers #12908

Merged

refactor(timers): Refactor timers to use one async op per timer

0b69692

andreubotella force-pushed the timers branch from 12a3dfd to 0b69692 Compare November 28, 2021 15:58

andreubotella changed the title ~~[WIP] Rewrite the timers implementation~~ refactor(timers): Refactor timers to use one async op per timer Nov 28, 2021

andreubotella marked this pull request as ready for review November 28, 2021 15:59

Andreu Botella added 5 commits November 28, 2021 17:20

Add a wildcard to dispatched/completed ops in op_sanitizer_unstable

d5a93c3

Add tests for denoland#11398

06752c9

Add tests for denoland#8965

ab2beb2

Merge branch 'main' into timers

3106dcb

Remove the timer ordering code, and test for it

20259f0

andreubotella mentioned this pull request Nov 30, 2021

feat(ext/timers): add refTimer, unrefTimer API #12942

Closed

andreubotella mentioned this pull request Nov 30, 2021

httpConnAutoCloseDelayedOnUpgrade and httpServerDeleteRequestHasBody are flaky #12944

Closed

rerun CI

eb6a824

This was referenced Dec 1, 2021

feat(ext/timers): add refTimer, unrefTimer API (alt) #12953

Merged

test: fix flaky ws test #12963

Closed

Make runAfterTimeout not run a callback until previous timers have resolved.

d729ca3

Andreu Botella added 4 commits December 2, 2021 23:21

fix

fc9f06e

rerun CI

6c896d4

rerun CI again

7cfefe8

Merge branch 'main' into timers

5bfe68b

bartlomieju reviewed Dec 4, 2021

cli/tests/unit/timers_test.ts Outdated

ext/timers/lib.rs

ext/timers/01_timers.js

ext/timers/01_timers.js

ext/timers/01_timers.js Outdated

bartlomieju requested review from bnoordhuis and lucacasonato December 4, 2021 01:12

Review feedback and doc/comment update

40c618c

andreubotella mentioned this pull request Dec 4, 2021

Spurious assertion error when the callback to setInterval lasts longer than the interval #11398

Closed

bnoordhuis reviewed Dec 4, 2021

Andreu Botella added 2 commits December 4, 2021 18:00

Make sure timerOrdering is not flaky

2cdc9db

review feedback

8ee2865

kt3k reviewed Dec 5, 2021

bnoordhuis approved these changes Dec 5, 2021

Don't use delay in timerOrdering

507fd7c

bnoordhuis merged commit 33da15a into denoland:main Dec 7, 2021

andreubotella deleted the timers branch December 9, 2021 08:05
