refactor(timers): Refactor timers to use one async op per timer#12862
refactor(timers): Refactor timers to use one async op per timer#12862bnoordhuis merged 16 commits intodenoland:mainfrom
Conversation
This change also makes the timers implementation closer to the spec, and sets up the stage to implement `AbortSignal.timeout()` (whatwg/dom#1032). Fixes denoland#8965. Fixes denoland#10974. Fixes denoland#11398.
12a3dfd to
0b69692
Compare
|
It seems like the |
|
@lucacasonato please advise |
I don't think so. That sounds like a hack. The tests are probably not correct if they fail if the timer is actually 0 sec (1 evt loop turn). |
bartlomieju
left a comment
There was a problem hiding this comment.
@andreubotella thank you for the PR. It fixes a lot of long outstanding bugs which is great. At the same time it's touching a crucial piece of infrastructure so special care must be taken when reviewing and landing it.
The implementation mostly looks good to me, I have a few question for bits that are not immediately obvious to me.
Before landing I'd like to get more eyes on it.
@bnoordhuis @lucacasonato please review
bnoordhuis
left a comment
There was a problem hiding this comment.
Thanks for the PR, Andreu. I can't really intuit how this affects performance. Do we have timer benchmarks?
ext/timers/01_timers.js
Outdated
| // 4. If timeout is less than 0, then set timeout to 0. | ||
| // 5. If nesting level is greater than 5, and timeout is less than 4, then | ||
| // set timeout to 4. | ||
| timeout = MathMax(timeout, (timerNestingLevel > 5) ? 4 : 0); |
There was a problem hiding this comment.
Two comments:
-
5 is a magic constant. Why not 4, 6 or 42? The comment doesn't explain it.
-
I find it harder to read than the equivalent
ifstatement (which is shorter to boot):
if (timerNestingLevel > 5 && timeout < 4) timeout = 4;There was a problem hiding this comment.
Note step 4 and the ternary operator. The equivalent would be:
if (timeout < 0) timeout = 0;
if (timerNestingLevel > 5 && timeout < 4) timeout = 4;There was a problem hiding this comment.
5 is a magic constant. Why not 4, 6 or 42? The comment doesn't explain it.
It's spec-mandated. I suspect due to compat concerns.
There was a problem hiding this comment.
if (timeout < 0) timeout = 0;
if (timerNestingLevel > 5 && timeout < 4) timeout = 4;This one is a bit easier to read, so maybe let's go with it?
It's spec-mandated. I suspect due to compat concerns.
Right, can you add a comment or reference to the spec here?
There was a problem hiding this comment.
Right, can you add a comment or reference to the spec here?
This comment is the spec text:
https://github.com/andreubotella/deno/blob/timers/ext/timers/01_timers.js#L131-L136
| /** | ||
| * A doubly linked list of timers. | ||
| * @type { { head: ScheduledTimer | null, tail: ScheduledTimer | null } } */ | ||
| const scheduledTimers = { head: null, tail: null }; |
There was a problem hiding this comment.
You can get rid of all the null checks if you turn this into a list head:
// ideally give the list head and timer objects the same object shape
// to keep property reads and writes monomorphic
const scheduledTimers = { head: null, tail: null };
scheduledTimers.head = scheduledTimers;
scheduledTimers.tail = scheduledTimers;
function isEmpty() {
return scheduledTimers.tail === scheduledTimers;
}
function append(timer) {
timer.prev = scheduledTimers.tail;
timer.next = scheduledTimers;
scheduledTimers.tail.next = timer;
scheduledTimers.tail = timer;
}
function remove(timer) {
timer.prev.next = timer.next;
timer.next.prev = timer.prev;
// timer.prev = timer.next = null;
}There was a problem hiding this comment.
@andreubotella do you want to address this bit before landing?
There was a problem hiding this comment.
Would be nice to have now and I'd definitely want to see it addressed eventually but I'm okay with landing this as-is.
There was a problem hiding this comment.
That seems harder to reason through, especially if we go all the way and make scheduledTimers monomorphic, since scheduledTimers.prev would be the tail and scheduledTimers.next would be the head. And if we keep the property names head and tail, we'd still need checks to append and remove.
There was a problem hiding this comment.
If it's good enough for nginx, libuv, linux and the BSDs, then it's certainly good enough for deno. :-)
(Also node - but I'm reasonably certain I was the one who introduced that pattern there.)
| } | ||
| // 1. | ||
| PromisePrototypeThen( | ||
| core.opAsync("op_sleep", millis, cancelRid), |
There was a problem hiding this comment.
This means there's an op call per timer? To keep overhead down JS timers would ideally all hang off a single tokio::time::sleep().
There was a problem hiding this comment.
The trade-off here is either:
a) Having at most one pending timer op at a time, but having to call into Rust to start another op whenever a timer expires.
b) Only calling into Rust when starting or canceling timers, but having multiple pending timer ops.
I haven't measured the difference between these, and due to the very nature of timers, measuring performance isn't easy. Also, if we compare with the current timers implementation, there's the additional confounding factor of #10974. So I'm not sure how to benchmark this.
But note that these async ops do very little work, and I expect that the work of checking the time would be done by tokio's global timer, not when polling the Sleep futures.
And in any case, unrefing timers would be much easier with b) – see #12953.
There was a problem hiding this comment.
I agree that we should go with this approach, especially in light of ref/unref for timers.
There was a problem hiding this comment.
Node has regular JS timers hanging off a single C++ timer (N->1, common case) and unref'd JS timers backed by individual C++ timers (N->N, uncommon case.)
For Node, that's a worthwhile optimization because programs often have thousands of active timers, usually one or more per TCP connection.
For Deno, I'm not sure. TBH, I don't know how (or if) we deal with read/write timeouts or slowloris style attacks.
At any rate, it's an area worth investigating but it doesn't have to hold up this PR.
cli/tests/unit/timers_test.ts
Outdated
| setTimeout(() => array.push(6)); | ||
| }, 0); | ||
|
|
||
| await delay(100); |
There was a problem hiding this comment.
This looks still a little bit flaky from my observation. ref: https://github.com/denoland/deno/runs/4418361252?check_suite_focus=true
How about creating a promise for each setTimeout call and waiting for all of them instead of delay(N)
There was a problem hiding this comment.
Seems like a genuine bug. A race would be missing [4,5,6] but it's only missing 6. Merits deeper investigation.
There was a problem hiding this comment.
According to the spec, a timer mustn't go off before other timers that were both scheduled before it and have a lesser or equal timeout. So the array must not be out of order, and the delay must expire after 3 is pushed, but the delay doesn't necessarily have to go off after any of the 4/5/6 timers do. And that's what I find when I test this – either [1,2,3], [1,2,3,4], [1,2,3,4,5] or [1,2,3,4,5,6], but not [1,2] or anything out of order.
Note that the spec doesn't say that a timeout of 0 must queue the action in the timer task queue immediately – step 5.3 in "run steps after a timeout" allows for an arbitrary additional wait. And it seems like an op_sleep with a timeout of 0 doesn't necessarily resolve at the next event loop turn. Should we queue the task immediately in that case?
In any case, I'll be taking @kt3k's suggestion.
There was a problem hiding this comment.
Should we queue the task immediately in that case?
Come to think about it, no. This would let timers starve other parts of the event loop. Whereas if timers are always backed by ops, the earliest that a task can be pushed to the timers queue is on the next event loop turn.
There was a problem hiding this comment.
I don't follow your line of reasoning.
We have timers A, B and C that expire at time T+0, who arm three more timers D, E, and F that expire at time T+1 (or T+4, doesn't matter), and a delay G that expires at time T+100.
Under normal conditions, they expire in order A to G because 0 < 1 < 100.
Under abnormal conditions, say an event loop stall or an OS scheduler fluke that pauses the program until T+200, then either A-G or A-C, G, D-F are reasonable behavior.
But A-E, G, F? No, never.
A timer's expiry time is set to now() + timeout rather than event_loop_tick_start_time() + timeout. Put another way: timers have no shared time base.
Under normal conditions that doesn't matter because computers are fast: now() doesn't change meaningfully in the time it takes to execute an event loop "tick".
But if there is a stall of duration N between B and C, then there is an equivalent gap between E and F: E expires at time T+1 but F at time T+N.
There was a problem hiding this comment.
I don't know how tokio's timers are implemented –I have tried to follow the code and can't really make sense of it–, or how they interact with FuturesUnordered –which is probably more relevant for how they show up here. The spec requires that timers don't resolve before now() + timeout –which I'm assuming tokio does correctly–, and it also requires that a timer doesn't resolve before any timers that were scheduled before it and that have a lower or equal timeout. The code around scheduledTimers guarantees that that condition holds, but it doesn't guarantee anything else. And, purely in terms of the spec requirements, A-E, G, F is valid, because D-F are not required to be added to the timer task queue the instant setTimeout() is called.
There was a problem hiding this comment.
For posterity: Bartek and I talked this through OOB.
IMO, it's a quality-of-implementation issue - expiry order should follow the Principle of Least Surprise, regardless of how relaxed the spec is - but it's pre-existing, not a regression, and therefore doesn't need to hold up this PR.
bnoordhuis
left a comment
There was a problem hiding this comment.
LGTM if the test failure that @kt3k pointed out can be resolved.
| } | ||
| // 1. | ||
| PromisePrototypeThen( | ||
| core.opAsync("op_sleep", millis, cancelRid), |
There was a problem hiding this comment.
Node has regular JS timers hanging off a single C++ timer (N->1, common case) and unref'd JS timers backed by individual C++ timers (N->N, uncommon case.)
For Node, that's a worthwhile optimization because programs often have thousands of active timers, usually one or more per TCP connection.
For Deno, I'm not sure. TBH, I don't know how (or if) we deal with read/write timeouts or slowloris style attacks.
At any rate, it's an area worth investigating but it doesn't have to hold up this PR.
| /** | ||
| * A doubly linked list of timers. | ||
| * @type { { head: ScheduledTimer | null, tail: ScheduledTimer | null } } */ | ||
| const scheduledTimers = { head: null, tail: null }; |
There was a problem hiding this comment.
Would be nice to have now and I'd definitely want to see it addressed eventually but I'm okay with landing this as-is.
|
Thanks for the PR, Andreu! |
This change also makes the timers implementation closer to the spec, and sets up the stage to implement
AbortSignal.timeout()(whatwg/dom#1032).Fixes #8965.
Fixes #10974.
Fixes #11398.