url: fix surrogate handling in encodeAuth()#11161
url: fix surrogate handling in encodeAuth()#11161TimothyGu wants to merge 2 commits intonodejs:masterfrom
Conversation
lib/querystring.js
Outdated
There was a problem hiding this comment.
How about moving other private functions to internal/querystring too?
There was a problem hiding this comment.
I only moved this array because it is used in multiple places (querystring and url). The only other entity in querystring that is somewhat shared is ParsedQueryString, a copy of which is in url (as ParsedQueryString as well) and in internal/url (as StorageObject). I've moved this as well. (There is also an analog of it in events as EventHandlers, but I kept it as it is since it'd be weird for that module to import from internal/querystring).
There was a problem hiding this comment.
I think placing private bits intointernal would make the boundaries clearer, whether its shared or not, like BufferList for _stream_readable.js (only used once there).
lib/url.js
Outdated
There was a problem hiding this comment.
I know this originally came from querystring.escape(), but that function's implementation has recently improved performance-wise and we should probably incorporate those changes here (specifically the ASCII value checking -- it uses a lookup table now).
There was a problem hiding this comment.
Done. 3% improvement on url/legacy-vs-whatwg-url-serialize.js with type=auth method=legacy n=5e5.
lib/querystring.js
Outdated
There was a problem hiding this comment.
I think we should be more consistent about how we import single items like this. Here we're using the pre-ES6 style, but in other places we are destructuring like const { hexTable } = require('internal/querystring');
There was a problem hiding this comment.
I initially followed the style of existing imports in the file (see Buffer above), but since I'm importing multiple items from this module now I started using destructuring.
|
@joyeecheung and @mscdex, comments addressed. New CI: https://ci.nodejs.org/job/node-test-pull-request/6213/ @mscdex, I also factored out the entire percent encoding function so that it can be shared between querystring's Benchmark results |
lib/internal/querystring.js
Outdated
There was a problem hiding this comment.
Off-topic: what does this TODO mean? Take a buffer and escape its content?
There was a problem hiding this comment.
I assume it means using a user-provided buffer. However, at the same time, this comment goes all the way back to 57d8172 (v0.3.2), so whether this TODO is still applicable isn't clear by any means.
Anyone is welcome to initiate a formal deprecation process of these exports in another PR.
lib/internal/querystring.js
Outdated
There was a problem hiding this comment.
Nit: s/noEscape/noEscapeTable would be more intuitive, noEscape sounds like a boolean..
lib/querystring.js
Outdated
There was a problem hiding this comment.
Off-topic: unescapeBuffer, encode and decode are all removed from the docs, mind adding a comment?
(Not sure about which stage of deprecation they are in)
There was a problem hiding this comment.
Hmm..mind adding a comment about that they have never been documented, so technically not public APIs, just exposed by accident?
There was a problem hiding this comment.
I'd almost rather just document them IMHO, at least unescapeBuffer() since it's not an alias to an already documented method. I've personally had uses for unescapeBuffer() in my own projects. *shrug*
There was a problem hiding this comment.
Not related to deprecations or anything, just now that we are refactoring things, might as well :P
|
I just benchmarked the current changes in this PR and there is a 3-4% performance regression (with high "confidence" ratings) in the querystring-stringify benchmarks. With that in mind, I think we should avoid extracting the logic into a separate function ( |
|
Ouch, yeah, with numbers like that I have to agree. |
|
@mscdex, okay for now I left percent encoding alone in their respective files. |
lib/internal/querystring.js
Outdated
There was a problem hiding this comment.
I think this can be moved back to lib/querystring.js as well, since it's not used anywhere else, along with unhexTable.
There was a problem hiding this comment.
I think one reason to keep it in internal is that unescapeBuffer is exported to user land by accident, putting it in internal should make it clear(if there is no performance regression caused by this though).
There was a problem hiding this comment.
I still think we should just document unescapeBuffer(), since it can be useful (and I've used it in some of my projects before for decoding binary/latin1 strings).
There was a problem hiding this comment.
Hmm, so probably move it back to lib/querystring.js for now, and let another PR sort this issue out, no matter we want to make it really internal or start to document it?
There was a problem hiding this comment.
Just to be clear, this PR does not modify the exports on querystring: unescapeBuffer() is still exported on require('querystring'), just the physical location is changed.
That said, I thought this function would be of interest for other modules like url as well, which was why I moved it to internal/url.js in the first place. As it doesn't seem like this decision is popular, I've reverted it. PTAL.
lib/internal/querystring.js
Outdated
There was a problem hiding this comment.
Can this be switched to the module.exports = { ... } syntax instead (for faster property access)?
|
@mscdex, any more comments? Note, I am not against documenting |
|
@TimothyGu I recognize that, I was just advocating for returning it back to lib/querystring.js. Current changes LGTM. |
Also factor out common parts in querystring and url. PR-URL: nodejs#11161 Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Brian White <mscdex@mscdex.net>
|
Landed in c6b12d0. |
Also factor out common parts in querystring and url. Backport-of: nodejs#11161
Also factor out common parts in querystring and url. Backport-of: nodejs#11161
Also factor out common parts in querystring and url. Backport-of: #11161
Currently, the legacy URL stringifier miscalculates the offset of an
extra surrogate, causing the high surrogate to be included unescaped:
> url.format({ auth: '\uD83D\uDE00', hostname: 'a' })
'//%F0%9F%98%80�@A'
PR-URL: nodejs#11387
Backport-of: nodejs#11161
Currently, the legacy URL stringifier miscalculates the offset of an extra surrogate, causing the high surrogate to be included unescaped:
Also included in the PR is a commit that moves the function, which is only used in the legacy
urlmodule, to that file. Certain generic percent encoding handling functions and tables are also moved to a separate lib/internal/querystring.js.CI: https://ci.nodejs.org/job/node-test-pull-request/6212/Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passesAffected core subsystem(s)
url, querystring