Merged
38 changes: 32 additions & 6 deletions urlpattern/resources/urlpatterntestdata.json
@@ -1145,6 +1145,14 @@
{
"pattern": [{ "protocol": "http", "port": "80 " }],
"inputs": [{ "protocol": "http", "port": "80" }],
"exactly_empty_components": ["port"],
"expected_match": {
"protocol": { "input": "http", "groups": {} }
}
},
{
"pattern": [{ "protocol": "http", "port": "100000" }],
"inputs": [{ "protocol": "http", "port": "100000" }],

I don't quite follow what the intention of this test case is. Could you help me to understand?

Member Author

Definitely. It tests the upper bound on port numbers: a port greater than 65535 (2^16 - 1) should throw, according to the URL spec.

Thanks. This covers step 2.1.2 of the port state in the basic URL parser. That makes sense.

"expected_obj": "error"
},
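
The port limit discussed in the thread above can be checked against the `URL` constructor. This is a minimal sketch, not part of the test suite: `portParses` is a hypothetical helper, `example.com` is a placeholder host, and it assumes a runtime with a spec-compliant global `URL` (a current browser or Node.js).

```javascript
// Hypothetical helper (not from the PR): reports whether the URL parser
// accepts the given port on an otherwise valid http URL.
function portParses(port) {
  try {
    new URL(`http://example.com:${port}/`);
    return true;
  } catch {
    // The URL constructor throws when the basic URL parser returns failure.
    return false;
  }
}

const portResults = {
  "65535": portParses("65535"),   // 2^16 - 1, the largest port the parser accepts
  "65536": portParses("65536"),   // one past the limit
  "100000": portParses("100000"), // the value used in the test case
};
console.log(portResults);
```

Only the first entry should parse; the other two exceed the range check in the port state, which is why the test case expects an error.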
{
@@ -2367,15 +2375,24 @@
},
{
"pattern": [{ "hostname": "bad#hostname" }],

Could you clarify why only #, %, and / no longer throw errors, while other hostname patterns like "bad\:hostname" or "bad>hostname" continue to throw?

btw I noticed "bad\:hostname" updates hostname in URL.

const u = new URL('http://dummy.site');
u.hostname = "bad\\:hostname"; // the string value is bad\:hostname
u.hostname; // "bad"

Member Author

The host parser (specifically domain to ASCII with domain and false) strips everything that follows a # whenever it sees one: https://url.spec.whatwg.org/#concept-host-parser

Member

The host parser does not do that. Do you mean the host setter or some such?

Member Author

> The host parser does not do that. Do you mean the host setter or some such?

Yes, I meant the host setter, which calls the host parser.

> Could you clarify why only #%/ don't throw errors and other hostname patterns like "bad\:hostname" or "bad>hostname" continue to throw errors?

@sisidovski Sorry I missed this comment. URL spec hostname state step 3 (ref: https://url.spec.whatwg.org/#hostname-state) says that...

> if c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#) ... then decrease pointer by 1, and then: ... Let host be the result of host parsing buffer with url is not special.

So when the parser reaches one of these characters, it takes the existing non-empty buffer and parses it as the hostname.

- "expected_obj": "error"
+ "exactly_empty_components": ["port"],
+ "expected_match": {
+   "hostname": { "input": "bad", "groups": {} }
+ }
},
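
The hostname-state rule quoted in the thread above can be observed through the `URL` constructor. This is a rough sketch under one stated assumption: the full parser is used here as a stand-in for the hostname-state override that URLPattern applies to pattern strings, on the basis that the spec defines the host state and hostname state together. It covers only the delimiters named in the quoted step (`#`, `/`, `?`).

```javascript
// Assumption: new URL(...) exercises the same host/hostname state logic
// that the review thread quotes; this is not how URLPattern itself is invoked.
const delimiters = ["#", "/", "?"];

// For each delimiter, parse "http://bad<delimiter>hostname" and record
// where the parser stopped collecting the host.
const hostnames = delimiters.map(
  (d) => new URL(`http://bad${d}hostname`).hostname
);

console.log(hostnames);
```

In each case the parser stops at the delimiter and keeps only the buffered `bad` as the host, which matches the `"input": "bad"` expectations in the updated entries for `#` and `/`.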
{
"pattern": [{ "hostname": "bad%hostname" }],
- "expected_obj": "error"
+ "exactly_empty_components": ["port"],
+ "expected_match": {
+   "hostname": { "input": "bad%hostname", "groups": {} }
+ }
},
{
"pattern": [{ "hostname": "bad/hostname" }],
- "expected_obj": "error"
+ "exactly_empty_components": ["port"],
+ "expected_match": {
+   "hostname": { "input": "bad", "groups": {} }
+ }
},
{
"pattern": [{ "hostname": "bad\\:hostname" }],
@@ -2419,15 +2436,24 @@
},
{
"pattern": [{ "hostname": "bad\nhostname" }],

Why are \n, \r, and \t just stripped?

Member Author

This is defined in the URL spec. All ASCII tab or newline code points are removed from the input before any parsing is done.

> Remove all ASCII tab or newline from input.

Thanks. bad\nhostname is passed as input to the basic URL parser, and those characters are removed in step 3.

- "expected_obj": "error"
+ "exactly_empty_components": ["port"],
+ "expected_match": {
+   "hostname": { "input": "badhostname", "groups": {} }
+ }
},
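
The tab and newline stripping described in the thread above can be reproduced with the `URL` constructor. This is a small sketch under the same caveat as any constructor-based check: it runs the full basic URL parser rather than the hostname-state override URLPattern uses, on the assumption that the stripping step runs before any state is entered.

```javascript
// ASCII tab or newline code points, as named in the review thread.
const strippedChars = ["\n", "\r", "\t"];

// Embed each character in the middle of the host and parse the result.
const strippedHostnames = strippedChars.map(
  (ch) => new URL(`http://bad${ch}hostname/`).hostname
);

console.log(strippedHostnames);
```

All three inputs yield `badhostname`, matching the `"input": "badhostname"` expectations in the updated entries.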
{
"pattern": [{ "hostname": "bad\rhostname" }],
- "expected_obj": "error"
+ "exactly_empty_components": ["port"],
+ "expected_match": {
+   "hostname": { "input": "badhostname", "groups": {} }
+ }
},
{
"pattern": [{ "hostname": "bad\thostname" }],
- "expected_obj": "error"
+ "exactly_empty_components": ["port"],
+ "expected_match": {
+   "hostname": { "input": "badhostname", "groups": {} }
+ }
},
{
"pattern": [{}],