From 4db1de546f7882ccc092d9ee749338590ad7c3fc Mon Sep 17 00:00:00 2001 From: dcode Date: Fri, 1 Jul 2022 16:30:53 +0200 Subject: [PATCH 01/15] Clarify string literal section kind --- proposals/stringref/Overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 29edffa..a0a1481 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -252,7 +252,7 @@ be used in global variable initializers. #### String literal section The `string.const` section indicates the literal as an `i32` index into -a new custom section: a string table, encoded as a `vec(vec(u8))` of +a new regular section: a string table, encoded as a `vec(vec(u8))` of valid WTF-8 strings. Because literal strings can contain codepoint 0, strings in the string table do not use NUL as a terminator. The string table section must immediately precede the global section, or where the From c4bfe42f57023110647f117d6940b8c7771452a1 Mon Sep 17 00:00:00 2001 From: dcode Date: Fri, 1 Jul 2022 17:19:03 +0200 Subject: [PATCH 02/15] Clarify string literal requirement --- proposals/stringref/Overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 29edffa..7ed8c7c 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -25,7 +25,7 @@ find good compromises are "minimal" and "viable". 3. Allow WebAssembly implementations to efficiently represent strings internally in either WTF-8 or WTF-16 encodings 4. Allow access to WTF-16 code units for Java, Dart, Kotlin and similar languages - 5. Allow string literals in element sections + 5. Allow string literals as constant expressions ## Definitions - *codepoint*: An integer in the range [0,0x10FFFF]. 
From 727dca6880c7847ec77a1ee37beea3ce565b99a6 Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Wed, 6 Jul 2022 08:43:47 +0200 Subject: [PATCH 03/15] Integrate with GC proposal Add instructions to encode and decode string contents to and from GC-managed arrays of UTF-8/WTF-8 and WTF-16. Fixes #1. --- proposals/stringref/Overview.md | 66 +++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 39b00f5..326bb08 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -504,6 +504,68 @@ Return the number of codepoints that were actually consumed. Return a substring of *`view`*, starting at the current position of *`view`* and continuing for at most *`codepoints`* codepoints. +### GC integration + +Though this proposal does not have a dependency on the [GC +proposal](https://github.com/WebAssembly/gc/blob/master/proposals/gc/MVP.md), +compiler authors that target GC will likely want to be able to encode +the contents of a stringref to a GC array, and vice versa. + +Our primary use cases are: + + 1. String-builder interfaces, which will likely use a WTF-8 or WTF-16 + array as intermediate storage, depending on the language being + compiled. We will need to be able to create strings from arrays. + When the string contents are ready, we will almost always decode + from array offset 0 and continue to some offset before the end of + the array. We'll also need to be able to append a string's contents + to an array at a given offset. + 2. Communicating strings with another process, possibly over the + network. Here, UTF-8 and WTF-8 are the important encodings, and we + need to be able to read and write to arbitrary slices of arrays. 
+ +``` +(string.new_wtf8_array $wtf8_policy codeunits:$t start:i32 end:i32) + if expand($t) => array i8 + -> str:stringref +``` +Create a new string from a subsequence of the *`codeunits`* WTF-8 bytes +in a GC-managed array, starting from offset *`start`* and continuing to +but not including *`end`*. If *`end`* is less than *`start`* or is +greater than the array length, trap. The bytes are decoded according to +`$wtf8_policy`, as in `string.new_wtf8`. The maximum value for +*`end`*–*`start`* is 2<sup>31</sup>–1; passing a higher value traps. + +``` +(string.new_wtf16_array codeunits:$t start:i32 end:i32) + if expand($t) => array i16 + -> str:stringref +``` +Create a new string from a subsequence of the *`codeunits`* WTF-16 code +units in a GC-managed array, starting from offset *`start`* and +continuing to but not including *`end`*. If *`end`* is less than +*`start`* or is greater than the array length, trap. The maximum value +for *`end`*–*`start`* is 2<sup>30</sup>–1; passing a higher value +traps. + +``` +(string.encode_wtf8_array $wtf8_policy str:stringref array:$t start:i32) + if expand($t) => array (mut i8) +(string.encode_wtf16_array str:stringref array:$t start:i32) + if expand($t) => array (mut i16) +``` +Encode the contents of the string *`str`* as WTF-8 or WTF-16, +respectively, to the GC-managed array *`array`*, starting at offset +*`start`*. The number of code units written will be the same as +returned by the corresponding `string.measure_wtf8` or +`string.measure_wtf16`, respectively. If there is not space for the +code units in the array, trap. Note that no `NUL` terminator is ever +written. + +For `string.encode_wtf8_array`, if an isolated surrogate is seen, the +behavior depends on the *`$wtf8_policy`* immediate, in the same way as +`string.encode_wtf8`. + ## Binary encoding ``` @@ -542,6 +604,10 @@ instr ::= ...
| 0xfb 0xa2 ⇒ stringview_iter.advance | 0xfb 0xa3 ⇒ stringview_iter.rewind | 0xfb 0xa4 ⇒ stringview_iter.slice + | 0xfb 0xb0 $policy:u32 [gc] ⇒ string.new_wtf8_array $policy + | 0xfb 0xb1 [gc] ⇒ string.new_wtf16_array + | 0xfb 0xb2 $policy:u32 [gc] ⇒ string.encode_wtf8_array $policy + | 0xfb 0xb3 [gc] ⇒ string.encode_wtf16_array ;; New section. If present, must be present only once, and right before ;; the globals section (or where the globals section would be). Each From 1558965373710007194baa11ab90ff3978f2bf22 Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Wed, 6 Jul 2022 08:49:38 +0200 Subject: [PATCH 04/15] Change stringview_iter.cur to stringview_iter.next Seems more useful in the general case, without loss of expressivity. Fixes #4. --- proposals/stringref/Overview.md | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 39b00f5..0ec249a 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -477,11 +477,12 @@ stringview can then be used to iterate over the codepoints of the string. ``` -(stringview_iter.cur view:stringview_iter) +(stringview_iter.next view:stringview_iter) -> codepoint:i32 ``` -Return the codepoint currently pointed to by *`view`*, or -1 if the -iterator is at the end of the string. +If *`view`* is already at the end of the string, return -1. Otherwise +return the codepoint currently pointed to by the iterator, and advance +the iterator's position by one codepoint. ``` (stringview_iter.advance view:stringview_iter codepoints:i32) @@ -538,7 +539,7 @@ instr ::= ... 
| 0xfb 0x9b $mem:u32 ⇒ stringview_wtf16.encode $mem | 0xfb 0x9c ⇒ stringview_wtf16.slice | 0xfb 0xa0 ⇒ string.as_iter - | 0xfb 0xa1 ⇒ stringview_iter.cur + | 0xfb 0xa1 ⇒ stringview_iter.next | 0xfb 0xa2 ⇒ stringview_iter.advance | 0xfb 0xa3 ⇒ stringview_iter.rewind | 0xfb 0xa4 ⇒ stringview_iter.slice @@ -908,7 +909,7 @@ WTF-8 in memory, for longer strings. block $done loop $loop local.get $iter - stringview_iter.cur + stringview_iter.next local.tee $ch i32.const -1 @@ -917,11 +918,6 @@ WTF-8 in memory, for longer strings. local.get $ch call $have-codepoint - - local.get $iter - i32.const 1 - string.advance_wtf8 - drop end end) ``` From d04c30931cdb44d8dc8fb05bb4e9210cfab8f95f Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Wed, 6 Jul 2022 08:58:23 +0200 Subject: [PATCH 05/15] Further WTF-16 byte order clarification Fixes #15. --- proposals/stringref/Overview.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 39b00f5..1647a82 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -228,7 +228,9 @@ Unicode standard, version 14.0.0, page 126. Create a new string from the *`codeunits`* code units encoded in memory at *`ptr`*. Out-of-bounds access will trap. *`ptr`* must be two-byte aligned, and will trap otherwise. The maximum value for *`codeunits`* -is 2<sup>30</sup>–1; passing a higher value traps. +is 2<sup>30</sup>–1; passing a higher value traps. Each code unit is +read from memory as if with `i32.load16`, and are therefore decoded +using little-endian byte order. #### `string.new` size limits From fc901e78ceb6b52990b53a0d6d64cb11b8dc3566 Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Wed, 6 Jul 2022 09:05:15 +0200 Subject: [PATCH 06/15] String slice operands have exclusive end Fixes #17.
--- proposals/stringref/Overview.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 39b00f5..8f9cd0c 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -418,8 +418,9 @@ If an isolated surrogate is seen, the behavior determines on the -> str:stringref ``` Return a substring of *`view`*, for the WTF-8 bytes starting at offset -*`start`* and not greater than *`end`*. *`start`* and *`end`* receive -the "WTF-8 position treatment", as for `stringview_wtf8.advance`. +*`start`* and continuing to but not including *`end`*. *`start`* and +*`end`* receive the "WTF-8 position treatment", as for +`stringview_wtf8.advance`. ### `stringview_wtf16` @@ -462,8 +463,9 @@ transformation is the "WTF-16 position treatment". -> str:stringref ``` Return a substring of *`view`*, for the WTF-16 code units starting at offset -*`start`* and not greater than *`end`*. *`start`* and *`end`* receive -the "WTF-16 position treatment", as for `stringview_wtf16.encode`. +*`start`* and continuing to but not including *`end`*. *`start`* and +*`end`* receive the "WTF-16 position treatment", as for +`stringview_wtf16.encode`. ### `stringview_iter` @@ -638,7 +640,9 @@ rather it just deals in WTF-16, as most source languages that expose local.get $str string.as_wtf16 local.get $offset + local.get $offset local.get $codeunits + i32.add stringview_wtf16.slice) ``` From d62a253762dd6f525cd2a5f3f88172a01d0ec1dd Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Wed, 6 Jul 2022 09:12:46 +0200 Subject: [PATCH 07/15] Allow null operands to string.eq Fixes #18. 
--- proposals/stringref/Overview.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 39b00f5..5a00935 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -195,6 +195,10 @@ address ::= i32 | i64 Such instructions also take the memory to which to read or write as an immediate. +Although `stringref` is a nullable type, trap if a null `stringref` +value reaches any instruction in this proposal. The one exception is +`string.eq`. + ### Creating strings ``` @@ -344,8 +348,9 @@ If an allocation fails, the implementation must trap. Fallible ``` (string.eq a:stringref b:stringref) -> i32 ``` -Return 1 if the strings *`a`* and *`b`* contain the same codepoint -sequence. Return 0 otherwise. +If both *`a`* and *`b`* are null, return 1. If only one of them is +null, return 0. Otherwise return 1 if the strings *`a`* and *`b`* +contain the same codepoint sequence, or 0 otherwise. ``` (string.is_usv_sequence str:stringref) From f9257447c357190974e3eb979939a9838a79e139 Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Wed, 6 Jul 2022 09:17:45 +0200 Subject: [PATCH 08/15] string.encode_wtf8, string.encode_wtf16 return encoded length Interestingly, one of the examples already assumed this behavior. Fixes #24. --- proposals/stringref/Overview.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 39b00f5..3fe8c73 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -283,9 +283,9 @@ number of code units represent these sizes as unsigned values. 
``` (string.measure_wtf8 $wtf8_policy str:stringref) - -> bytes:i32 + -> codeunits:i32 (string.measure_wtf16 str:stringref) - -> bytes:i32 + -> codeunits:i32 ``` Measure the number of code units that would be required to encode the contents of the string *`str`* to WTF-8 or WTF-16 respectively. @@ -299,11 +299,13 @@ require more code units than the limit, the result is -1. ``` (string.encode_wtf8 $memory $wtf8_policy str:stringref ptr:address) + -> codeunits:i32 (string.encode_wtf16 $memory str:stringref ptr:address) + -> codeunits:i32 ``` Encode the contents of the string *`str`* as UTF-8, WTF-8, or WTF-16, -respectively, to memory at *`ptr`*. The number of code units written -will be the same as returned by the corresponding +respectively, to memory at *`ptr`*. Return the number of code units +written, which will be the same as returned by the corresponding `string.measure_*encoding*`. Each code unit is written to memory as if stored by `i32.store8` or @@ -554,8 +556,8 @@ stringrefs ::= section_14(0x00 vec(vec(u8))) ## Examples -We assume that the textual syntax for `string.encode` and `string.new` -allows you to elide the memory, in which case it defaults to 0. +We assume that the textual syntax for instructions that take a memory +operand allows you to elide the memory, in which case it defaults to 0. ### Make string from NUL-terminated UTF-8 in memory @@ -797,10 +799,9 @@ open to considering adding more instructions. 
local.get $str local.get $ptr - string.encode_wtf8 wtf8 + string.encode_wtf8 wtf8 ;; push bytes written, same as $len local.get $ptr - local.get $len i32.add i32.const 0 i32.store8 ;; write NUL From ead9d47b59d17ebefc6ce01f2fc43913a4131527 Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Wed, 6 Jul 2022 11:12:31 +0200 Subject: [PATCH 09/15] Fix grammar --- proposals/stringref/Overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 1647a82..3d50f6a 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -229,7 +229,7 @@ Create a new string from the *`codeunits`* code units encoded in memory at *`ptr`*. Out-of-bounds access will trap. *`ptr`* must be two-byte aligned, and will trap otherwise. The maximum value for *`codeunits`* is 2<sup>30</sup>–1; passing a higher value traps. Each code unit is -read from memory as if with `i32.load16`, and are therefore decoded +read from memory as if with `i32.load16`, and is therefore decoded using little-endian byte order. #### `string.new` size limits From 6d988b49c6aca2a007beb5a6770242b4a87215ee Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Wed, 6 Jul 2022 11:18:13 +0200 Subject: [PATCH 10/15] Address feedback; encode returns codeunits written See #33. --- proposals/stringref/Overview.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 326bb08..57244b8 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -511,7 +511,7 @@ proposal](https://github.com/WebAssembly/gc/blob/master/proposals/gc/MVP.md), compiler authors that target GC will likely want to be able to encode the contents of a stringref to a GC array, and vice versa. -Our primary use cases are: +The primary use cases are: 1.
String-builder interfaces, which will likely use a WTF-8 or WTF-16 array as intermediate storage, depending on the language being @@ -524,6 +524,9 @@ Our primary use cases are: network. Here, UTF-8 and WTF-8 are the important encodings, and we need to be able to read and write to arbitrary slices of arrays. +The instructions below shall be available in WebAssembly implementations +that support both GC and stringrefs. + ``` (string.new_wtf8_array $wtf8_policy codeunits:$t start:i32 end:i32) if expand($t) => array i8 @@ -551,13 +554,15 @@ traps. ``` (string.encode_wtf8_array $wtf8_policy str:stringref array:$t start:i32) if expand($t) => array (mut i8) + -> codeunits:i32 (string.encode_wtf16_array str:stringref array:$t start:i32) if expand($t) => array (mut i16) + -> codeunits:i32 ``` Encode the contents of the string *`str`* as WTF-8 or WTF-16, respectively, to the GC-managed array *`array`*, starting at offset -*`start`*. The number of code units written will be the same as -returned by the corresponding `string.measure_wtf8` or +*`start`*. Return the number of code units written, which will be the +same as the result of the corresponding `string.measure_wtf8` or `string.measure_wtf16`, respectively. If there is not space for the code units in the array, trap. Note that no `NUL` terminator is ever written. From 8bd7487bce963217c3d69594f8d34a763e746b05 Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Thu, 7 Jul 2022 08:59:33 +0200 Subject: [PATCH 11/15] stringview_wtf16.encode returns number of written code units Since the operand is a maximum number of code units to write, it makes sense to return the number of code units actually written, for the same reason as string.encode_wtf16. See #24.
--- proposals/stringref/Overview.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index ee81d85..0aa4401 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -458,10 +458,12 @@ Return the 16-bit code unit at offset *`pos`* in the WTF-16 encoding of ``` (stringview_wtf16.encode $memory view:stringview_wtf16 ptr:address pos:i32 len:i32) + -> codeunits:i32 ``` Write a subsequence of the WTF-16 encoding of *`view`* to memory at *`ptr`*, starting at the WTF-16 offset *`pos`*, writing no more than *`len`* 16-bit code units. If *`ptr`* is not two-byte aligned, trap. +Return the number of code units written. If *`pos`* is greater than the number of WTF-16 code units in *`view`*, it is as if it were instead given as the code unit length. This From 84e20792d2a975a31b62ca03d143d775153b5c48 Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Mon, 12 Sep 2022 14:51:17 +0200 Subject: [PATCH 12/15] Binary encoding of opcodes after 0xfb prefix are LEBs Fixes #9. --- proposals/stringref/Overview.md | 59 +++++++++++++++++---------------- 1 file changed, 31 insertions(+), 28 deletions(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 0aa4401..14d7e51 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -599,34 +599,34 @@ wtf8_policy ::= 0x00 ⇒ utf8 | 0x02 ⇒ replace instr ::= ... 
- | 0xfb 0x80 $mem:u32 $policy:u32 ⇒ string.new_wtf8 $mem $policy - | 0xfb 0x81 $mem:u32 ⇒ string.new_wtf16 $mem - | 0xfb 0x82 $idx:u32 ⇒ string.const $idx - | 0xfb 0x84 $policy:u32 ⇒ string.measure_wtf8 $policy - | 0xfb 0x85 ⇒ string.measure_wtf16 - | 0xfb 0x86 $mem:u32 $policy:u32 ⇒ string.encode_wtf8 $mem $policy - | 0xfb 0x87 $mem:u32 ⇒ string.encode_wtf16 $mem - | 0xfb 0x88 ⇒ string.concat - | 0xfb 0x89 ⇒ string.eq - | 0xfb 0x8a ⇒ string.is_usv_sequence - | 0xfb 0x90 ⇒ string.as_wtf8 - | 0xfb 0x91 ⇒ stringview_wtf8.advance - | 0xfb 0x92 $mem:u32 $policy:u32 ⇒ stringview_wtf8.encode $mem $policy - | 0xfb 0x93 ⇒ stringview_wtf8.slice - | 0xfb 0x98 ⇒ string.as_wtf16 - | 0xfb 0x99 ⇒ stringview_wtf16.length - | 0xfb 0x9a ⇒ stringview_wtf16.get_codeunit - | 0xfb 0x9b $mem:u32 ⇒ stringview_wtf16.encode $mem - | 0xfb 0x9c ⇒ stringview_wtf16.slice - | 0xfb 0xa0 ⇒ string.as_iter - | 0xfb 0xa1 ⇒ stringview_iter.next - | 0xfb 0xa2 ⇒ stringview_iter.advance - | 0xfb 0xa3 ⇒ stringview_iter.rewind - | 0xfb 0xa4 ⇒ stringview_iter.slice - | 0xfb 0xb0 $policy:u32 [gc] ⇒ string.new_wtf8_array $policy - | 0xfb 0xb1 [gc] ⇒ string.new_wtf16_array - | 0xfb 0xb2 $policy:u32 [gc] ⇒ string.encode_wtf8_array $policy - | 0xfb 0xb3 [gc] ⇒ string.encode_wtf16_array + | 0xfb 0x80:u32 $mem:u32 $policy:u32 ⇒ string.new_wtf8 $mem $policy + | 0xfb 0x81:u32 $mem:u32 ⇒ string.new_wtf16 $mem + | 0xfb 0x82:u32 $idx:u32 ⇒ string.const $idx + | 0xfb 0x84:u32 $policy:u32 ⇒ string.measure_wtf8 $policy + | 0xfb 0x85:u32 ⇒ string.measure_wtf16 + | 0xfb 0x86:u32 $mem:u32 $policy:u32 ⇒ string.encode_wtf8 $mem $policy + | 0xfb 0x87:u32 $mem:u32 ⇒ string.encode_wtf16 $mem + | 0xfb 0x88:u32 ⇒ string.concat + | 0xfb 0x89:u32 ⇒ string.eq + | 0xfb 0x8a:u32 ⇒ string.is_usv_sequence + | 0xfb 0x90:u32 ⇒ string.as_wtf8 + | 0xfb 0x91:u32 ⇒ stringview_wtf8.advance + | 0xfb 0x92:u32 $mem:u32 $policy:u32 ⇒ stringview_wtf8.encode $mem $policy + | 0xfb 0x93:u32 ⇒ stringview_wtf8.slice + | 0xfb 0x98:u32 ⇒ string.as_wtf16 + 
| 0xfb 0x99:u32 ⇒ stringview_wtf16.length + | 0xfb 0x9a:u32 ⇒ stringview_wtf16.get_codeunit + | 0xfb 0x9b:u32 $mem:u32 ⇒ stringview_wtf16.encode $mem + | 0xfb 0x9c:u32 ⇒ stringview_wtf16.slice + | 0xfb 0xa0:u32 ⇒ string.as_iter + | 0xfb 0xa1:u32 ⇒ stringview_iter.next + | 0xfb 0xa2:u32 ⇒ stringview_iter.advance + | 0xfb 0xa3:u32 ⇒ stringview_iter.rewind + | 0xfb 0xa4:u32 ⇒ stringview_iter.slice + | 0xfb 0xb0:u32 $policy:u32 [gc] ⇒ string.new_wtf8_array $policy + | 0xfb 0xb1:u32 [gc] ⇒ string.new_wtf16_array + | 0xfb 0xb2:u32 $policy:u32 [gc] ⇒ string.encode_wtf8_array $policy + | 0xfb 0xb3:u32 [gc] ⇒ string.encode_wtf16_array ;; New section. If present, must be present only once, and right before ;; the globals section (or where the globals section would be). Each @@ -637,6 +637,9 @@ instr ::= ... stringrefs ::= section_14(0x00 vec(vec(u8))) ``` +Note that the u32 (uleb) encoding for the opcode after the `0xfb` prefix +takes two bytes, for opcode values between 0x80 and 0x3fff. + ## Examples We assume that the textual syntax for instructions that take a memory From a1d061549fd5fa3a03b7c29fc0686fe1709aaf37 Mon Sep 17 00:00:00 2001 From: Andy Wingo Date: Mon, 12 Sep 2022 15:44:56 +0200 Subject: [PATCH 13/15] Fold WTF-8 policy into instructions Instead of having e.g. `string.encode_wtf8` taking a policy immediate, just make a separate instruction for each policy. Fixes #35. --- proposals/stringref/Overview.md | 283 +++++++++++++++++++++----------- 1 file changed, 186 insertions(+), 97 deletions(-) diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md index 14d7e51..4863b4f 100644 --- a/proposals/stringref/Overview.md +++ b/proposals/stringref/Overview.md @@ -202,28 +202,32 @@ value reaches any instruction in this proposal. 
The one exception is ### Creating strings ``` -wtf8_policy ::= 'utf8' | 'wtf8' | 'replace' -(string.new_wtf8 $memory $wtf8_policy ptr:address bytes:i32) +(string.new_utf8 $memory ptr:address bytes:i32) + -> str:stringref +(string.new_lossy_utf8 $memory ptr:address bytes:i32) + -> str:stringref +(string.new_wtf8 $memory ptr:address bytes:i32) -> str:stringref ``` -Create a new string from the *`bytes`* WTF-8 bytes in memory at *`ptr`*. +Create a new string from the *`bytes`* bytes in memory at *`ptr`*. Out-of-bounds access will trap. The maximum value for *`bytes`* is 2<sup>31</sup>–1; passing a higher value traps. -The precise decoding semantics depend on the *`$wtf8_policy`* immediate. +These three instructions decode the bytes in three different ways: -For `utf8`, the bytes are decoded using a strict UTF-8 decoder. If the -bytes are not valid UTF-8, trap. + * `string.new_utf8` decodes using a strict UTF-8 decoder. If the + bytes are not valid UTF-8, trap. -For `wtf8`, the bytes are decoded using a strict WTF-8 decoder, which is -like UTF-8 but also allows isolated surrogates. If the bytes are not -valid WTF-8, trap. + * `string.new_lossy_utf8` decodes using a sloppy UTF-8 decoder: all + maximal subparts of an invalid subsequence are decoded as if they + were `U+FFFD` (the replacement character) instead. This instruction + will never trap due to a decoding error. See the section entitled + "U+FFFD Substitution of Maximal Subparts" in the Unicode standard, + version 14.0.0, page 126. -For `replace`, all maximal subparts of an invalid subsequence are -decoded as if they were `U+FFFD` (the replacement character) instead. -The `replace` policy will never trap due to a decoding error. See the -section entitled "U+FFFD Substitution of Maximal Subparts" in the -Unicode standard, version 14.0.0, page 126. + * `string.new_wtf8` decodes using a strict WTF-8 decoder, which is like + UTF-8 but also allows isolated surrogates. If the bytes are not + valid WTF-8, trap.
``` (string.new_wtf16 $memory ptr:address codeunits:i32) @@ -277,8 +281,8 @@ string literal section as a future extension. The maximum size for the WTF-8 encoding of an individual string literal is 2<sup>31</sup>–1 bytes. Embeddings may impose their own limits which -are more restricted. But similarly to `string.new`, instantiating a -module with string literals may fail due to lack of memory resources, +are more restricted. But similarly to `string.new_wtf8`, instantiating +a module with string literals may fail due to lack of memory resources, even if the string size is formally within the limits. However `string.const` itself never traps when passed a valid literal offset. @@ -288,46 +292,96 @@ All parameters and return values measuring a number of codepoints or a number of code units represent these sizes as unsigned values. ``` -(string.measure_wtf8 $wtf8_policy str:stringref) +(string.measure_utf8 str:stringref) -> codeunits:i32 -(string.measure_wtf16 str:stringref) +``` +Measure the number of code units (bytes) that would be required to +encode the contents of the string *`str`* to UTF-8. If the string +contains an isolated surrogate, return -1. + +The maximum number of code units returned by `string.measure_utf8` is +2<sup>31</sup>-1. If an encoding would require more code units than the +limit, the result is -1. + +``` +(string.measure_wtf8 str:stringref) -> codeunits:i32 ``` -Measure the number of code units that would be required to encode the -contents of the string *`str`* to WTF-8 or WTF-16 respectively. -For `string.measure_wtf8` with the `utf8` policy, if the string contains -an isolated surrogate, return -1. +Measure the number of code units (bytes) that would be required to +encode the codepoints of the string *`str`* to WTF-8.
+ +Note that this instruction also serves to measure an encoding length for +UTF-8 when isolated surrogates are replaced with `U+FFFD` ("lossy +UTF-8"); the same number of bytes is required to encode `U+FFFD` as +would be required to encode an isolated surrogate to WTF-8. The maximum number of code units returned by `string.measure_wtf8` is is -2<sup>31</sup>-1. The maximum number of code units returned by -`string.measure_wtf16` is is 2<sup>30</sup>-1. If an encoding would -require more code units than the limit, the result is -1. +2<sup>31</sup>-1. If an encoding would require more code units than the +limit, the result is -1. ``` -(string.encode_wtf8 $memory $wtf8_policy str:stringref ptr:address) +(string.measure_wtf16 str:stringref) -> codeunits:i32 -(string.encode_wtf16 $memory str:stringref ptr:address) +``` +Measure the number of code units that would be required to encode the +contents of the string *`str`* to WTF-16. + +The maximum number of code units returned by `string.measure_wtf16` +is 2<sup>30</sup>-1. If an encoding would require more code units than +the limit, the result is -1. + +``` +(string.encode_utf8 $memory str:stringref ptr:address) -> codeunits:i32 ``` -Encode the contents of the string *`str`* as UTF-8, WTF-8, or WTF-16, -respectively, to memory at *`ptr`*. Return the number of code units +Encode the contents of the string *`str`* as UTF-8 to memory at *`ptr`*. +If an isolated surrogate is seen, trap. Return the number of code units written, which will be the same as returned by the corresponding -`string.measure_*encoding*`. +`string.measure_utf8`. + +The maximum number of bytes that can be encoded at once by +`string.encode` is 2<sup>31</sup>-1. If an encoding would require more +bytes, it is as if the codepoints can't be encoded (a trap). + +``` +(string.encode_lossy_utf8 $memory str:stringref ptr:address) + -> codeunits:i32 +``` +Encode the contents of the string *`str`* as UTF-8 to memory at *`ptr`*.
+If an isolated surrogate is seen, encode `U+FFFD` (the replacement +character) instead. Return the number of code units written, which will +be the same as returned by the corresponding `string.measure_wtf8`. + +The maximum number of bytes that can be encoded at once by +`string.encode` is 2<sup>31</sup>-1. If an encoding would require more +bytes, it is as if the codepoints can't be encoded (a trap). -Each code unit is written to memory as if stored by `i32.store8` or -`i32.store16`, respectively, so WTF-16 code units are in little-endian -byte order. +``` +(string.encode_wtf8 $memory str:stringref ptr:address) + -> codeunits:i32 +``` +Encode the contents of the string *`str`* as WTF-8 to memory at *`ptr`*. +Return the number of code units written, which will be the same as +returned by the corresponding `string.measure_wtf8`. The maximum number of bytes that can be encoded at once by `string.encode` is 2<sup>31</sup>-1. If an encoding would require more bytes, it is as if the codepoints can't be encoded (a trap). -For `string.encode_wtf8`, if an isolated surrogate is seen, the behavior -determines on the *`$wtf8_policy`* immediate. For `utf8`, trap. For -`wtf8`, the surrogate is encoded as per WTF-8. For `replace`, `U+FFFD` -(the replacement character) is encoded instead. Note that the UTF-8 -encoding of `U+FFFD` is the same length as the WTF-8 encoding of an -isolated surrogate. +``` +(string.encode_wtf16 $memory str:stringref ptr:address) + -> codeunits:i32 +``` +Encode the contents of the string *`str`* as WTF-16 to memory at +*`ptr`*. Return the number of code units written, which will be the +same as returned by the corresponding `string.measure_wtf16`. + +Each code unit is written to memory as if stored by `i32.store16`, so +WTF-16 code units are in little-endian byte order. + +The maximum number of bytes that can be encoded at once by +`string.encode` is 2<sup>31</sup>-1. If an encoding would require more +bytes, it is as if the codepoints can't be encoded (a trap).
### Concatenation @@ -401,7 +455,11 @@ may allow for 64-bit variants of the position-using instructions, which could relax this restriction.) ``` -(stringview_wtf8.encode $memory $wtf8_policy view:stringview_wtf8 ptr:address pos:i32 bytes:i32) +(stringview_wtf8.encode_utf8 $memory view:stringview_wtf8 ptr:address pos:i32 bytes:i32) + -> next_pos:i32, bytes:i32 +(stringview_wtf8.encode_lossy_utf8 $memory view:stringview_wtf8 ptr:address pos:i32 bytes:i32) + -> next_pos:i32, bytes:i32 +(stringview_wtf8.encode_wtf8 $memory view:stringview_wtf8 ptr:address pos:i32 bytes:i32) -> next_pos:i32, bytes:i32 ``` Write a subsequence of the WTF-8 encoding of *`view`* to memory at @@ -419,8 +477,11 @@ proposal](https://github.com/WebAssembly/memory64/blob/main/proposals/memory64/O may allow for 64-bit variants of the position-using instructions, which could relax this restriction.) -If an isolated surrogate is seen, the behavior determines on the -*`$wtf8_policy`* immediate, as in `string.encode_wtf8`. +If an isolated surrogate is seen, the behavior depends on the +instruction: + * `stringview_wtf8.encode_utf8` will trap. + * `stringview_wtf8.encode_lossy_utf8` will encode `U+FFFD`. + * `stringview_wtf8.encode_wtf8` will encode the isolated surrogate. ``` (stringview_wtf8.slice view:stringview_wtf8 start:i32 end:i32) @@ -542,16 +603,23 @@ The instructions below shall be available in WebAssembly implementations that support both GC and stringrefs. 
``` -(string.new_wtf8_array $wtf8_policy codeunits:$t start:i32 end:i32) +(string.new_utf8_array codeunits:$t start:i32 end:i32) + if expand($t) => array i8 + -> str:stringref +(string.new_lossy_utf8_array codeunits:$t start:i32 end:i32) + if expand($t) => array i8 + -> str:stringref +(string.new_wtf8_array codeunits:$t start:i32 end:i32) if expand($t) => array i8 -> str:stringref ``` -Create a new string from a subsequence of the *`codeunits`* WTF-8 bytes -in a GC-managed array, starting from offset *`start`* and continuing to -but not including *`end`*. If *`end`* is less than *`start`* or is -greater than the array length, trap. The bytes are decoded according to -`$wtf8_policy`, as in `string.new_wtf8`. The maximum value for -*`end`*–*`start`* is 2<sup>31</sup>–1; passing a higher value traps. +Create a new string from a subsequence of the *`codeunits`* bytes in a +GC-managed array, starting from offset *`start`* and continuing to but +not including *`end`*. If *`end`* is less than *`start`* or is greater +than the array length, trap. The bytes are decoded in the same way as +`string.new_utf8`, `string.new_lossy_utf8`, and `string.new_wtf8`, +respectively. The maximum value for *`end`*–*`start`* is +2<sup>31</sup>–1; passing a higher value traps. ``` (string.new_wtf16_array codeunits:$t start:i32 end:i32) @@ -566,7 +634,13 @@ for *`end`*–*`start`* is 2<sup>30</sup>–1; passing a higher value traps. ``` -(string.encode_wtf8_array $wtf8_policy str:stringref array:$t start:i32) +(string.encode_utf8_array str:stringref array:$t start:i32) + if expand($t) => array (mut i8) + -> codeunits:i32 +(string.encode_lossy_utf8_array str:stringref array:$t start:i32) + if expand($t) => array (mut i8) + -> codeunits:i32 +(string.encode_wtf8_array str:stringref array:$t start:i32) if expand($t) => array (mut i8) -> codeunits:i32 (string.encode_wtf16_array str:stringref array:$t start:i32) @@ -581,9 +655,9 @@ same as the result of a the corresponding `string.measure_wtf8` or code units in the array, trap.
 Note that no `NUL` terminator is ever written.
 
-For `string.encode_wtf8_array`, if an isolated surrogate is seen, the
-behavior depends on the *`$wtf8_policy`* immediate, in the same way as
-`string.encode_wtf8`.
+For `string.encode_utf8_array`, trap if an isolated surrogate is seen.
+For `string.encode_lossy_utf8_array`, replace isolated surrogates with
+`U+FFFD`.
 
 ## Binary encoding
 
@@ -594,39 +668,46 @@ reftype ::= ...
   | 0x62 ⇒ stringview_wtf16 ; SLEB128(-0x1e)
   | 0x61 ⇒ stringview_iter ; SLEB128(-0x1f)
 
-wtf8_policy ::= 0x00 ⇒ utf8
-  | 0x01 ⇒ wtf8
-  | 0x02 ⇒ replace
-
 instr ::= ...
-  | 0xfb 0x80:u32 $mem:u32 $policy:u32 ⇒ string.new_wtf8 $mem $policy
-  | 0xfb 0x81:u32 $mem:u32 ⇒ string.new_wtf16 $mem
-  | 0xfb 0x82:u32 $idx:u32 ⇒ string.const $idx
-  | 0xfb 0x84:u32 $policy:u32 ⇒ string.measure_wtf8 $policy
-  | 0xfb 0x85:u32 ⇒ string.measure_wtf16
-  | 0xfb 0x86:u32 $mem:u32 $policy:u32 ⇒ string.encode_wtf8 $mem $policy
-  | 0xfb 0x87:u32 $mem:u32 ⇒ string.encode_wtf16 $mem
-  | 0xfb 0x88:u32 ⇒ string.concat
-  | 0xfb 0x89:u32 ⇒ string.eq
-  | 0xfb 0x8a:u32 ⇒ string.is_usv_sequence
-  | 0xfb 0x90:u32 ⇒ string.as_wtf8
-  | 0xfb 0x91:u32 ⇒ stringview_wtf8.advance
-  | 0xfb 0x92:u32 $mem:u32 $policy:u32 ⇒ stringview_wtf8.encode $mem $policy
-  | 0xfb 0x93:u32 ⇒ stringview_wtf8.slice
-  | 0xfb 0x98:u32 ⇒ string.as_wtf16
-  | 0xfb 0x99:u32 ⇒ stringview_wtf16.length
-  | 0xfb 0x9a:u32 ⇒ stringview_wtf16.get_codeunit
-  | 0xfb 0x9b:u32 $mem:u32 ⇒ stringview_wtf16.encode $mem
-  | 0xfb 0x9c:u32 ⇒ stringview_wtf16.slice
-  | 0xfb 0xa0:u32 ⇒ string.as_iter
-  | 0xfb 0xa1:u32 ⇒ stringview_iter.next
-  | 0xfb 0xa2:u32 ⇒ stringview_iter.advance
-  | 0xfb 0xa3:u32 ⇒ stringview_iter.rewind
-  | 0xfb 0xa4:u32 ⇒ stringview_iter.slice
-  | 0xfb 0xb0:u32 $policy:u32 [gc] ⇒ string.new_wtf8_array $policy
-  | 0xfb 0xb1:u32 [gc] ⇒ string.new_wtf16_array
-  | 0xfb 0xb2:u32 $policy:u32 [gc] ⇒ string.encode_wtf8_array $policy
-  | 0xfb 0xb3:u32 [gc] ⇒ string.encode_wtf16_array
+  | 0xfb 0xc0:u32 $mem:u32 ⇒ string.new_utf8 $mem
+  | 0xfb 0xc1:u32 $mem:u32 ⇒ string.new_lossy_utf8 $mem
+  | 0xfb 0xc2:u32 $mem:u32 ⇒ string.new_wtf8 $mem
+  | 0xfb 0x81:u32 $mem:u32 ⇒ string.new_wtf16 $mem
+  | 0xfb 0x82:u32 $idx:u32 ⇒ string.const $idx
+  | 0xfb 0xc3:u32 ⇒ string.measure_utf8
+  | 0xfb 0xc4:u32 ⇒ string.measure_wtf8
+  | 0xfb 0x85:u32 ⇒ string.measure_wtf16
+  | 0xfb 0xc5:u32 $mem:u32 ⇒ string.encode_utf8 $mem
+  | 0xfb 0xc6:u32 $mem:u32 ⇒ string.encode_lossy_utf8 $mem
+  | 0xfb 0xc7:u32 $mem:u32 ⇒ string.encode_wtf8 $mem
+  | 0xfb 0x87:u32 $mem:u32 ⇒ string.encode_wtf16 $mem
+  | 0xfb 0x88:u32 ⇒ string.concat
+  | 0xfb 0x89:u32 ⇒ string.eq
+  | 0xfb 0x8a:u32 ⇒ string.is_usv_sequence
+  | 0xfb 0x90:u32 ⇒ string.as_wtf8
+  | 0xfb 0x91:u32 ⇒ stringview_wtf8.advance
+  | 0xfb 0xd0:u32 $mem:u32 ⇒ stringview_wtf8.encode_utf8 $mem
+  | 0xfb 0xd1:u32 $mem:u32 ⇒ stringview_wtf8.encode_lossy_utf8 $mem
+  | 0xfb 0xd2:u32 $mem:u32 ⇒ stringview_wtf8.encode_wtf8 $mem
+  | 0xfb 0x93:u32 ⇒ stringview_wtf8.slice
+  | 0xfb 0x98:u32 ⇒ string.as_wtf16
+  | 0xfb 0x99:u32 ⇒ stringview_wtf16.length
+  | 0xfb 0x9a:u32 ⇒ stringview_wtf16.get_codeunit
+  | 0xfb 0x9b:u32 $mem:u32 ⇒ stringview_wtf16.encode $mem
+  | 0xfb 0x9c:u32 ⇒ stringview_wtf16.slice
+  | 0xfb 0xa0:u32 ⇒ string.as_iter
+  | 0xfb 0xa1:u32 ⇒ stringview_iter.next
+  | 0xfb 0xa2:u32 ⇒ stringview_iter.advance
+  | 0xfb 0xa3:u32 ⇒ stringview_iter.rewind
+  | 0xfb 0xa4:u32 ⇒ stringview_iter.slice
+  | 0xfb 0xe0:u32 [gc] ⇒ string.new_utf8_array
+  | 0xfb 0xe1:u32 [gc] ⇒ string.new_lossy_utf8_array
+  | 0xfb 0xe2:u32 [gc] ⇒ string.new_wtf8_array
+  | 0xfb 0xb1:u32 [gc] ⇒ string.new_wtf16_array
+  | 0xfb 0xe3:u32 [gc] ⇒ string.encode_utf8_array
+  | 0xfb 0xe4:u32 [gc] ⇒ string.encode_lossy_utf8_array
+  | 0xfb 0xe5:u32 [gc] ⇒ string.encode_wtf8_array
+  | 0xfb 0xb3:u32 [gc] ⇒ string.encode_wtf16_array
 
 ;; New section. If present, must be present only once, and right before
 ;; the globals section (or where the globals section would be). Each
@@ -652,13 +733,12 @@ operand allows you to elide the memory, in which case it defaults to 0.
   local.get $ptr
   local.get $ptr
   call $strlen
-  string.new_wtf8)
+  string.new_utf8)
 ```
 
-Generally speaking, this proposal only distinguishes between UTF-8 and
-WTF-8 when encoding string contents to memory. As this is a a decode
-operation, the proposal just has a WTF-8 interface, as WTF-8 is a
-superset of UTF-8.
+If the bytes being decoded aren't actually valid UTF-8, this function
+will trap. Use `string.new_lossy_utf8` in contexts where replacing
+invalid data with `U+FFFD` is a better strategy than trapping.
 
 ### Make string from an array of WTF-8 code units in memory
 
@@ -669,6 +749,10 @@ superset of UTF-8.
   string.new_wtf8)
 ```
 
+Note that `string.new_wtf8` (and `string.new_wtf8_array`) are always
+strict decoders: if the bytes are not valid WTF-8, the instruction
+traps.
+
 ### Make string from UTF-16 in memory
 
 ```wasm
@@ -868,7 +952,7 @@ open to considering adding more instructions.
   (local $len i32)
   (local $ptr i32)
   local.get $str
-  string.measure_wtf8 utf8
+  string.measure_utf8
   local.set $len
 
   block $valid
@@ -887,7 +971,7 @@ open to considering adding more instructions.
 
   local.get $str
   local.get $ptr
-  string.encode_wtf8 wtf8 ;; push bytes written, same as $len
+  string.encode_utf8 ;; push bytes written, same as $len
 
   local.get $ptr
   i32.add
@@ -898,12 +982,17 @@ open to considering adding more instructions.
   return)
 ```
 
-Using `string.measure_wtf8 utf8` ensures that the encoded string is a
-valid unicode scalar value sequence. How to handle invalid UTF-8 is up
-to the user; instead of `unreachable` we could throw an exception.
+Using `string.measure_utf8` ensures that the encoded string is a valid
+unicode scalar value sequence. How to handle invalid UTF-8 is up to the
+user; instead of `unreachable` we could throw an exception.
+
+Note that in this case, the subsequent `string.encode_utf8` could just
+as well have been `string.encode_lossy_utf8` or `string.encode_wtf8`, as
+these instructions are all the same for strings that do not contain
+isolated surrogates, and we checked that there were none.
 
 If we meant to handle isolated surrogates, we could use
-`string.measure_wtf8 wtf8` instead.
+`string.measure_wtf8` instead.
 
 ### Stream over contents of string
 
@@ -923,7 +1012,7 @@ will encode isolated surrogates as WTF-8.
   local.get $cursor
   global.get $buf
   i32.const 1024
-  string.encode_wtf8 wtf8 ;; push bytes written
+  string.encode_wtf8 ;; push bytes written
   local.tee $bytes
   (if i32.eqz (then return)) ;; if no bytes encoded, done
   local.get $bytes

From 610beb3d52b1a974f8bd077719a1d9c1279b1d87 Mon Sep 17 00:00:00 2001
From: Andy Wingo
Date: Mon, 12 Sep 2022 15:53:26 +0200
Subject: [PATCH 14/15] Typo fix

---
 proposals/stringref/Overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 4863b4f..d6b76ed 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -215,7 +215,7 @@ Out-of-bounds access will trap. The maximum value for *`bytes`* is
 
 These three instructions decode the bytes in three different ways:
 
- * `string.new_utf8`, decodes using a strict UTF-8 decoder. If the
+ * `string.new_utf8` decodes using a strict UTF-8 decoder. If the
    bytes are not valid UTF-8, trap.
 
  * `string.new_lossy_utf8` decodes using a sloppy UTF-8 decoder: all

From cd97570867ed4c771f58873a50e1c808c7b145c0 Mon Sep 17 00:00:00 2001
From: Andy Wingo
Date: Mon, 12 Sep 2022 16:29:24 +0200
Subject: [PATCH 15/15] Make binary encoding less chaotic :)

---
 proposals/stringref/Overview.md | 34 ++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index d6b76ed..c26efbe 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -669,27 +669,27 @@ reftype ::= ...
   | 0x61 ⇒ stringview_iter ; SLEB128(-0x1f)
 
 instr ::= ...
-  | 0xfb 0xc0:u32 $mem:u32 ⇒ string.new_utf8 $mem
-  | 0xfb 0xc1:u32 $mem:u32 ⇒ string.new_lossy_utf8 $mem
-  | 0xfb 0xc2:u32 $mem:u32 ⇒ string.new_wtf8 $mem
+  | 0xfb 0x80:u32 $mem:u32 ⇒ string.new_utf8 $mem
   | 0xfb 0x81:u32 $mem:u32 ⇒ string.new_wtf16 $mem
   | 0xfb 0x82:u32 $idx:u32 ⇒ string.const $idx
-  | 0xfb 0xc3:u32 ⇒ string.measure_utf8
-  | 0xfb 0xc4:u32 ⇒ string.measure_wtf8
+  | 0xfb 0x83:u32 ⇒ string.measure_utf8
+  | 0xfb 0x84:u32 ⇒ string.measure_wtf8
   | 0xfb 0x85:u32 ⇒ string.measure_wtf16
-  | 0xfb 0xc5:u32 $mem:u32 ⇒ string.encode_utf8 $mem
-  | 0xfb 0xc6:u32 $mem:u32 ⇒ string.encode_lossy_utf8 $mem
-  | 0xfb 0xc7:u32 $mem:u32 ⇒ string.encode_wtf8 $mem
+  | 0xfb 0x86:u32 $mem:u32 ⇒ string.encode_utf8 $mem
   | 0xfb 0x87:u32 $mem:u32 ⇒ string.encode_wtf16 $mem
   | 0xfb 0x88:u32 ⇒ string.concat
   | 0xfb 0x89:u32 ⇒ string.eq
   | 0xfb 0x8a:u32 ⇒ string.is_usv_sequence
+  | 0xfb 0x8b:u32 $mem:u32 ⇒ string.new_lossy_utf8 $mem
+  | 0xfb 0x8c:u32 $mem:u32 ⇒ string.new_wtf8 $mem
+  | 0xfb 0x8d:u32 $mem:u32 ⇒ string.encode_lossy_utf8 $mem
+  | 0xfb 0x8e:u32 $mem:u32 ⇒ string.encode_wtf8 $mem
   | 0xfb 0x90:u32 ⇒ string.as_wtf8
   | 0xfb 0x91:u32 ⇒ stringview_wtf8.advance
-  | 0xfb 0xd0:u32 $mem:u32 ⇒ stringview_wtf8.encode_utf8 $mem
-  | 0xfb 0xd1:u32 $mem:u32 ⇒ stringview_wtf8.encode_lossy_utf8 $mem
-  | 0xfb 0xd2:u32 $mem:u32 ⇒ stringview_wtf8.encode_wtf8 $mem
+  | 0xfb 0x92:u32 $mem:u32 ⇒ stringview_wtf8.encode_utf8 $mem
   | 0xfb 0x93:u32 ⇒ stringview_wtf8.slice
+  | 0xfb 0x94:u32 $mem:u32 ⇒ stringview_wtf8.encode_lossy_utf8 $mem
+  | 0xfb 0x95:u32 $mem:u32 ⇒ stringview_wtf8.encode_wtf8 $mem
   | 0xfb 0x98:u32 ⇒ string.as_wtf16
   | 0xfb 0x99:u32 ⇒ stringview_wtf16.length
   | 0xfb 0x9a:u32 ⇒ stringview_wtf16.get_codeunit
@@ -700,14 +700,14 @@ instr ::= ...
   | 0xfb 0xa2:u32 ⇒ stringview_iter.advance
   | 0xfb 0xa3:u32 ⇒ stringview_iter.rewind
   | 0xfb 0xa4:u32 ⇒ stringview_iter.slice
-  | 0xfb 0xe0:u32 [gc] ⇒ string.new_utf8_array
-  | 0xfb 0xe1:u32 [gc] ⇒ string.new_lossy_utf8_array
-  | 0xfb 0xe2:u32 [gc] ⇒ string.new_wtf8_array
+  | 0xfb 0xb0:u32 [gc] ⇒ string.new_utf8_array
   | 0xfb 0xb1:u32 [gc] ⇒ string.new_wtf16_array
-  | 0xfb 0xe3:u32 [gc] ⇒ string.encode_utf8_array
-  | 0xfb 0xe4:u32 [gc] ⇒ string.encode_lossy_utf8_array
-  | 0xfb 0xe5:u32 [gc] ⇒ string.encode_wtf8_array
+  | 0xfb 0xb2:u32 [gc] ⇒ string.encode_utf8_array
   | 0xfb 0xb3:u32 [gc] ⇒ string.encode_wtf16_array
+  | 0xfb 0xb4:u32 [gc] ⇒ string.new_lossy_utf8_array
+  | 0xfb 0xb5:u32 [gc] ⇒ string.new_wtf8_array
+  | 0xfb 0xb6:u32 [gc] ⇒ string.encode_lossy_utf8_array
+  | 0xfb 0xb7:u32 [gc] ⇒ string.encode_wtf8_array
 
 ;; New section. If present, must be present only once, and right before
 ;; the globals section (or where the globals section would be). Each