From 4db1de546f7882ccc092d9ee749338590ad7c3fc Mon Sep 17 00:00:00 2001
From: dcode <dcode@dcode.io>
Date: Fri, 1 Jul 2022 16:30:53 +0200
Subject: [PATCH 01/15] Clarify string literal section kind

---
 proposals/stringref/Overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
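
Because the table is length-prefixed, a literal may contain NUL bytes without any escaping. A minimal Python sketch of the `vec(vec(u8))` payload follows. It is illustrative only: the helper names `uleb128` and `encode_string_table` are hypothetical, the literals are assumed to be valid WTF-8 already, and the surrounding section framing is left out.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (the Wasm u32 form)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_table(literals: list[bytes]) -> bytes:
    """Encode already-WTF-8-encoded literals as vec(vec(u8)).

    Each literal is preceded by its byte length, so no terminator is needed.
    """
    out = bytearray(uleb128(len(literals)))  # number of literals
    for lit in literals:
        out += uleb128(len(lit))  # byte length of this literal
        out += lit                # raw WTF-8 bytes, NUL allowed
    return bytes(out)
```

For example, `encode_string_table([b"hi", b"a\x00b"])` yields `b"\x02\x02hi\x03a\x00b"`: a count, then a length and the raw bytes for each literal.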

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 29edffa..a0a1481 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -252,7 +252,7 @@ be used in global variable initializers.
 #### String literal section
 
 The `string.const` section indicates the literal as an `i32` index into
-a new custom section: a string table, encoded as a `vec(vec(u8))` of
+a new regular section: a string table, encoded as a `vec(vec(u8))` of
 valid WTF-8 strings.  Because literal strings can contain codepoint 0,
 strings in the string table do not use NUL as a terminator. The string
 table section must immediately precede the global section, or where the

From c4bfe42f57023110647f117d6940b8c7771452a1 Mon Sep 17 00:00:00 2001
From: dcode <dcode@dcode.io>
Date: Fri, 1 Jul 2022 17:19:03 +0200
Subject: [PATCH 02/15] Clarify string literal requirement

---
 proposals/stringref/Overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 29edffa..7ed8c7c 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -25,7 +25,7 @@ find good compromises are "minimal" and "viable".
  3. Allow WebAssembly implementations to efficiently represent strings
     internally in either WTF-8 or WTF-16 encodings
  4. Allow access to WTF-16 code units for Java, Dart, Kotlin and similar languages
- 5. Allow string literals in element sections
+ 5. Allow string literals as constant expressions
 
 ## Definitions
  - *codepoint*: An integer in the range [0,0x10FFFF].

From 727dca6880c7847ec77a1ee37beea3ce565b99a6 Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Wed, 6 Jul 2022 08:43:47 +0200
Subject: [PATCH 03/15] Integrate with GC proposal

Add instructions to encode and decode string contents to and from
GC-managed arrays of UTF-8/WTF-8 and WTF-16.

Fixes #1.
---
 proposals/stringref/Overview.md | 66 +++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
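
The bounds rules for `string.new_wtf8_array` and `string.new_wtf16_array` are the same apart from the length limit. A small Python model (not part of the proposal; `Trap` and `checked_slice` are hypothetical names, arrays are Python lists, and decoding is omitted) shows which `start`/`end` pairs are accepted:

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap in this model."""


# Limits on end - start, taken from the instruction descriptions.
MAX_WTF8_BYTES = 2**31 - 1
MAX_WTF16_UNITS = 2**30 - 1


def checked_slice(array: list[int], start: int, end: int, max_len: int) -> list[int]:
    """Return array[start:end] following the bounds rules of string.new_*_array.

    `end` is exclusive.  The model assumes start and end are non-negative,
    i.e. the i32 operands have already been read as unsigned values.
    """
    if start < 0 or end < 0:
        raise ValueError("model expects non-negative offsets")
    if end < start or end > len(array):
        raise Trap("slice out of bounds")
    if end - start > max_len:
        raise Trap("slice too long")
    return array[start:end]
```

A string builder that has filled `n` code units would pass `start = 0` and `end = n`, matching the string-builder use case described in this patch.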

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 39b00f5..326bb08 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -504,6 +504,68 @@ Return the number of codepoints that were actually consumed.
 Return a substring of *`view`*, starting at the current position of
 *`view`* and continuing for at most *`codepoints`* codepoints.
 
+### GC integration
+
+Though this proposal does not have a dependency on the [GC
+proposal](https://github.com/WebAssembly/gc/blob/master/proposals/gc/MVP.md),
+compiler authors that target GC will likely want to be able to encode
+the contents of a stringref to a GC array, and vice versa.
+
+Our primary use cases are:
+
+ 1. String-builder interfaces, which will likely use a WTF-8 or WTF-16
+    array as intermediate storage, depending on the language being
+    compiled.  We will need to be able to create strings from arrays.
+    When the string contents are ready, we will almost always decode
+    from array offset 0 and continue to some offset before the end of
+    the array.  We'll also need to be able to append a string's contents
+    to an array at a given offset.
+ 2. Communicating strings with another process, possibly over the
+    network.  Here, UTF-8 and WTF-8 are the important encodings, and we
+    need to be able to read and write to arbitrary slices of arrays.
+
+```
+(string.new_wtf8_array $wtf8_policy codeunits:$t start:i32 end:i32)
+  if expand($t) => array i8
+  -> str:stringref
+```
+Create a new string from a subsequence of the *`codeunits`* WTF-8 bytes
+in a GC-managed array, starting from offset *`start`* and continuing to
+but not including *`end`*.  If *`end`* is less than *`start`* or is
+greater than the array length, trap.  The bytes are decoded according to
+`$wtf8_policy`, as in `string.new_wtf8`.  The maximum value for
+*`end`*–*`start`* is 2<sup>31</sup>–1; passing a higher value traps.
+
+```
+(string.new_wtf16_array codeunits:$t start:i32 end:i32)
+  if expand($t) => array i16
+  -> str:stringref
+```
+Create a new string from a subsequence of the *`codeunits`* WTF-16 code
+units in a GC-managed array, starting from offset *`start`* and
+continuing to but not including *`end`*.  If *`end`* is less than
+*`start`* or is greater than the array length, trap.  The maximum value
+for *`end`*–*`start`* is 2<sup>30</sup>–1; passing a higher value
+traps.
+
+```
+(string.encode_wtf8_array $wtf8_policy str:stringref array:$t start:i32)
+  if expand($t) => array (mut i8)
+(string.encode_wtf16_array str:stringref array:$t start:i32)
+  if expand($t) => array (mut i16)
+```
+Encode the contents of the string *`str`* as WTF-8 or WTF-16,
+respectively, to the GC-managed array *`array`*, starting at offset
+*`start`*.  The number of code units written will be the same as
+returned by the corresponding `string.measure_wtf8` or
+`string.measure_wtf16`, respectively.  If there is not space for the
+code units in the array, trap.  Note that no `NUL` terminator is ever
+written.
+
+For `string.encode_wtf8_array`, if an isolated surrogate is seen, the
+behavior depends on the *`$wtf8_policy`* immediate, in the same way as
+`string.encode_wtf8`.
+
 ## Binary encoding
 
 ```
@@ -542,6 +604,10 @@ instr ::= ...
        |  0xfb 0xa2                       ⇒ stringview_iter.advance
        |  0xfb 0xa3                       ⇒ stringview_iter.rewind
        |  0xfb 0xa4                       ⇒ stringview_iter.slice
+       |  0xfb 0xb0 $policy:u32      [gc] ⇒ string.new_wtf8_array $policy
+       |  0xfb 0xb1                  [gc] ⇒ string.new_wtf16_array
+       |  0xfb 0xb2 $policy:u32      [gc] ⇒ string.encode_wtf8_array $policy
+       |  0xfb 0xb3                  [gc] ⇒ string.encode_wtf16_array
 
 ;; New section.  If present, must be present only once, and right before
 ;; the globals section (or where the globals section would be).  Each

From 1558965373710007194baa11ab90ff3978f2bf22 Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Wed, 6 Jul 2022 08:49:38 +0200
Subject: [PATCH 04/15] Change stringview_iter.cur to stringview_iter.next

Seems more useful in the general case, without loss of expressivity.
Fixes #4.
---
 proposals/stringref/Overview.md | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)
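
With `next` returning the codepoint and advancing in one step, the example loop no longer needs a separate advance call. A toy Python model illustrates the intended behavior; the `StringViewIter` class and `collect` helper are hypothetical, and a string is modeled as a list of codepoints.

```python
class StringViewIter:
    """Toy model of stringview_iter: a codepoint list plus a position."""

    def __init__(self, codepoints: list[int]) -> None:
        self.codepoints = codepoints
        self.pos = 0  # index of the next codepoint to return

    def next(self) -> int:
        # At the end, return -1 and leave the position unchanged.
        # -1 cannot be confused with a codepoint, which is at most 0x10FFFF.
        if self.pos >= len(self.codepoints):
            return -1
        cp = self.codepoints[self.pos]
        self.pos += 1
        return cp


def collect(it: StringViewIter) -> list[int]:
    """Mirror of the updated example loop: call next until it yields -1."""
    out = []
    while (ch := it.next()) != -1:
        out.append(ch)
    return out
```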

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 39b00f5..0ec249a 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -477,11 +477,12 @@ stringview can then be used to iterate over the codepoints of the
 string.
 
 ```
-(stringview_iter.cur view:stringview_iter)
+(stringview_iter.next view:stringview_iter)
   -> codepoint:i32
 ```
-Return the codepoint currently pointed to by *`view`*, or -1 if the
-iterator is at the end of the string.
+If *`view`* is already at the end of the string, return -1.  Otherwise
+return the codepoint currently pointed to by the iterator, and advance
+the iterator's position by one codepoint.
 
 ```
 (stringview_iter.advance view:stringview_iter codepoints:i32)
@@ -538,7 +539,7 @@ instr ::= ...
        |  0xfb 0x9b $mem:u32              ⇒ stringview_wtf16.encode $mem
        |  0xfb 0x9c                       ⇒ stringview_wtf16.slice
        |  0xfb 0xa0                       ⇒ string.as_iter
-       |  0xfb 0xa1                       ⇒ stringview_iter.cur
+       |  0xfb 0xa1                       ⇒ stringview_iter.next
        |  0xfb 0xa2                       ⇒ stringview_iter.advance
        |  0xfb 0xa3                       ⇒ stringview_iter.rewind
        |  0xfb 0xa4                       ⇒ stringview_iter.slice
@@ -908,7 +909,7 @@ WTF-8 in memory, for longer strings.
   block $done
     loop $loop
       local.get $iter
-      stringview_iter.cur
+      stringview_iter.next
       local.tee $ch
 
       i32.const -1
@@ -917,11 +918,6 @@ WTF-8 in memory, for longer strings.
 
       local.get $ch
       call $have-codepoint
-
-      local.get $iter
-      i32.const 1
-      string.advance_wtf8
-      drop
     end
   end)
 ```

From d04c30931cdb44d8dc8fb05bb4e9210cfab8f95f Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Wed, 6 Jul 2022 08:58:23 +0200
Subject: [PATCH 05/15] Further WTF-16 byte order clarification

Fixes #15.
---
 proposals/stringref/Overview.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
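
Concretely, the bytes `41 00 3D D8` decode to the code units `0x0041` and `0xD83D`, whatever the host's native byte order. A Python sketch of this decoding step follows; `read_wtf16_units` and `Trap` are hypothetical names, each load is modeled as the unsigned 16-bit load of core WebAssembly (`i32.load16_u`), and the maximum-length check is omitted.

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap in this model."""


def read_wtf16_units(memory: bytes, ptr: int, codeunits: int) -> list[int]:
    """Read `codeunits` 16-bit code units from `memory` at `ptr`.

    Each unit is decoded as an unsigned little-endian 16-bit value.
    """
    if ptr % 2 != 0:
        raise Trap("ptr must be two-byte aligned")
    if ptr + 2 * codeunits > len(memory):
        raise Trap("out-of-bounds access")
    return [
        int.from_bytes(memory[ptr + 2 * i : ptr + 2 * i + 2], "little")
        for i in range(codeunits)
    ]
```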

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 39b00f5..1647a82 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -228,7 +228,9 @@ Unicode standard, version 14.0.0, page 126.
 Create a new string from the *`codeunits`* code units encoded in memory at
 *`ptr`*.  Out-of-bounds access will trap.  *`ptr`* must be two-byte
 aligned, and will trap otherwise.  The maximum value for *`codeunits`*
-is 2<sup>30</sup>–1; passing a higher value traps.
+is 2<sup>30</sup>–1; passing a higher value traps.  Each code unit is
+read from memory as if with `i32.load16`, and are therefore decoded
+using little-endian byte order.
 
 #### `string.new` size limits
 

From fc901e78ceb6b52990b53a0d6d64cb11b8dc3566 Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Wed, 6 Jul 2022 09:05:15 +0200
Subject: [PATCH 06/15] String slice operands have exclusive end

Fixes #17.
---
 proposals/stringref/Overview.md | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
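
With an exclusive end, the length of a slice is simply `end - start` once both positions have been clamped. The Python model below (hypothetical helpers `wtf16_position` and `wtf16_slice`, operating on a list of code units) makes the clamping and the exclusive bound explicit.

```python
def wtf16_position(units: list[int], pos: int) -> int:
    """WTF-16 position treatment: a position past the end becomes the length."""
    return min(pos, len(units))


def wtf16_slice(units: list[int], start: int, end: int) -> list[int]:
    """Model of stringview_wtf16.slice with an exclusive `end`.

    Assumes non-negative positions.  The patch does not say what happens
    when end < start; Python slicing yields an empty result here, which is
    an assumption of this model only.
    """
    start = wtf16_position(units, start)
    end = wtf16_position(units, end)
    return units[start:end]
```

The updated example in this patch computes `end` as `$offset + $codeunits`, which corresponds to `wtf16_slice(units, offset, offset + codeunits)`.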

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 39b00f5..8f9cd0c 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -418,8 +418,9 @@ If an isolated surrogate is seen, the behavior determines on the
   -> str:stringref
 ```
 Return a substring of *`view`*, for the WTF-8 bytes starting at offset
-*`start`* and not greater than *`end`*.  *`start`* and *`end`* receive
-the "WTF-8 position treatment", as for `stringview_wtf8.advance`.
+*`start`* and continuing to but not including *`end`*.  *`start`* and
+*`end`* receive the "WTF-8 position treatment", as for
+`stringview_wtf8.advance`.
 
 ### `stringview_wtf16`
 
@@ -462,8 +463,9 @@ transformation is the "WTF-16 position treatment".
   -> str:stringref
 ```
 Return a substring of *`view`*, for the WTF-16 code units starting at offset
-*`start`* and not greater than *`end`*.  *`start`* and *`end`* receive
-the "WTF-16 position treatment", as for `stringview_wtf16.encode`.
+*`start`* and continuing to but not including *`end`*.  *`start`* and
+*`end`* receive the "WTF-16 position treatment", as for
+`stringview_wtf16.encode`.
 
 ### `stringview_iter`
 
@@ -638,7 +640,9 @@ rather it just deals in WTF-16, as most source languages that expose
   local.get $str
   string.as_wtf16
   local.get $offset
+  local.get $offset
   local.get $codeunits
+  i32.add
   stringview_wtf16.slice)
 ```
 

From d62a253762dd6f525cd2a5f3f88172a01d0ec1dd Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Wed, 6 Jul 2022 09:12:46 +0200
Subject: [PATCH 07/15] Allow null operands to string.eq

Fixes #18.
---
 proposals/stringref/Overview.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
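
Under these rules a null reference compares equal only to another null, and never to the empty string. A Python sketch of the comparison; the `string_eq` function and the list-of-codepoints representation are hypothetical.

```python
from typing import List, Optional

# A string is modeled as a list of codepoints; None models a null stringref.
Str = Optional[List[int]]


def string_eq(a: Str, b: Str) -> int:
    """Model of string.eq, the one instruction that accepts null operands."""
    if a is None and b is None:
        return 1
    if a is None or b is None:
        return 0
    return 1 if a == b else 0
```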

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 39b00f5..5a00935 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -195,6 +195,10 @@ address ::= i32 | i64
 Such instructions also take the memory to which to read or write as an
 immediate.
 
+Although `stringref` is a nullable type, every instruction in this
+proposal traps if it receives a null `stringref` value.  The one
+exception is `string.eq`.
+
 ### Creating strings
 
 ```
@@ -344,8 +348,9 @@ If an allocation fails, the implementation must trap.  Fallible
 ```
 (string.eq a:stringref b:stringref) -> i32
 ```
-Return 1 if the strings *`a`* and *`b`* contain the same codepoint
-sequence.  Return 0 otherwise.
+If both *`a`* and *`b`* are null, return 1.  If only one of them is
+null, return 0.  Otherwise return 1 if the strings *`a`* and *`b`*
+contain the same codepoint sequence, or 0 otherwise.
 
 ```
 (string.is_usv_sequence str:stringref)

From f9257447c357190974e3eb979939a9838a79e139 Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Wed, 6 Jul 2022 09:17:45 +0200
Subject: [PATCH 08/15] string.encode_wtf8, string.encode_wtf16 return encoded
 length

Interestingly, one of the examples already assumed this behavior.

Fixes #24.
---
 proposals/stringref/Overview.md | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
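
Returning the count lets the caller place a terminator or advance a cursor without a separate measure call, which is what the updated NUL-terminated example relies on. A Python model of that example follows. The helper names are hypothetical, only the `wtf8` behavior is modeled, and a string is a list of codepoints in which a high surrogate is never immediately followed by a low one (such a pair would denote a single supplementary codepoint).

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap in this model."""


def wtf8_bytes(cp: int) -> bytes:
    """Generalized UTF-8 for one codepoint; surrogates get the 3-byte form."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def measure_wtf8(cps: list[int]) -> int:
    return sum(len(wtf8_bytes(cp)) for cp in cps)


def encode_wtf8(cps: list[int], memory: bytearray, ptr: int) -> int:
    """Write the WTF-8 encoding at ptr and return the bytes written."""
    data = b"".join(wtf8_bytes(cp) for cp in cps)
    if ptr + len(data) > len(memory):
        raise Trap("out-of-bounds access")
    memory[ptr:ptr + len(data)] = data
    return len(data)  # equals measure_wtf8(cps)


def encode_nul_terminated(cps: list[int], memory: bytearray, ptr: int) -> None:
    """Mirror of the updated example: use the returned length to place NUL."""
    written = encode_wtf8(cps, memory, ptr)
    if ptr + written >= len(memory):
        raise Trap("out-of-bounds access")
    memory[ptr + written] = 0
```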

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 39b00f5..3fe8c73 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -283,9 +283,9 @@ number of code units represent these sizes as unsigned values.
 
 ```
 (string.measure_wtf8 $wtf8_policy str:stringref)
-  -> bytes:i32
+  -> codeunits:i32
 (string.measure_wtf16 str:stringref)
-  -> bytes:i32
+  -> codeunits:i32
 ```
 Measure the number of code units that would be required to encode the
 contents of the string *`str`* to WTF-8 or WTF-16 respectively.
@@ -299,11 +299,13 @@ require more code units than the limit, the result is -1.
 
 ```
 (string.encode_wtf8 $memory $wtf8_policy str:stringref ptr:address)
+  -> codeunits:i32
 (string.encode_wtf16 $memory str:stringref ptr:address)
+  -> codeunits:i32
 ```
 Encode the contents of the string *`str`* as UTF-8, WTF-8, or WTF-16,
-respectively, to memory at *`ptr`*.  The number of code units written
-will be the same as returned by the corresponding
+respectively, to memory at *`ptr`*.  Return the number of code units
+written, which will be the same as returned by the corresponding
 `string.measure_*encoding*`.
 
 Each code unit is written to memory as if stored by `i32.store8` or
@@ -554,8 +556,8 @@ stringrefs ::= section_14(0x00 vec(vec(u8)))
 
 ## Examples
 
-We assume that the textual syntax for `string.encode` and `string.new`
-allows you to elide the memory, in which case it defaults to 0.
+We assume that the textual syntax for instructions that take a memory
+immediate allows you to elide the memory, in which case it defaults to 0.
 
 ### Make string from NUL-terminated UTF-8 in memory
 
@@ -797,10 +799,9 @@ open to considering adding more instructions.
 
   local.get $str
   local.get $ptr
-  string.encode_wtf8 wtf8
+  string.encode_wtf8 wtf8          ;; push bytes written, same as $len
 
   local.get $ptr
-  local.get $len
   i32.add
   i32.const 0
   i32.store8                       ;; write NUL

From ead9d47b59d17ebefc6ce01f2fc43913a4131527 Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Wed, 6 Jul 2022 11:12:31 +0200
Subject: [PATCH 09/15] Fix grammar

---
 proposals/stringref/Overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 1647a82..3d50f6a 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -229,7 +229,7 @@ Create a new string from the *`codeunits`* code units encoded in memory at
 *`ptr`*.  Out-of-bounds access will trap.  *`ptr`* must be two-byte
 aligned, and will trap otherwise.  The maximum value for *`codeunits`*
 is 2<sup>30</sup>–1; passing a higher value traps.  Each code unit is
-read from memory as if with `i32.load16`, and are therefore decoded
+read from memory as if with `i32.load16_u`, and is therefore decoded
 using little-endian byte order.
 
 #### `string.new` size limits

From 6d988b49c6aca2a007beb5a6770242b4a87215ee Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Wed, 6 Jul 2022 11:18:13 +0200
Subject: [PATCH 10/15] Address feedback; encode returns codeunits written

See #33.
---
 proposals/stringref/Overview.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
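
The returned count is what a string builder needs to bump its fill pointer after appending. A Python model follows. The names are hypothetical, a GC array is modeled as a Python list of code units, and the space check happens before anything is written, which the text does not specify.

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap in this model."""


def wtf16_units(cps: list[int]) -> list[int]:
    """WTF-16 code units; supplementary codepoints become surrogate pairs."""
    units = []
    for cp in cps:
        if cp >= 0x10000:
            cp -= 0x10000
            units += [0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)]
        else:
            units.append(cp)  # BMP codepoints, including isolated surrogates
    return units


def encode_wtf16_array(cps: list[int], array: list[int], start: int) -> int:
    """Model of string.encode_wtf16_array: write at start, return count."""
    units = wtf16_units(cps)
    if start + len(units) > len(array):
        raise Trap("not enough space in the array")
    array[start:start + len(units)] = units
    return len(units)  # no NUL terminator is written
```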

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 326bb08..57244b8 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -511,7 +511,7 @@ proposal](https://github.com/WebAssembly/gc/blob/master/proposals/gc/MVP.md),
 compiler authors that target GC will likely want to be able to encode
 the contents of a stringref to a GC array, and vice versa.
 
-Our primary use cases are:
+The primary use cases are:
 
  1. String-builder interfaces, which will likely use a WTF-8 or WTF-16
     array as intermediate storage, depending on the language being
@@ -524,6 +524,9 @@ Our primary use cases are:
     network.  Here, UTF-8 and WTF-8 are the important encodings, and we
     need to be able to read and write to arbitrary slices of arrays.
 
+The instructions below shall be available in WebAssembly implementations
+that support both GC and stringrefs.
+
 ```
 (string.new_wtf8_array $wtf8_policy codeunits:$t start:i32 end:i32)
   if expand($t) => array i8
@@ -551,13 +554,15 @@ traps.
 ```
 (string.encode_wtf8_array $wtf8_policy str:stringref array:$t start:i32)
   if expand($t) => array (mut i8)
+  -> codeunits:i32
 (string.encode_wtf16_array str:stringref array:$t start:i32)
   if expand($t) => array (mut i16)
+  -> codeunits:i32
 ```
 Encode the contents of the string *`str`* as WTF-8 or WTF-16,
 respectively, to the GC-managed array *`array`*, starting at offset
-*`start`*.  The number of code units written will be the same as
-returned by the corresponding `string.measure_wtf8` or
+*`start`*.  Return the number of code units written, which will be the
+same as the result of the corresponding `string.measure_wtf8` or
 `string.measure_wtf16`, respectively.  If there is not space for the
 code units in the array, trap.  Note that no `NUL` terminator is ever
 written.

From 8bd7487bce963217c3d69594f8d34a763e746b05 Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Thu, 7 Jul 2022 08:59:33 +0200
Subject: [PATCH 11/15] stringview_wtf16.encode returns number of written code
 units

Since the operand is a maximum number of code units to write, it makes
sense to return the number of code units actually written, for the same
reason as string.encode_wtf16.

See #24.
---
 proposals/stringref/Overview.md | 2 ++
 1 file changed, 2 insertions(+)
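
Because `len` is only an upper bound, the return value is the only way to learn how much was copied, for example when the view ends early. A Python model follows. The names are hypothetical, positions are assumed non-negative, stores are little-endian as elsewhere in the proposal, and bounds are checked before writing.

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap in this model."""


def stringview_wtf16_encode(units: list[int], memory: bytearray,
                            ptr: int, pos: int, length: int) -> int:
    """Model of stringview_wtf16.encode over a list of WTF-16 code units."""
    if ptr % 2 != 0:
        raise Trap("ptr must be two-byte aligned")
    pos = min(pos, len(units))          # WTF-16 position treatment
    chunk = units[pos:pos + length]     # no more than `length` code units
    if ptr + 2 * len(chunk) > len(memory):
        raise Trap("out-of-bounds access")
    for i, unit in enumerate(chunk):
        memory[ptr + 2 * i : ptr + 2 * i + 2] = unit.to_bytes(2, "little")
    return len(chunk)                   # may be less than `length`
```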

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index ee81d85..0aa4401 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -458,10 +458,12 @@ Return the 16-bit code unit at offset *`pos`* in the WTF-16 encoding of
 
 ```
 (stringview_wtf16.encode $memory view:stringview_wtf16 ptr:address pos:i32 len:i32)
+  -> codeunits:i32
 ```
 Write a subsequence of the WTF-16 encoding of *`view`* to memory at
 *`ptr`*, starting at the WTF-16 offset *`pos`*, writing no more than
 *`len`* 16-bit code units.  If *`ptr`* is not two-byte aligned, trap.
+Return the number of code units written.
 
 If *`pos`* is greater than the number of WTF-16 code units in *`view`*,
 it is as if it were instead given as the code unit length.  This

From 84e20792d2a975a31b62ca03d143d775153b5c48 Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Mon, 12 Sep 2022 14:51:17 +0200
Subject: [PATCH 12/15] Binary encodings of opcodes after 0xfb prefix are LEBs

Fixes #9.
---
 proposals/stringref/Overview.md | 59 +++++++++++++++++----------------
 1 file changed, 31 insertions(+), 28 deletions(-)
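
For instance, `stringview_iter.slice` (0xa4) was previously the two bytes `fb a4` and is now `fb a4 01`, since 0xa4 needs two LEB128 bytes. A Python sketch of the prefixed-opcode encoding (the helper names are hypothetical):

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (the Wasm u32 form)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_prefixed_opcode(opcode: int) -> bytes:
    """The 0xfb prefix byte followed by the opcode as a u32 LEB128."""
    return bytes([0xFB]) + uleb128(opcode)
```

Values from 0x80 through 0x3fff take two bytes after the prefix; 0x4000 and above would take three.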

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 0aa4401..14d7e51 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -599,34 +599,34 @@ wtf8_policy ::= 0x00 ⇒ utf8
              |  0x02 ⇒ replace
 
 instr ::= ...
-       |  0xfb 0x80 $mem:u32 $policy:u32  ⇒ string.new_wtf8 $mem $policy
-       |  0xfb 0x81 $mem:u32              ⇒ string.new_wtf16 $mem
-       |  0xfb 0x82 $idx:u32              ⇒ string.const $idx
-       |  0xfb 0x84 $policy:u32           ⇒ string.measure_wtf8 $policy
-       |  0xfb 0x85                       ⇒ string.measure_wtf16
-       |  0xfb 0x86 $mem:u32 $policy:u32  ⇒ string.encode_wtf8 $mem $policy
-       |  0xfb 0x87 $mem:u32              ⇒ string.encode_wtf16 $mem
-       |  0xfb 0x88                       ⇒ string.concat
-       |  0xfb 0x89                       ⇒ string.eq
-       |  0xfb 0x8a                       ⇒ string.is_usv_sequence
-       |  0xfb 0x90                       ⇒ string.as_wtf8
-       |  0xfb 0x91                       ⇒ stringview_wtf8.advance
-       |  0xfb 0x92 $mem:u32 $policy:u32  ⇒ stringview_wtf8.encode $mem $policy
-       |  0xfb 0x93                       ⇒ stringview_wtf8.slice
-       |  0xfb 0x98                       ⇒ string.as_wtf16
-       |  0xfb 0x99                       ⇒ stringview_wtf16.length
-       |  0xfb 0x9a                       ⇒ stringview_wtf16.get_codeunit
-       |  0xfb 0x9b $mem:u32              ⇒ stringview_wtf16.encode $mem
-       |  0xfb 0x9c                       ⇒ stringview_wtf16.slice
-       |  0xfb 0xa0                       ⇒ string.as_iter
-       |  0xfb 0xa1                       ⇒ stringview_iter.next
-       |  0xfb 0xa2                       ⇒ stringview_iter.advance
-       |  0xfb 0xa3                       ⇒ stringview_iter.rewind
-       |  0xfb 0xa4                       ⇒ stringview_iter.slice
-       |  0xfb 0xb0 $policy:u32      [gc] ⇒ string.new_wtf8_array $policy
-       |  0xfb 0xb1                  [gc] ⇒ string.new_wtf16_array
-       |  0xfb 0xb2 $policy:u32      [gc] ⇒ string.encode_wtf8_array $policy
-       |  0xfb 0xb3                  [gc] ⇒ string.encode_wtf16_array
+       |  0xfb 0x80:u32 $mem:u32 $policy:u32  ⇒ string.new_wtf8 $mem $policy
+       |  0xfb 0x81:u32 $mem:u32              ⇒ string.new_wtf16 $mem
+       |  0xfb 0x82:u32 $idx:u32              ⇒ string.const $idx
+       |  0xfb 0x84:u32 $policy:u32           ⇒ string.measure_wtf8 $policy
+       |  0xfb 0x85:u32                       ⇒ string.measure_wtf16
+       |  0xfb 0x86:u32 $mem:u32 $policy:u32  ⇒ string.encode_wtf8 $mem $policy
+       |  0xfb 0x87:u32 $mem:u32              ⇒ string.encode_wtf16 $mem
+       |  0xfb 0x88:u32                       ⇒ string.concat
+       |  0xfb 0x89:u32                       ⇒ string.eq
+       |  0xfb 0x8a:u32                       ⇒ string.is_usv_sequence
+       |  0xfb 0x90:u32                       ⇒ string.as_wtf8
+       |  0xfb 0x91:u32                       ⇒ stringview_wtf8.advance
+       |  0xfb 0x92:u32 $mem:u32 $policy:u32  ⇒ stringview_wtf8.encode $mem $policy
+       |  0xfb 0x93:u32                       ⇒ stringview_wtf8.slice
+       |  0xfb 0x98:u32                       ⇒ string.as_wtf16
+       |  0xfb 0x99:u32                       ⇒ stringview_wtf16.length
+       |  0xfb 0x9a:u32                       ⇒ stringview_wtf16.get_codeunit
+       |  0xfb 0x9b:u32 $mem:u32              ⇒ stringview_wtf16.encode $mem
+       |  0xfb 0x9c:u32                       ⇒ stringview_wtf16.slice
+       |  0xfb 0xa0:u32                       ⇒ string.as_iter
+       |  0xfb 0xa1:u32                       ⇒ stringview_iter.next
+       |  0xfb 0xa2:u32                       ⇒ stringview_iter.advance
+       |  0xfb 0xa3:u32                       ⇒ stringview_iter.rewind
+       |  0xfb 0xa4:u32                       ⇒ stringview_iter.slice
+       |  0xfb 0xb0:u32 $policy:u32      [gc] ⇒ string.new_wtf8_array $policy
+       |  0xfb 0xb1:u32                  [gc] ⇒ string.new_wtf16_array
+       |  0xfb 0xb2:u32 $policy:u32      [gc] ⇒ string.encode_wtf8_array $policy
+       |  0xfb 0xb3:u32                  [gc] ⇒ string.encode_wtf16_array
 
 ;; New section.  If present, must be present only once, and right before
 ;; the globals section (or where the globals section would be).  Each
@@ -637,6 +637,9 @@ instr ::= ...
 stringrefs ::= section_14(0x00 vec(vec(u8)))
 ```
 
+Note that the u32 (unsigned LEB128) encoding of the opcode after the
+`0xfb` prefix takes two bytes for opcode values from 0x80 to 0x3fff.
+
 ## Examples
 
 We assume that the textual syntax for instructions that take a memory

From a1d061549fd5fa3a03b7c29fc0686fe1709aaf37 Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Mon, 12 Sep 2022 15:44:56 +0200
Subject: [PATCH 13/15] Fold WTF-8 policy into instructions

Instead of having e.g. `string.encode_wtf8` taking a policy immediate,
just make a separate instruction for each policy.  Fixes #35.
---
 proposals/stringref/Overview.md | 283 +++++++++++++++++++++-----------
 1 file changed, 186 insertions(+), 97 deletions(-)
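
The note that `string.measure_wtf8` also measures lossy UTF-8 holds because an isolated surrogate and `U+FFFD` both take three bytes. A Python model of the three measure instructions and of `string.encode_lossy_utf8` follows; the function names are hypothetical, and a string is a list of codepoints in which surrogates are always isolated.

```python
def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def utf8_len(cp: int) -> int:
    """Bytes needed for one codepoint in (generalized) UTF-8."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3  # includes surrogates, whose WTF-8 form is 3 bytes
    return 4


def measure_utf8(cps: list[int]) -> int:
    if any(is_surrogate(cp) for cp in cps):
        return -1
    n = sum(utf8_len(cp) for cp in cps)
    return n if n <= 2**31 - 1 else -1


def measure_wtf8(cps: list[int]) -> int:
    n = sum(utf8_len(cp) for cp in cps)
    return n if n <= 2**31 - 1 else -1


def measure_wtf16(cps: list[int]) -> int:
    n = sum(2 if cp >= 0x10000 else 1 for cp in cps)
    return n if n <= 2**30 - 1 else -1


def encode_lossy_utf8(cps: list[int]) -> bytes:
    """Replace isolated surrogates with U+FFFD, then encode as UTF-8."""
    text = "".join("\ufffd" if is_surrogate(cp) else chr(cp) for cp in cps)
    return text.encode("utf-8")
```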

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 14d7e51..4863b4f 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -202,28 +202,32 @@ value reaches any instruction in this proposal.  The one exception is
 ### Creating strings
 
 ```
-wtf8_policy ::= 'utf8' | 'wtf8' | 'replace'
-(string.new_wtf8 $memory $wtf8_policy ptr:address bytes:i32)
+(string.new_utf8 $memory ptr:address bytes:i32)
+  -> str:stringref
+(string.new_lossy_utf8 $memory ptr:address bytes:i32)
+  -> str:stringref
+(string.new_wtf8 $memory ptr:address bytes:i32)
   -> str:stringref
 ```
-Create a new string from the *`bytes`* WTF-8 bytes in memory at *`ptr`*.
+Create a new string from the *`bytes`* bytes in memory at *`ptr`*.
 Out-of-bounds access will trap.  The maximum value for *`bytes`* is
 2<sup>31</sup>–1; passing a higher value traps.
 
-The precise decoding semantics depend on the *`$wtf8_policy`* immediate.
+These three instructions decode the bytes in three different ways:
 
-For `utf8`, the bytes are decoded using a strict UTF-8 decoder.  If the
-bytes are not valid UTF-8, trap.
+ * `string.new_utf8` decodes using a strict UTF-8 decoder.  If the
+   bytes are not valid UTF-8, trap.
 
-For `wtf8`, the bytes are decoded using a strict WTF-8 decoder, which is
-like UTF-8 but also allows isolated surrogates.  If the bytes are not
-valid WTF-8, trap.
+ * `string.new_lossy_utf8` decodes using a sloppy UTF-8 decoder: all
+   maximal subparts of an invalid subsequence are decoded as if they
+   were `U+FFFD` (the replacement character) instead.  This instruction
+   will never trap due to a decoding error.  See the section entitled
+   "U+FFFD Substitution of Maximal Subparts" in the Unicode standard,
+   version 14.0.0, page 126.
 
-For `replace`, all maximal subparts of an invalid subsequence are
-decoded as if they were `U+FFFD` (the replacement character) instead.
-The `replace` policy will never trap due to a decoding error.  See the
-section entitled "U+FFFD Substitution of Maximal Subparts" in the
-Unicode standard, version 14.0.0, page 126.
+ * `string.new_wtf8` decodes using a strict WTF-8 decoder, which is like
+   UTF-8 but also allows isolated surrogates.  If the bytes are not
+   valid WTF-8, trap.
 
 ```
 (string.new_wtf16 $memory ptr:address codeunits:i32)
@@ -277,8 +281,8 @@ string literal section as a future extension.
 
 The maximum size for the WTF-8 encoding of an individual string literal
 is 2<sup>31</sup>–1 bytes.  Embeddings may impose their own limits which
-are more restricted.  But similarly to `string.new`, instantiating a
-module with string literals may fail due to lack of memory resources,
+are more restricted.  But similarly to `string.new_wtf8`, instantiating
+a module with string literals may fail due to lack of memory resources,
 even if the string size is formally within the limits.  However
 `string.const` itself never traps when passed a valid literal offset.
 
@@ -288,46 +292,96 @@ All parameters and return values measuring a number of codepoints or a
 number of code units represent these sizes as unsigned values.
 
 ```
-(string.measure_wtf8 $wtf8_policy str:stringref)
+(string.measure_utf8 str:stringref)
   -> codeunits:i32
-(string.measure_wtf16 str:stringref)
+```
+Measure the number of code units (bytes) that would be required to
+encode the contents of the string *`str`* to UTF-8.  If the string
+contains an isolated surrogate, return -1.
+
+The maximum number of code units returned by `string.measure_utf8` is
+2<sup>31</sup>-1.  If an encoding would require more code units than the
+limit, the result is -1.
+
+```
+(string.measure_wtf8 str:stringref)
   -> codeunits:i32
 ```
-Measure the number of code units that would be required to encode the
-contents of the string *`str`* to WTF-8 or WTF-16 respectively.
-For `string.measure_wtf8` with the `utf8` policy, if the string contains
-an isolated surrogate, return -1.
+Measure the number of code units (bytes) that would be required to
+encode the codepoints of the string *`str`* to WTF-8.
+
+Note that this instruction also serves to measure an encoding length for
+UTF-8 when isolated surrogates are replaced with `U+FFFD` ("lossy
+UTF-8"); the same number of bytes is required to encode `U+FFFD` as
+would be required to encode an isolated surrogate to WTF-8.
 
 The maximum number of code units returned by `string.measure_wtf8` is is
-2<sup>31</sup>-1.  The maximum number of code units returned by
-`string.measure_wtf16` is is 2<sup>30</sup>-1.  If an encoding would
-require more code units than the limit, the result is -1.
+2<sup>31</sup>-1.  If an encoding would require more code units than the
+limit, the result is -1.
 
 ```
-(string.encode_wtf8 $memory $wtf8_policy str:stringref ptr:address)
+(string.measure_wtf16 str:stringref)
   -> codeunits:i32
-(string.encode_wtf16 $memory str:stringref ptr:address)
+```
+Measure the number of code units that would be required to encode the
+contents of the string *`str`* to WTF-16.
+
+The maximum number of code units returned by `string.measure_wtf16` is
+2<sup>30</sup>-1.  If an encoding would require more code units than
+the limit, the result is -1.
+
+```
+(string.encode_utf8 $memory str:stringref ptr:address)
   -> codeunits:i32
 ```
-Encode the contents of the string *`str`* as UTF-8, WTF-8, or WTF-16,
-respectively, to memory at *`ptr`*.  Return the number of code units
+Encode the contents of the string *`str`* as UTF-8 to memory at *`ptr`*.
+If an isolated surrogate is seen, trap.  Return the number of code units
 written, which will be the same as returned by the corresponding
-`string.measure_*encoding*`.
+`string.measure_utf8`.
+
+The maximum number of bytes that can be encoded at once by
+`string.encode` is 2<sup>31</sup>-1.  If an encoding would require more
+bytes, it is as if the codepoints can't be encoded (a trap).
+
+```
+(string.encode_lossy_utf8 $memory str:stringref ptr:address)
+  -> codeunits:i32
+```
+Encode the contents of the string *`str`* as UTF-8 to memory at *`ptr`*.
+If an isolated surrogate is seen, encode `U+FFFD` (the replacement
+character) instead.  Return the number of code units written, which will
+be the same as returned by the corresponding `string.measure_wtf8`.
+
+The maximum number of bytes that can be encoded at once by
+`string.encode` is 2<sup>31</sup>-1.  If an encoding would require more
+bytes, it is as if the codepoints can't be encoded (a trap).
 
-Each code unit is written to memory as if stored by `i32.store8` or
-`i32.store16`, respectively, so WTF-16 code units are in little-endian
-byte order.
+```
+(string.encode_wtf8 $memory str:stringref ptr:address)
+  -> codeunits:i32
+```
+Encode the contents of the string *`str`* as WTF-8 to memory at *`ptr`*.
+Return the number of code units written, which will be the same as
+returned by the corresponding `string.measure_wtf8`.
 
 The maximum number of bytes that can be encoded at once by
 `string.encode` is 2<sup>31</sup>-1.  If an encoding would require more
 bytes, it is as if the codepoints can't be encoded (a trap).
 
-For `string.encode_wtf8`, if an isolated surrogate is seen, the behavior
-determines on the *`$wtf8_policy`* immediate.  For `utf8`, trap.  For
-`wtf8`, the surrogate is encoded as per WTF-8.  For `replace`, `U+FFFD`
-(the replacement character) is encoded instead.  Note that the UTF-8
-encoding of `U+FFFD` is the same length as the WTF-8 encoding of an
-isolated surrogate.
+```
+(string.encode_wtf16 $memory str:stringref ptr:address)
+  -> codeunits:i32
+```
+Encode the contents of the string *`str`* as WTF-16 to memory at
+*`ptr`*.  Return the number of code units written, which will be the
+same as returned by the corresponding `string.measure_wtf16`.
+
+Each code unit is written to memory as if stored by `i32.store16`, so
+WTF-16 code units are in little-endian byte order.
+
+The maximum number of bytes that can be encoded at once by
+`string.encode_wtf16` is 2<sup>31</sup>-1.  If an encoding would require
+more bytes, trap.
 
 ### Concatenation
 
@@ -401,7 +455,11 @@ may allow for 64-bit variants of the position-using instructions, which
 could relax this restriction.)
 
 ```
-(stringview_wtf8.encode $memory $wtf8_policy view:stringview_wtf8 ptr:address pos:i32 bytes:i32)
+(stringview_wtf8.encode_utf8 $memory view:stringview_wtf8 ptr:address pos:i32 bytes:i32)
+  -> next_pos:i32, bytes:i32
+(stringview_wtf8.encode_lossy_utf8 $memory view:stringview_wtf8 ptr:address pos:i32 bytes:i32)
+  -> next_pos:i32, bytes:i32
+(stringview_wtf8.encode_wtf8 $memory view:stringview_wtf8 ptr:address pos:i32 bytes:i32)
   -> next_pos:i32, bytes:i32
 ```
 Write a subsequence of the WTF-8 encoding of *`view`* to memory at
@@ -419,8 +477,11 @@ proposal](https://github.com/WebAssembly/memory64/blob/main/proposals/memory64/O
 may allow for 64-bit variants of the position-using instructions, which
 could relax this restriction.)
 
-If an isolated surrogate is seen, the behavior determines on the
-*`$wtf8_policy`* immediate, as in `string.encode_wtf8`.
+If an isolated surrogate is seen, the behavior depends on the
+instruction:
+ * `stringview_wtf8.encode_utf8` will trap.
+ * `stringview_wtf8.encode_lossy_utf8` will encode `U+FFFD`.
+ * `stringview_wtf8.encode_wtf8` will encode the isolated surrogate.
 
 ```
 (stringview_wtf8.slice view:stringview_wtf8 start:i32 end:i32)
@@ -542,16 +603,23 @@ The instructions below shall be available in WebAssembly implementations
 that support both GC and stringrefs.
 
 ```
-(string.new_wtf8_array $wtf8_policy codeunits:$t start:i32 end:i32)
+(string.new_utf8_array codeunits:$t start:i32 end:i32)
+  if expand($t) => array i8
+  -> str:stringref
+(string.new_lossy_utf8_array codeunits:$t start:i32 end:i32)
+  if expand($t) => array i8
+  -> str:stringref
+(string.new_wtf8_array codeunits:$t start:i32 end:i32)
   if expand($t) => array i8
   -> str:stringref
 ```
-Create a new string from a subsequence of the *`codeunits`* WTF-8 bytes
-in a GC-managed array, starting from offset *`start`* and continuing to
-but not including *`end`*.  If *`end`* is less than *`start`* or is
-greater than the array length, trap.  The bytes are decoded according to
-`$wtf8_policy`, as in `string.new_wtf8`.  The maximum value for
-*`end`*–*`start`* is 2<sup>31</sup>–1; passing a higher value traps.
+Create a new string from a subsequence of the *`codeunits`* bytes in a
+GC-managed array, starting from offset *`start`* and continuing to but
+not including *`end`*.  If *`end`* is less than *`start`* or is greater
+than the array length, trap.  The bytes are decoded in the same way as
+`string.new_utf8`, `string.new_lossy_utf8`, and `string.new_wtf8`,
+respectively.  The maximum value for *`end`*–*`start`* is
+2<sup>31</sup>–1; passing a higher value traps.
 
 ```
 (string.new_wtf16_array codeunits:$t start:i32 end:i32)
@@ -566,7 +634,13 @@ for *`end`*–*`start`* is 2<sup>30</sup>–1; passing a higher value
 traps.
 
 ```
-(string.encode_wtf8_array $wtf8_policy str:stringref array:$t start:i32)
+(string.encode_utf8_array str:stringref array:$t start:i32)
+  if expand($t) => array (mut i8)
+  -> codeunits:i32
+(string.encode_lossy_utf8_array str:stringref array:$t start:i32)
+  if expand($t) => array (mut i8)
+  -> codeunits:i32
+(string.encode_wtf8_array str:stringref array:$t start:i32)
   if expand($t) => array (mut i8)
   -> codeunits:i32
 (string.encode_wtf16_array str:stringref array:$t start:i32)
@@ -581,9 +655,9 @@ same as the result of a the corresponding `string.measure_wtf8` or
 code units in the array, trap.  Note that no `NUL` terminator is ever
 written.
 
-For `string.encode_wtf8_array`, if an isolated surrogate is seen, the
-behavior depends on the *`$wtf8_policy`* immediate, in the same way as
-`string.encode_wtf8`.
+For `string.encode_utf8_array`, trap if an isolated surrogate is seen.
+For `string.encode_lossy_utf8_array`, replace isolated surrogates with
+`U+FFFD`.  For `string.encode_wtf8_array`, encode them as per WTF-8.
 
 ## Binary encoding
 
@@ -594,39 +668,46 @@ reftype ::= ...
          |  0x62 ⇒ stringview_wtf16  ; SLEB128(-0x1e)
          |  0x61 ⇒ stringview_iter   ; SLEB128(-0x1f)
 
-wtf8_policy ::= 0x00 ⇒ utf8
-             |  0x01 ⇒ wtf8
-             |  0x02 ⇒ replace
-
 instr ::= ...
-       |  0xfb 0x80:u32 $mem:u32 $policy:u32  ⇒ string.new_wtf8 $mem $policy
-       |  0xfb 0x81:u32 $mem:u32              ⇒ string.new_wtf16 $mem
-       |  0xfb 0x82:u32 $idx:u32              ⇒ string.const $idx
-       |  0xfb 0x84:u32 $policy:u32           ⇒ string.measure_wtf8 $policy
-       |  0xfb 0x85:u32                       ⇒ string.measure_wtf16
-       |  0xfb 0x86:u32 $mem:u32 $policy:u32  ⇒ string.encode_wtf8 $mem $policy
-       |  0xfb 0x87:u32 $mem:u32              ⇒ string.encode_wtf16 $mem
-       |  0xfb 0x88:u32                       ⇒ string.concat
-       |  0xfb 0x89:u32                       ⇒ string.eq
-       |  0xfb 0x8a:u32                       ⇒ string.is_usv_sequence
-       |  0xfb 0x90:u32                       ⇒ string.as_wtf8
-       |  0xfb 0x91:u32                       ⇒ stringview_wtf8.advance
-       |  0xfb 0x92:u32 $mem:u32 $policy:u32  ⇒ stringview_wtf8.encode $mem $policy
-       |  0xfb 0x93:u32                       ⇒ stringview_wtf8.slice
-       |  0xfb 0x98:u32                       ⇒ string.as_wtf16
-       |  0xfb 0x99:u32                       ⇒ stringview_wtf16.length
-       |  0xfb 0x9a:u32                       ⇒ stringview_wtf16.get_codeunit
-       |  0xfb 0x9b:u32 $mem:u32              ⇒ stringview_wtf16.encode $mem
-       |  0xfb 0x9c:u32                       ⇒ stringview_wtf16.slice
-       |  0xfb 0xa0:u32                       ⇒ string.as_iter
-       |  0xfb 0xa1:u32                       ⇒ stringview_iter.next
-       |  0xfb 0xa2:u32                       ⇒ stringview_iter.advance
-       |  0xfb 0xa3:u32                       ⇒ stringview_iter.rewind
-       |  0xfb 0xa4:u32                       ⇒ stringview_iter.slice
-       |  0xfb 0xb0:u32 $policy:u32      [gc] ⇒ string.new_wtf8_array $policy
-       |  0xfb 0xb1:u32                  [gc] ⇒ string.new_wtf16_array
-       |  0xfb 0xb2:u32 $policy:u32      [gc] ⇒ string.encode_wtf8_array $policy
-       |  0xfb 0xb3:u32                  [gc] ⇒ string.encode_wtf16_array
+       |  0xfb 0xc0:u32 $mem:u32       ⇒ string.new_utf8 $mem
+       |  0xfb 0xc1:u32 $mem:u32       ⇒ string.new_lossy_utf8 $mem
+       |  0xfb 0xc2:u32 $mem:u32       ⇒ string.new_wtf8 $mem
+       |  0xfb 0x81:u32 $mem:u32       ⇒ string.new_wtf16 $mem
+       |  0xfb 0x82:u32 $idx:u32       ⇒ string.const $idx
+       |  0xfb 0xc3:u32                ⇒ string.measure_utf8
+       |  0xfb 0xc4:u32                ⇒ string.measure_wtf8
+       |  0xfb 0x85:u32                ⇒ string.measure_wtf16
+       |  0xfb 0xc5:u32 $mem:u32       ⇒ string.encode_utf8 $mem
+       |  0xfb 0xc6:u32 $mem:u32       ⇒ string.encode_lossy_utf8 $mem
+       |  0xfb 0xc7:u32 $mem:u32       ⇒ string.encode_wtf8 $mem
+       |  0xfb 0x87:u32 $mem:u32       ⇒ string.encode_wtf16 $mem
+       |  0xfb 0x88:u32                ⇒ string.concat
+       |  0xfb 0x89:u32                ⇒ string.eq
+       |  0xfb 0x8a:u32                ⇒ string.is_usv_sequence
+       |  0xfb 0x90:u32                ⇒ string.as_wtf8
+       |  0xfb 0x91:u32                ⇒ stringview_wtf8.advance
+       |  0xfb 0xd0:u32 $mem:u32       ⇒ stringview_wtf8.encode_utf8 $mem
+       |  0xfb 0xd1:u32 $mem:u32       ⇒ stringview_wtf8.encode_lossy_utf8 $mem
+       |  0xfb 0xd2:u32 $mem:u32       ⇒ stringview_wtf8.encode_wtf8 $mem
+       |  0xfb 0x93:u32                ⇒ stringview_wtf8.slice
+       |  0xfb 0x98:u32                ⇒ string.as_wtf16
+       |  0xfb 0x99:u32                ⇒ stringview_wtf16.length
+       |  0xfb 0x9a:u32                ⇒ stringview_wtf16.get_codeunit
+       |  0xfb 0x9b:u32 $mem:u32       ⇒ stringview_wtf16.encode $mem
+       |  0xfb 0x9c:u32                ⇒ stringview_wtf16.slice
+       |  0xfb 0xa0:u32                ⇒ string.as_iter
+       |  0xfb 0xa1:u32                ⇒ stringview_iter.next
+       |  0xfb 0xa2:u32                ⇒ stringview_iter.advance
+       |  0xfb 0xa3:u32                ⇒ stringview_iter.rewind
+       |  0xfb 0xa4:u32                ⇒ stringview_iter.slice
+       |  0xfb 0xe0:u32           [gc] ⇒ string.new_utf8_array
+       |  0xfb 0xe1:u32           [gc] ⇒ string.new_lossy_utf8_array
+       |  0xfb 0xe2:u32           [gc] ⇒ string.new_wtf8_array
+       |  0xfb 0xb1:u32           [gc] ⇒ string.new_wtf16_array
+       |  0xfb 0xe3:u32           [gc] ⇒ string.encode_utf8_array
+       |  0xfb 0xe4:u32           [gc] ⇒ string.encode_lossy_utf8_array
+       |  0xfb 0xe5:u32           [gc] ⇒ string.encode_wtf8_array
+       |  0xfb 0xb3:u32           [gc] ⇒ string.encode_wtf16_array
 
 ;; New section.  If present, must be present only once, and right before
 ;; the globals section (or where the globals section would be).  Each
@@ -652,13 +733,12 @@ operand allows you to elide the memory, in which case it defaults to 0.
   local.get $ptr
   local.get $ptr
   call $strlen
-  string.new_wtf8)
+  string.new_utf8)
 ```
 
-Generally speaking, this proposal only distinguishes between UTF-8 and
-WTF-8 when encoding string contents to memory.  As this is a a decode
-operation, the proposal just has a WTF-8 interface, as WTF-8 is a
-superset of UTF-8.
+If the bytes being decoded aren't actually valid UTF-8, this function
+will trap.  Use `string.new_lossy_utf8` in contexts where replacing
+invalid data with `U+FFFD` is a better strategy than trapping.
 
 ### Make string from an array of WTF-8 code units in memory
 
@@ -669,6 +749,10 @@ superset of UTF-8.
   string.new_wtf8)
 ```
 
+Note that `string.new_wtf8` (and `string.new_wtf8_array`) are always
+strict decoders: if the bytes are not valid WTF-8, the instruction
+traps.
+
 ### Make string from UTF-16 in memory
 
 ```wasm
@@ -868,7 +952,7 @@ open to considering adding more instructions.
   (local $len i32)
   (local $ptr i32)
   local.get $str
-  string.measure_wtf8 utf8
+  string.measure_utf8
   local.set $len
 
   block $valid
@@ -887,7 +971,7 @@ open to considering adding more instructions.
 
   local.get $str
   local.get $ptr
-  string.encode_wtf8 wtf8          ;; push bytes written, same as $len
+  string.encode_utf8        ;; push bytes written, same as $len
 
   local.get $ptr
   i32.add
@@ -898,12 +982,17 @@ open to considering adding more instructions.
   return)
 ```
 
-Using `string.measure_wtf8 utf8` ensures that the encoded string is a
-valid unicode scalar value sequence.  How to handle invalid UTF-8 is up
-to the user; instead of `unreachable` we could throw an exception.
+Using `string.measure_utf8` ensures that the encoded string is a valid
+Unicode scalar value sequence.  How to handle strings that cannot be
+encoded as UTF-8 is up to the user; instead of `unreachable` we could throw an exception.
+
+Note that in this case, the subsequent `string.encode_utf8` could just
+as well have been `string.encode_lossy_utf8` or `string.encode_wtf8`, as
+these instructions are all the same for strings that do not contain
+isolated surrogates, and we checked that there were none.
 
 If we meant to handle isolated surrogates, we could use
-`string.measure_wtf8 wtf8` instead.
+`string.measure_wtf8` instead.
 
 ### Stream over contents of string
 
@@ -923,7 +1012,7 @@ will encode isolated surrogates as WTF-8.
     local.get $cursor
     global.get $buf
     i32.const 1024
-    string.encode_wtf8 wtf8          ;; push bytes written
+    string.encode_wtf8               ;; push bytes written
     local.tee $bytes
     (if i32.eqz (then return))       ;; if no bytes encoded, done
     local.get $bytes

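The changes above replace the `$wtf8_policy` immediate with three separate instructions per operation: `utf8` traps on an isolated surrogate, `lossy_utf8` substitutes `U+FFFD`, and `wtf8` encodes the surrogate itself. The Python sketch below illustrates only the semantics and is not part of the proposal. It assumes a string is given as a list of WTF-16 code units. `Trap`, `encode`, `measure_utf8` and `measure_wtf8` are hypothetical names, and a Python exception stands in for a WebAssembly trap.

```python
# Hypothetical model (not part of the proposal) of the UTF-8-family encoders.
# A string is modeled as a list of WTF-16 code units.


class Trap(Exception):
    """Stands in for a WebAssembly trap."""


REPLACEMENT = 0xFFFD  # U+FFFD, the replacement character
VARIANTS = ("utf8", "lossy_utf8", "wtf8")


def codepoints(units):
    """Decode WTF-16 code units, passing isolated surrogates through as-is."""
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Well-formed surrogate pair: combine into a supplementary codepoint.
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield u  # BMP scalar value or isolated surrogate
            i += 1


def encode_cp(cp):
    """Generalized UTF-8 as used by WTF-8: surrogate codepoints are allowed."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def encode(units, variant):
    """Model of string.encode_utf8 / encode_lossy_utf8 / encode_wtf8."""
    if variant not in VARIANTS:
        raise ValueError(variant)
    out = bytearray()
    for cp in codepoints(units):
        if 0xD800 <= cp <= 0xDFFF:  # isolated surrogate
            if variant == "utf8":
                raise Trap("isolated surrogate")
            if variant == "lossy_utf8":
                cp = REPLACEMENT
            # "wtf8": fall through and encode the surrogate itself
        out += encode_cp(cp)
    return bytes(out)


def measure_utf8(units):
    """-1 if the string contains isolated surrogates, else its UTF-8 length."""
    try:
        return len(encode(units, "utf8"))
    except Trap:
        return -1


def measure_wtf8(units):
    return len(encode(units, "wtf8"))
```

For the units of `"hi"` followed by a lone `0xD800`, `measure_utf8` reports -1 while `measure_wtf8` reports 5. The lossy encoding has that same length, because `U+FFFD` and an isolated surrogate both take three bytes.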
From 610beb3d52b1a974f8bd077719a1d9c1279b1d87 Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Mon, 12 Sep 2022 15:53:26 +0200
Subject: [PATCH 14/15] Typo fix

---
 proposals/stringref/Overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index 4863b4f..d6b76ed 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -215,7 +215,7 @@ Out-of-bounds access will trap.  The maximum value for *`bytes`* is
 
 These three instructions decode the bytes in three different ways:
 
- * `string.new_utf8`, decodes using a strict UTF-8 decoder.  If the
+ * `string.new_utf8` decodes using a strict UTF-8 decoder.  If the
     bytes are not valid UTF-8, trap.
 
  * `string.new_lossy_utf8` decodes using a sloppy UTF-8 decoder: all

From cd97570867ed4c771f58873a50e1c808c7b145c0 Mon Sep 17 00:00:00 2001
From: Andy Wingo <wingo@igalia.com>
Date: Mon, 12 Sep 2022 16:29:24 +0200
Subject: [PATCH 15/15] Make binary encoding less chaotic :)

---
 proposals/stringref/Overview.md | 34 ++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/proposals/stringref/Overview.md b/proposals/stringref/Overview.md
index d6b76ed..c26efbe 100644
--- a/proposals/stringref/Overview.md
+++ b/proposals/stringref/Overview.md
@@ -669,27 +669,27 @@ reftype ::= ...
          |  0x61 ⇒ stringview_iter   ; SLEB128(-0x1f)
 
 instr ::= ...
-       |  0xfb 0xc0:u32 $mem:u32       ⇒ string.new_utf8 $mem
-       |  0xfb 0xc1:u32 $mem:u32       ⇒ string.new_lossy_utf8 $mem
-       |  0xfb 0xc2:u32 $mem:u32       ⇒ string.new_wtf8 $mem
+       |  0xfb 0x80:u32 $mem:u32       ⇒ string.new_utf8 $mem
        |  0xfb 0x81:u32 $mem:u32       ⇒ string.new_wtf16 $mem
        |  0xfb 0x82:u32 $idx:u32       ⇒ string.const $idx
-       |  0xfb 0xc3:u32                ⇒ string.measure_utf8
-       |  0xfb 0xc4:u32                ⇒ string.measure_wtf8
+       |  0xfb 0x83:u32                ⇒ string.measure_utf8
+       |  0xfb 0x84:u32                ⇒ string.measure_wtf8
        |  0xfb 0x85:u32                ⇒ string.measure_wtf16
-       |  0xfb 0xc5:u32 $mem:u32       ⇒ string.encode_utf8 $mem
-       |  0xfb 0xc6:u32 $mem:u32       ⇒ string.encode_lossy_utf8 $mem
-       |  0xfb 0xc7:u32 $mem:u32       ⇒ string.encode_wtf8 $mem
+       |  0xfb 0x86:u32 $mem:u32       ⇒ string.encode_utf8 $mem
        |  0xfb 0x87:u32 $mem:u32       ⇒ string.encode_wtf16 $mem
        |  0xfb 0x88:u32                ⇒ string.concat
        |  0xfb 0x89:u32                ⇒ string.eq
        |  0xfb 0x8a:u32                ⇒ string.is_usv_sequence
+       |  0xfb 0x8b:u32 $mem:u32       ⇒ string.new_lossy_utf8 $mem
+       |  0xfb 0x8c:u32 $mem:u32       ⇒ string.new_wtf8 $mem
+       |  0xfb 0x8d:u32 $mem:u32       ⇒ string.encode_lossy_utf8 $mem
+       |  0xfb 0x8e:u32 $mem:u32       ⇒ string.encode_wtf8 $mem
        |  0xfb 0x90:u32                ⇒ string.as_wtf8
        |  0xfb 0x91:u32                ⇒ stringview_wtf8.advance
-       |  0xfb 0xd0:u32 $mem:u32       ⇒ stringview_wtf8.encode_utf8 $mem
-       |  0xfb 0xd1:u32 $mem:u32       ⇒ stringview_wtf8.encode_lossy_utf8 $mem
-       |  0xfb 0xd2:u32 $mem:u32       ⇒ stringview_wtf8.encode_wtf8 $mem
+       |  0xfb 0x92:u32 $mem:u32       ⇒ stringview_wtf8.encode_utf8 $mem
        |  0xfb 0x93:u32                ⇒ stringview_wtf8.slice
+       |  0xfb 0x94:u32 $mem:u32       ⇒ stringview_wtf8.encode_lossy_utf8 $mem
+       |  0xfb 0x95:u32 $mem:u32       ⇒ stringview_wtf8.encode_wtf8 $mem
        |  0xfb 0x98:u32                ⇒ string.as_wtf16
        |  0xfb 0x99:u32                ⇒ stringview_wtf16.length
        |  0xfb 0x9a:u32                ⇒ stringview_wtf16.get_codeunit
@@ -700,14 +700,14 @@ instr ::= ...
        |  0xfb 0xa2:u32                ⇒ stringview_iter.advance
        |  0xfb 0xa3:u32                ⇒ stringview_iter.rewind
        |  0xfb 0xa4:u32                ⇒ stringview_iter.slice
-       |  0xfb 0xe0:u32           [gc] ⇒ string.new_utf8_array
-       |  0xfb 0xe1:u32           [gc] ⇒ string.new_lossy_utf8_array
-       |  0xfb 0xe2:u32           [gc] ⇒ string.new_wtf8_array
+       |  0xfb 0xb0:u32           [gc] ⇒ string.new_utf8_array
        |  0xfb 0xb1:u32           [gc] ⇒ string.new_wtf16_array
-       |  0xfb 0xe3:u32           [gc] ⇒ string.encode_utf8_array
-       |  0xfb 0xe4:u32           [gc] ⇒ string.encode_lossy_utf8_array
-       |  0xfb 0xe5:u32           [gc] ⇒ string.encode_wtf8_array
+       |  0xfb 0xb2:u32           [gc] ⇒ string.encode_utf8_array
        |  0xfb 0xb3:u32           [gc] ⇒ string.encode_wtf16_array
+       |  0xfb 0xb4:u32           [gc] ⇒ string.new_lossy_utf8_array
+       |  0xfb 0xb5:u32           [gc] ⇒ string.new_wtf8_array
+       |  0xfb 0xb6:u32           [gc] ⇒ string.encode_lossy_utf8_array
+       |  0xfb 0xb7:u32           [gc] ⇒ string.encode_wtf8_array
 
 ;; New section.  If present, must be present only once, and right before
 ;; the globals section (or where the globals section would be).  Each