    if (
      useFileOutput &&
      typeof data === "string" &&
      (data.startsWith("https:") || data.startsWith("data:"))
I think we want to explicitly check that data consists of only a valid data URI. There is a reasonable chance that a language model will emit a line that starts with "data:", but a much smaller chance that it will emit a line consisting solely of a well-formed data URI.
🤔 That said, I did think today after reading this post on other AI APIs that we should move to structured outputs.
Perhaps the file stream should emit JSON.
data: {"type": "url", "value": "data://..."}
And in the future we could refactor the text streaming interface to do the same:
data: {"type": "string", "value": "data: and some more text"}
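As a rough illustration of what a client would do with such events, here is a minimal sketch of parsing one structured payload. The `StreamEvent` type and `parseStructuredEvent` helper are hypothetical names, and the `{"type", "value"}` shape is only the one sketched in this comment, not an existing API.

```typescript
// Hypothetical event shape, following the {"type", "value"} sketch above.
type StreamEvent =
  | { type: "url"; value: string }
  | { type: "string"; value: string };

// Parse the payload of one SSE `data:` field into a typed event.
// Returns null when the payload is not a JSON object of the expected shape.
function parseStructuredEvent(payload: string): StreamEvent | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(payload);
  } catch {
    return null; // not JSON at all, e.g. a raw text token
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const { type, value } = parsed as { type?: unknown; value?: unknown };
  if (typeof value !== "string") return null;
  if (type === "url") return { type: "url", value };
  if (type === "string") return { type: "string", value };
  return null; // unknown event type
}
```

With this shape, text that happens to begin with "data:" arrives inside a `"string"` event and can no longer be mistaken for a file.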
> I think we want to explicitly match that data consists of only a valid data URI. There is a reasonable chance that a language model might emit a line that starts with "data:" but less so that it will emit a line that consists only of a well-formed data URI.
That's a good callout. The trick is finding a good way to validate without parsing the whole thing and throwing away the result. I think we can still read lazily if we use a regex to apply some heuristics about its first chunk of content.
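A minimal sketch of that heuristic, assuming the first chunk of the value is available as a string. `DATA_URI_HEADER`, `PAYLOAD_SAMPLE_LENGTH`, and `looksLikeDataUri` are hypothetical names, and the pattern is a loose approximation of the data URI header syntax rather than a full validator.

```typescript
// Matches the header of a data URI: "data:", an optional media type,
// optional ";name=value" parameters, an optional ";base64" marker, and
// the comma that starts the payload.
const DATA_URI_HEADER =
  /^data:([a-z0-9!#$&^_.+-]+\/[a-z0-9!#$&^_.+-]+)?(;[a-z0-9-]+=[^;,\s]+)*(;base64)?,/i;

// How many payload characters to inspect; an arbitrary choice for this sketch.
const PAYLOAD_SAMPLE_LENGTH = 64;

// Heuristic check on the first chunk only, so the rest can be read lazily.
function looksLikeDataUri(firstChunk: string): boolean {
  const match = DATA_URI_HEADER.exec(firstChunk);
  if (match === null) return false;

  const isBase64 = match[3] !== undefined;
  const start = match[0].length;
  const sample = firstChunk.slice(start, start + PAYLOAD_SAMPLE_LENGTH);

  // Base64 payloads use a small alphabet; other payloads are expected to be
  // percent-encoded, so they should contain no whitespace.
  return isBase64 ? /^[A-Za-z0-9+\/=]*$/.test(sample) : /^\S*$/.test(sample);
}
```

Model output such as `data: and some more text` fails the header match because a space follows the colon, while only the first few dozen payload characters of a real data URI are ever inspected.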
> 🤔 That said, I did think today after reading this post on other AI APIs that we should move to structured outputs.
> Perhaps the file stream should emit JSON.
> data: {"type": "url", "value": "data://..."}
> And in future we refactor the text streaming interface to do the same:
> data: {"type": "string", "value": "data: and some more text"}
I see the advantages of typed outputs, but also quite like the experience we have now of emitting raw tokens. In any case, structured outputs would be a backwards-incompatible change, so we'd have to be clever about a migration.
One way we could support both would be keeping Accept: text/event-stream as-is, but adding support for Accept: text/event-stream+json. The client libraries could start sending that as needed to opt into structured outputs.
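A rough sketch of how a client library might opt in, under the assumption that the server echoes the negotiated type in `Content-Type`. `text/event-stream+json` is only the media type proposed here, and `acceptHeader` and `decodeData` are hypothetical helpers.

```typescript
const RAW_STREAM = "text/event-stream";
// Proposed in this thread; not an existing registered media type.
const STRUCTURED_STREAM = "text/event-stream+json";

// Build the Accept header. Clients that opt in still list the raw stream
// as a lower-priority fallback for servers that do not support the new type.
function acceptHeader(structured: boolean): string {
  return structured ? `${STRUCTURED_STREAM}, ${RAW_STREAM};q=0.9` : RAW_STREAM;
}

// Decode one SSE `data:` payload according to the response Content-Type.
function decodeData(contentType: string, data: string): string {
  const mediaType = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  if (mediaType !== STRUCTURED_STREAM) {
    return data; // raw tokens, same as the current behavior
  }
  const event = JSON.parse(data) as { value?: unknown };
  if (typeof event.value !== "string") {
    throw new TypeError("structured event is missing a string value");
  }
  return event.value;
}
```

Existing clients keep sending `Accept: text/event-stream` and see no change, which is what makes the migration backwards compatible.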