The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Pull request overview
Adds a new @huggingface/transformers-webworker subpackage to enable running pipeline(...) in a Web Worker with a main-thread API wrapper and a callback-bridge for function options (e.g. progress_callback).
Changes:
- Introduces `webWorkerPipeline` (main thread) and `webWorkerPipelineHandler` (worker thread), plus message constants.
- Adds a callback bridge implementation to serialize/deserialize function options across the worker boundary.
- Adds a full subpackage setup (build/dev scripts, TS config, Jest config, tests, and README) and updates the lockfile.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| pnpm-lock.yaml | Adds the new workspace package and its dev dependencies. |
| packages/transformers-webworker/package.json | Defines the new subpackage (exports, scripts, deps). |
| packages/transformers-webworker/tsconfig.json | Type declaration build configuration for the subpackage. |
| packages/transformers-webworker/src/index.ts | Public entrypoint exporting the two main helpers. |
| packages/transformers-webworker/src/constants.ts | Defines message type constants for worker communication. |
| packages/transformers-webworker/src/webWorkerPipeline.ts | Main-thread wrapper that posts requests to a worker and awaits results. |
| packages/transformers-webworker/src/webWorkerPipelineHandler.ts | Worker-side handler that creates/caches pipelines and executes requests. |
| packages/transformers-webworker/src/utils/callback-bridge/* | Implements callback serialization + invocation plumbing. |
| packages/transformers-webworker/scripts/** | Adds esbuild + typegen dev/build tooling for the subpackage. |
| packages/transformers-webworker/jest.config.mjs | Adds Jest + ts-jest configuration for subpackage tests. |
| packages/transformers-webworker/tests/*.test.ts | Adds tests for the worker handler and main-thread wrapper. |
| packages/transformers-webworker/README.md | Documents usage, callback handling, and limitations. |
| packages/transformers-webworker/.gitignore | Ignores build artifacts and coverage output. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
- packages/transformers-webworker/tests/webWorkerPipeline.test.ts (outdated, resolved)
- packages/transformers-webworker/src/utils/callback-bridge/CallbackBridgeClient.ts (outdated, resolved)
- packages/transformers-webworker/tests/webWorkerPipeline.test.ts (outdated, resolved)
```typescript
it("should handle callback invocations from worker", async () => {
  const callback = jest.fn();
  const options = {
    progress_callback: callback,
  };

  setTimeout(() => {
    // First, send init response
    mockWorker.onmessage?.({
      data: { id: "init", type: RESPONSE_MESSAGE_TYPE_RESULT },
    } as MessageEvent);

    // Then simulate callback invocation
    setTimeout(() => {
      mockWorker.onmessage?.({
        data: {
          type: RESPONSE_MESSAGE_TYPE_INVOKE_CALLBACK,
          functionId: "cb_progress_callback",
          args: [{ status: "progress", progress: 50 }],
        },
      } as MessageEvent);
    }, 10);
  }, 0);

  await webWorkerPipeline(mockWorker as any, "text-classification", "test-model", options);

  // Wait for callback to be invoked
  await new Promise((resolve) => setTimeout(resolve, 20));

  expect(callback).toHaveBeenCalledWith({ status: "progress", progress: 50 });
});
```
The test simulates a worker message via mockWorker.onmessage?.(...), but the implementation listens for callback invocations via worker.addEventListener('message', ...) inside CallbackBridgeClient. Since the mock Worker doesn’t implement addEventListener, callback invocation handling won’t be covered accurately; add an addEventListener/removeEventListener mock (and trigger those listeners) or adjust the implementation to use onmessage consistently.
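A mock that covers both subscription styles could look like the following minimal sketch. It assumes the code under test only touches `postMessage`, `onmessage`, and `addEventListener`/`removeEventListener` on the worker; `MockWorker` and its `emit` method are hypothetical test helpers, not part of the PR.

```typescript
// Hypothetical test helper: a mock Worker that supports both `onmessage`
// and `addEventListener`, so either subscription style sees simulated messages.
type WorkerMessage = { data: unknown };
type MessageListener = (event: WorkerMessage) => void;

class MockWorker {
  onmessage: MessageListener | null = null;
  // Everything the code under test posts to the "worker", for later assertions.
  readonly posted: unknown[] = [];
  private readonly listeners = new Set<MessageListener>();

  postMessage(message: unknown): void {
    this.posted.push(message);
  }

  addEventListener(type: string, listener: MessageListener): void {
    if (type === "message") this.listeners.add(listener);
  }

  removeEventListener(type: string, listener: MessageListener): void {
    if (type === "message") this.listeners.delete(listener);
  }

  // Simulate a message coming from the worker: notify `onmessage`
  // and every registered "message" listener, as a real Worker would.
  emit(data: unknown): void {
    const event: WorkerMessage = { data };
    this.onmessage?.(event);
    for (const listener of Array.from(this.listeners)) listener(event);
  }
}
```

In the test, calling `mockWorker.emit(...)` instead of `mockWorker.onmessage?.(...)` would also reach the listener that `CallbackBridgeClient` registers via `addEventListener` (the mock would need an `as unknown as Worker` cast when passed to `webWorkerPipeline`).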
```json
"types": "./types/src/index.d.ts",
"type": "module",
"exports": {
  ".": {
    "types": "./types/src/index.d.ts",
```
The types entry points to ./types/src/index.d.ts, but with outDir: "types" and include: ["src/**/*"] the declaration output will typically be types/index.d.ts (no src/ segment). Align types/exports.types with the actual emitted declaration path to avoid broken typings for consumers.
```diff
-"types": "./types/src/index.d.ts",
+"types": "./types/index.d.ts",
 "type": "module",
 "exports": {
   ".": {
-    "types": "./types/src/index.d.ts",
+    "types": "./types/index.d.ts",
```
```typescript
const key = JSON.stringify({ task, model_id, options });
let pipe = pipelines.get(key);
if (!pipe) {
  pipe = await pipeline(task, model_id, callbackBridge.deserialize(options));
  pipelines.set(key, pipe);
}
self.postMessage({ id, type: RESPONSE_READY });
const result = data ? await pipe(data, pipeOptions) : null;
self.postMessage({ id, type: RESPONSE_RESULT, result });
```
The handler doesn’t catch exceptions from pipeline(...) or pipe(...), so a failure will neither post a RESPONSE_RESULT with an error nor reject on the main thread (and may crash the worker). Wrap the body in try/catch and post an error payload back with the same id when failures occur.
```diff
-const key = JSON.stringify({ task, model_id, options });
-let pipe = pipelines.get(key);
-if (!pipe) {
-  pipe = await pipeline(task, model_id, callbackBridge.deserialize(options));
-  pipelines.set(key, pipe);
-}
-self.postMessage({ id, type: RESPONSE_READY });
-const result = data ? await pipe(data, pipeOptions) : null;
-self.postMessage({ id, type: RESPONSE_RESULT, result });
+try {
+  const key = JSON.stringify({ task, model_id, options });
+  let pipe = pipelines.get(key);
+  if (!pipe) {
+    pipe = await pipeline(task, model_id, callbackBridge.deserialize(options));
+    pipelines.set(key, pipe);
+  }
+  self.postMessage({ id, type: RESPONSE_READY });
+  const result = data ? await pipe(data, pipeOptions) : null;
+  self.postMessage({ id, type: RESPONSE_RESULT, result });
+} catch (err) {
+  const error =
+    err instanceof Error
+      ? { name: err.name, message: err.message, stack: err.stack }
+      : { name: 'Error', message: String(err) };
+  self.postMessage({ id, type: RESPONSE_RESULT, error });
+}
```
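The suggested `catch` block posts a plain `{ name, message, stack }` object. The matching main-thread step is to rebuild an `Error` from that payload before rejecting, so callers get a real `Error` instead of a plain object. A minimal sketch of both halves; `serializeError` and `deserializeError` are hypothetical helper names, not functions from the PR:

```typescript
// Shape of the error payload posted back to the main thread
// (same fields as the try/catch suggestion).
interface SerializedError {
  name: string;
  message: string;
  stack?: string;
}

// Worker side: turn anything thrown into a plain, structured-cloneable object.
function serializeError(err: unknown): SerializedError {
  return err instanceof Error
    ? { name: err.name, message: err.message, stack: err.stack }
    : { name: "Error", message: String(err) };
}

// Main-thread side: rebuild an Error so callers can rely on
// `instanceof Error`, `.name`, and `.message`.
function deserializeError(payload: SerializedError): Error {
  const error = new Error(payload.message);
  error.name = payload.name;
  if (payload.stack) error.stack = payload.stack;
  return error;
}
```

In the main-thread wrapper, `resolver.reject(msg.error)` would then become `resolver.reject(deserializeError(msg.error))`.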
```typescript
const { id, data, task, model_id, options, pipeOptions = {} } = event.data;
const key = JSON.stringify({ task, model_id, options });
let pipe = pipelines.get(key);
if (!pipe) {
  pipe = await pipeline(task, model_id, callbackBridge.deserialize(options));
  pipelines.set(key, pipe);
```
Pipeline caching key includes the fully serialized options (including callback functionIds). Since the client generates a new functionId per serialize call, the cache will miss and reload pipelines unnecessarily. Consider hashing only stable, non-callback option fields (or stripping __fn entries) when building the cache key.
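A sketch of a cache key that ignores callback placeholders, assuming the bridge encodes each function as a `{ __fn: "<id>" }` object (the `__fn` name comes from the comment above; the exact placeholder shape in the PR is not shown here). `stripCallbacks` and `pipelineCacheKey` are hypothetical helpers. Sorting object keys also makes the key independent of property order:

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumption: a serialized callback looks like { __fn: "cb_..." }.
function isCallbackMarker(value: unknown): boolean {
  return typeof value === "object" && value !== null && !Array.isArray(value) && "__fn" in value;
}

// Recursively drop callback markers so only stable option fields affect the key.
function stripCallbacks(value: Json): Json | undefined {
  if (isCallbackMarker(value)) return undefined;
  if (Array.isArray(value)) {
    return value.map((item) => stripCallbacks(item) ?? null);
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      const stripped = stripCallbacks(value[key]);
      if (stripped !== undefined) out[key] = stripped;
    }
    return out;
  }
  return value;
}

function pipelineCacheKey(task: string, modelId: string, options: Json = {}): string {
  return JSON.stringify({ task, model_id: modelId, options: stripCallbacks(options) ?? null });
}
```

With this, two requests that differ only in their freshly generated callback ids map to the same cached pipeline.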
```typescript
const messagesResolversMap = new Map<number | 'init', { resolve: Function; reject: Function }>();
let messageIdCounter = 0;

const originalOnMessage = worker.onmessage;
worker.onmessage = (e) => {
  const msg = e.data;
  if (msg?.type === RESPONSE_RESULT) {
    if (msg?.id === 'init') {
      resolve((data: PayloadType, pipeOptions: Record<string, any>) => {
        return new Promise<any>((resolve, reject) => {
          const id = messageIdCounter++;
          messagesResolversMap.set(id, { resolve, reject });
          worker.postMessage({
            id,
            type: REQUEST,
            data,
            task,
            model_id,
            options: options ? callbackBridge.serialize(options) : {},
            pipeOptions,
          });
        });
      });
    } else {
      const resolver = messagesResolversMap.get(msg.id);
      if (resolver) {
        if (msg.error) resolver.reject(msg.error);
        else resolver.resolve(msg.result);
        messagesResolversMap.delete(msg.id);
      }
    }
  }
};

messagesResolversMap.set('init', { resolve, reject });
worker.postMessage({
  id: 'init',
  type: REQUEST,
  data: null,
  task: task ?? '',
  model_id: model_id ?? '',
  options: options ? callbackBridge.serialize(options) : {},
});
```
messagesResolversMap.set('init', ...) is never read (init resolution happens via the outer resolve(...)), which makes the map misleading. Either handle init via the map (including init error rejection) or remove the unused 'init' entry and related typing.
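One way to follow the first option is to settle the init response through the same map as every other request, so an `error` on the `'init'` message rejects the outer promise instead of being dropped. A minimal sketch; `PendingRequests` and its methods are hypothetical, and resolvers are plain callbacks so the class does not depend on a worker:

```typescript
type RequestId = number | "init";

interface ResultMessage {
  id: RequestId;
  result?: unknown;
  error?: unknown;
}

interface Resolver {
  resolve: (value: unknown) => void;
  reject: (reason: unknown) => void;
}

// Every request, including the initial "init" one, is settled through one map.
class PendingRequests {
  private readonly resolvers = new Map<RequestId, Resolver>();
  private nextId = 0;

  // Register a request; pass "init" explicitly for pipeline creation,
  // otherwise a fresh numeric id is allocated.
  register(resolver: Resolver, id?: RequestId): RequestId {
    const key = id ?? this.nextId++;
    this.resolvers.set(key, resolver);
    return key;
  }

  // Settle the matching request; returns false for unknown or already-settled ids.
  settle(msg: ResultMessage): boolean {
    const resolver = this.resolvers.get(msg.id);
    if (!resolver) return false;
    this.resolvers.delete(msg.id);
    if (msg.error !== undefined) resolver.reject(msg.error);
    else resolver.resolve(msg.result);
    return true;
  }

  get size(): number {
    return this.resolvers.size;
  }
}
```

In the wrapper, the `'init'` entry's `resolve` would build the `(data, pipeOptions) => Promise` callable, and the `onmessage` branch would reduce to `if (msg?.type === RESPONSE_RESULT) pending.settle(msg);`.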
```typescript
import type { PipelineType } from "@huggingface/transformers";

const REQUEST_MESSAGE_TYPE = "transformersjs_worker_pipeline";
const RESPONSE_MESSAGE_TYPE_INVOKE_CALLBACK = "transformersjs_worker_invokeCallback";
```
These constants don’t match the implementation: callback invocations use RESPONSE_CALLBACK_INVOCATION (currently 'callback_bridge:invoke'), not 'transformersjs_worker_invokeCallback'. As written, the test will never exercise the callback bridge behavior.
```diff
-const RESPONSE_MESSAGE_TYPE_INVOKE_CALLBACK = "transformersjs_worker_invokeCallback";
+const RESPONSE_MESSAGE_TYPE_INVOKE_CALLBACK = "callback_bridge:invoke";
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…backBridgeClient.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Add Web Worker Pipeline Support
This PR reminded me that other libraries offer a fairly simple way to interact with web workers.
So this PR adds a `webWorkerPipeline` that can be used in place of `pipeline`, and a `webWorkerPipelineHandler` that handles the web worker requests.
Subpackage
I deliberately implemented this feature as the first subpackage. It uses the same package commands as the main library, so `pnpm dev` or `pnpm build` works for both packages. We will add deployment to npm at a later date.
Usage
Main Thread:
Worker Thread (worker.js):
Options and Limitations
Function Callbacks
Function callbacks like `progress_callback` are automatically handled via a callback bridge and execute in the main thread.
Note: `session_options` cannot contain GPU devices, WebNN contexts, or typed arrays, as these are not serializable across worker boundaries.