Add Data-Juicer Service #181

pan-x-c · 2025-08-12T07:52:11Z

Description

Add Data-Juicer Service

Add Data-Juicer Operator
Add Data-Juicer Client
Add Data-Juicer Server
Update doc and test case

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist

Summary of Changes

Hello @pan-x-c, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the initial integration of a Data-Juicer service into the system. It primarily focuses on adding the necessary components to interact with this service, including a new operator to process data and a client to manage communication. The goal is to enable external data processing capabilities, with the server-side implementation noted as a work in progress.

Highlights

Data-Juicer Operator Integration: A new DataJuicerOperator is introduced, allowing the system to leverage the Data-Juicer service for processing Experience objects. This operator can be configured using various methods, including direct operator lists, a configuration file, or a natural language description.
Dedicated Data-Juicer Client: A DataJuicerClient has been added to handle the communication with the Data-Juicer server. This client manages session initialization, sends and receives data (converted to/from HuggingFace Datasets via Parquet files), and ensures proper session closure.
Experience Data Conversion Utilities: New utility functions (to_hf_datasets, from_hf_datasets) are now available to seamlessly convert Experience objects to and from HuggingFace Dataset format, which is crucial for efficient data transfer to the Data-Juicer service.
Enhanced Configuration Management: New configuration classes, DataJuicerConfig and ServiceConfig, have been added to trinity/common/config.py. These allow for detailed configuration of the Data-Juicer service, including server URL, OpenAI API settings for its agent, and options for automatic server startup.
Improved Resource Cleanup: The base ExperienceOperator now includes a close method, and the ExperiencePipeline has been updated to iterate through and call this close method on all its operators. This ensures that resources held by operators, such as the DataJuicerClient connection, are properly released.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a Data-Juicer service, including a client and an operator to integrate it into the experience processing pipeline. The changes are generally well-structured. However, I've identified a critical issue in the DataJuicerOperator where the process method's signature and return type do not match the base class, which would cause a runtime error. Additionally, I've provided suggestions to improve the DataJuicerClient for better efficiency and robustness, such as using in-memory buffers and adding error handling. Please address these points to ensure the new service is reliable and performs well.

trinity/buffer/operators/data_juicer_operator.py

trinity/service/data_juicer/client.py

pan-x-c · 2025-08-12T08:25:54Z

/unittest-all

github-actions · 2025-08-12T08:52:00Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
95	95	0	0	0	0	1.5s

Tests

Test Name	Status	Duration
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_correct_bias_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_duplicate_add_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_grpo_args	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_reward_variance_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_step_wise_grpo_strategy	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	10ms
tests/buffer/file_test.py::TestFileBuffer::test_file_buffer	✅	2ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	2ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	7ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	3ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	4ms
tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer	✅	4ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	4ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	35ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	52ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	50ms
tests/common/vllm_test.py::ModelWrapperTest_3::test_generate	✅	35ms
tests/common/vllm_test.py::ModelWrapperTest_4::test_generate	✅	46ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	21ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	20ms
tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer	✅	1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	51ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer	✅	45ms
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	33ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	19ms
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	14ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	13ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable	✅	1ms
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	28ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	62ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	66ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	88ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	87ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	✅	55ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	✅	56ms
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer	✅	1ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer	✅	143ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	68ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer	✅	49ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	55ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	31ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	30ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_0_queue	✅	70ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_1_priority_queue	✅	60ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	1ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	1ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins	✅	4ms

Github Test Reporter by CTRF 💚

pan-x-c · 2025-08-13T13:03:27Z

/unittest-all

github-actions · 2025-08-13T14:22:14Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
99	96	3	0	0	0	1.6s

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/service/data_juicer_test.py::TestDataJuicer::test_config	The test failed in the call phase
❌ tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	The test failed in the call phase
❌ tests/service/data_juicer_test.py::TestDataJuicerOperators::test_data_juicer_operators	The test failed in the call phase

Tests

Test Name	Status	Duration
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_correct_bias_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_duplicate_add_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_grpo_args	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_reward_variance_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_step_wise_grpo_strategy	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	11ms
tests/buffer/file_test.py::TestFileBuffer::test_file_buffer	✅	2ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	2ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	7ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	3ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	4ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	1ms
tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer	✅	4ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	4ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	36ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	53ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	48ms
tests/common/vllm_test.py::ModelWrapperTest_3::test_generate	✅	35ms
tests/common/vllm_test.py::ModelWrapperTest_4::test_generate	✅	45ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	21ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	20ms
tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer	✅	1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	50ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer	✅	73ms
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	31ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	19ms
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	14ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	13ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable	✅	1ms
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	29ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	62ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	65ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	83ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	88ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	✅	54ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	✅	56ms
tests/service/data_juicer_test.py::TestDataJuicer::test_config	❌	1ms
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	❌	15ms
tests/service/data_juicer_test.py::TestDataJuicerOperators::test_data_juicer_operators	❌	9ms
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer	✅	1ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer	✅	133ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	56ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer	✅	48ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	58ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	33ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	30ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_0_queue	✅	61ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_1_priority_queue	✅	70ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	1ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	1ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins	✅	4ms

Github Test Reporter by CTRF 💚

pan-x-c · 2025-08-14T06:09:13Z

/unittest-all

Copilot

Pull Request Overview

This PR adds a comprehensive Data-Juicer service integration to Trinity, enabling data processing capabilities through a client-server architecture. The implementation includes both standalone server functionality and automated server management.

Key changes:

Implements Data-Juicer client with auto-start capabilities and experience processing
Adds Data-Juicer server with session management and HTTP API endpoints
Integrates Data-Juicer operator into the experience pipeline system

Reviewed Changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
trinity/utils/distributed.py	Adds port utility function for server auto-start
trinity/service/data_juicer/server/utils.py	Implements configuration parsing and validation
trinity/service/data_juicer/server/session.py	Manages Data-Juicer processing sessions
trinity/service/data_juicer/server/server.py	Implements Flask HTTP server with API endpoints
trinity/service/data_juicer/client.py	Provides client interface with auto-start capability
trinity/service/init.py	Exports Data-Juicer client
trinity/explorer/explorer.py	Updates method call to align with pipeline changes
trinity/common/models/vllm_model.py	Adds configuration options for response generation
trinity/common/experience.py	Enhances Experience class with dataset conversion support
trinity/common/config.py	Adds Data-Juicer service configuration
trinity/buffer/pipelines/experience_pipeline.py	Renames method and adds cleanup functionality
trinity/buffer/operators/experience_operator.py	Adds close method to operator base class
trinity/buffer/operators/data_juicer_operator.py	Implements Data-Juicer operator for pipeline integration
trinity/buffer/operators/init.py	Exports new Data-Juicer operator
tests/service/data_juicer_test.py	Comprehensive test suite for Data-Juicer functionality
tests/explorer/explorer_test.py	Updates tests for new configuration options
tests/common/experience_test.py	Tests dataset conversion functionality
tests/buffer/experience_pipeline_test.py	Updates test for renamed pipeline method

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

trinity/service/data_juicer/server/utils.py

trinity/service/data_juicer/server/server.py

trinity/service/data_juicer/client.py

trinity/common/experience.py

github-actions · 2025-08-14T06:38:59Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
99	99	0	0	0	0	1.7s

Tests

Test Name	Status	Duration
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_correct_bias_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_duplicate_add_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_grpo_args	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_reward_variance_strategy	✅	1ms
tests/algorithm/add_strategy_test.py::TestAddStrategy::test_step_wise_grpo_strategy	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	11ms
tests/buffer/file_test.py::TestFileBuffer::test_file_buffer	✅	2ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	2ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	7ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	3ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	4ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	1ms
tests/buffer/sql_test.py::TestSQLBuffer::test_create_sql_buffer	✅	4ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	8ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	37ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	53ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	47ms
tests/common/vllm_test.py::ModelWrapperTest_3::test_generate	✅	34ms
tests/common/vllm_test.py::ModelWrapperTest_4::test_generate	✅	45ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	21ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	20ms
tests/explorer/explorer_test.py::BaseExplorerCase::test_explorer	✅	1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	53ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer	✅	47ms
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	199ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	19ms
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	14ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	8ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	4ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	7ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	13ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable	✅	1ms
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	28ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	59ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	65ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	94ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	105ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	✅	55ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	✅	56ms
tests/service/data_juicer_test.py::TestDataJuicer::test_config	✅	1ms
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	✅	21ms
tests/service/data_juicer_test.py::TestDataJuicerOperators::test_data_juicer_operators	✅	21ms
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer	✅	1ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer	✅	135ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	57ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer	✅	49ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	54ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	33ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	30ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_0_queue	✅	67ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_1_priority_queue	✅	62ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	1ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	1ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins	✅	4ms

Github Test Reporter by CTRF 💚

add data juicer service

a18ea3b

gemini-code-assist bot reviewed Aug 12, 2025

View reviewed changes

trinity/buffer/operators/data_juicer_operator.py Outdated Show resolved Hide resolved

trinity/service/data_juicer/client.py Outdated Show resolved Hide resolved

trinity/service/data_juicer/client.py Outdated Show resolved Hide resolved

add conversion test

1f6230d

add server

1851f6f

pan-x-c added 6 commits August 12, 2025 20:04

add server implementation

4317027

simplify config

9f9171f

add data_juicer test

0a8e695

support auto start dj service

5f66605

add tests for data-juicer operator

1e4f858

merge dev

739f59e

pan-x-c added 2 commits August 14, 2025 12:15

add more tests

535ae19

fix tests

8a19838

pan-x-c changed the title ~~[WIP] Add Data-Juicer Service~~ Add Data-Juicer Service Aug 14, 2025

pan-x-c requested a review from Copilot August 14, 2025 06:20

Copilot AI reviewed Aug 14, 2025

View reviewed changes

pan-x-c added 2 commits August 14, 2025 14:32

fix comments

99a5e8d

update doc

b4f779c

set np in dj operator

391e76e

HYLcool approved these changes Aug 14, 2025

View reviewed changes

simplify

a27040a

pan-x-c merged commit 459065c into modelscope:feature/data_processor Aug 14, 2025
1 check passed

Add Data-Juicer Service #181

Add Data-Juicer Service #181

Uh oh!

Conversation

pan-x-c commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pan-x-c commented Aug 12, 2025

Uh oh!

github-actions bot commented Aug 12, 2025

Summary

Tests

Uh oh!

pan-x-c commented Aug 13, 2025

Uh oh!

github-actions bot commented Aug 13, 2025

Summary

Failed Tests

Tests

Uh oh!

pan-x-c commented Aug 14, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Aug 14, 2025

Summary

Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pan-x-c commented Aug 12, 2025 •

edited

Loading