Fix some metrics #339

hiyuchang · 2025-10-24T07:12:49Z

Description

As the title says.

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist · 2025-10-24T07:13:18Z

Summary of Changes

Hello @hiyuchang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving the observability and robustness of the system by introducing comprehensive timing metrics across key components like the Explorer, Trainer, and DataJuicer client. It also refines existing metric handling, standardizes naming conventions, and makes the metric aggregation utility more resilient to edge cases, ultimately providing clearer performance insights and a more stable metric collection process.

Highlights

Enhanced Timing Metrics: Introduced new timing metrics across Explorer, Trainer, and DataJuicer components to precisely measure various operational durations, such as total exploration time, evaluation time, and data processing time.
Metric Name Standardization: Standardized metric names in MixSampleStrategy and its documentation, changing 'read_time' to 'time/read_experience' and 'gather_time' to 'time/gather_experience' for better clarity and consistency.
Robust Metric Aggregation: Improved the gather_metrics utility function to gracefully handle empty input lists, preventing ValueError and simplifying its usage across the codebase.
GRPO Advantage Metric Refinement: Enhanced the GRPO advantage calculation to explicitly sum and report 'skipped_count' as 'filtered_count', providing more detailed insights into filtered experiences.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request aims to fix and improve metrics, particularly timing metrics, across the system. The changes include:

Standardizing metric names for better clarity (e.g., time/read_experience).
Adding timing for operations in DataJuicerClient, Explorer, and Trainer.
Refactoring gather_metrics to handle empty lists gracefully, which simplifies calling code by removing redundant try-except blocks.

The implementation is mostly solid, especially the robust timing logic in explorer.py and the simplification in gather_metrics. However, I've identified a logical issue in the new timing implementation within trinity/trainer/trainer.py where the metric time/train_total_time does not measure the full duration between syncs as intended. My review includes a detailed comment on how to correct this to ensure the metric is accurate and consistent with other parts of the codebase. Overall, good improvements to observability.

pan-x-c · 2025-10-24T09:53:52Z

/unittest-diff

github-actions · 2025-10-24T10:35:56Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
91	87	1	3	0	0	2.5s

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	The test failed in the call phase due to an assertion error

Skipped

Tests	Status
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	skipped ⏭️

Tests

Test Name	Status	Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	❌	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	58ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer	✅	58ms
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	205ms
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	69ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	5ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	5ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	23ms
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	5ms
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	5ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	15ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	10ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	8ms
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	5ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	8ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	14ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	1ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	✅	19ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	✅	19ms
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	⏭️	1ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	1ms
tests/service/data_juicer_test.py::TestDataJuicer::test_config	✅	1ms
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	✅	21ms
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators	✅	23ms
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline	✅	14ms
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	✅	137ms
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	✅	259ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	53ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	✅	55ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	✅	51ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	✅	64ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	✅	62ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	100ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	40ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	37ms
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	✅	37ms
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	✅	82ms
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	✅	82ms
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	✅	141ms
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	✅	96ms
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	✅	304ms
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	✅	57ms
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	⏭️	1ms
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	⏭️	1ms
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	✅	150ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	1ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	1ms
tests/utils/log_test.py::LogTest::test_actor_log	✅	4ms
tests/utils/log_test.py::LogTest::test_group_by_node	✅	4ms
tests/utils/log_test.py::LogTest::test_no_actor_log	✅	1ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local	✅	1ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote	✅	9ms
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class	✅	5ms

Github Test Reporter by CTRF 💚

hiyuchang · 2025-10-27T02:04:05Z

/unittest-module-algorithm

github-actions · 2025-10-27T02:05:21Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
13	12	1	0	0	0	4ms

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	The test failed in the call phase due to an assertion error

Tests

Test Name	Status	Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	❌	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms

Github Test Reporter by CTRF 💚

hiyuchang · 2025-10-27T02:24:54Z

/unittest-module-algorithm

github-actions · 2025-10-27T02:26:08Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
13	13	0	0	0	0	4ms

Tests

Test Name	Status	Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms

Github Test Reporter by CTRF 💚

pan-x-c · 2025-10-27T06:45:51Z

/unittest-module-explorer

github-actions · 2025-10-27T06:56:14Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
36	35	0	1	0	0	549ms

Skipped

Tests	Status
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	skipped ⏭️

Tests

Test Name	Status	Duration
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	61ms
tests/explorer/explorer_test.py::TestExplorerCountdownNoEval::test_explorer	✅	54ms
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	205ms
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	71ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	5ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	5ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	24ms
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	5ms
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	5ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	15ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	9ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	8ms
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	6ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	8ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	14ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	1ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	✅	20ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	✅	18ms
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	⏭️	1ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	1ms

Github Test Reporter by CTRF 💚

pan-x-c · 2025-10-27T07:15:14Z

/unittest-module-common

github-actions · 2025-10-27T07:21:46Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
33	33	0	0	0	0	319ms

Tests

Test Name	Status	Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	32ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_default_workflow	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	3ms
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly	✅	1ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster	✅	1ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	57ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	36ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	47ms
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	21ms
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	21ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	22ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	20ms

Github Test Reporter by CTRF 💚

add total_time and fix metric

9469632

gemini-code-assist bot reviewed Oct 24, 2025

View reviewed changes

hiyuchang added 4 commits October 24, 2025 16:02

Merge branch 'main' into dev/fix_time_metric

3e5185a

fix comments and add explore_step

c130278

replace prefix timing_s to time

e5624fd

fix comment

1e8f46c

fix test

1de687f

fix test

53ea0eb

hiyuchang added 4 commits October 27, 2025 11:43

add time/wait_explore_step

abab760

fix comment

dcbe8a4

fix experience_pipeline time

4f6af70

use experience_pipeline as prefix

38b2ed8

pan-x-c approved these changes Oct 27, 2025

View reviewed changes

pan-x-c merged commit 4576dab into modelscope:main Oct 27, 2025
2 checks passed

Fix some metrics #339

Fix some metrics #339

Uh oh!

Conversation

hiyuchang commented Oct 24, 2025

Description

Checklist

Uh oh!

gemini-code-assist bot commented Oct 24, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

pan-x-c commented Oct 24, 2025

Uh oh!

github-actions bot commented Oct 24, 2025

Summary

Failed Tests

Skipped

Tests

Uh oh!

hiyuchang commented Oct 27, 2025

Uh oh!

github-actions bot commented Oct 27, 2025

Summary

Failed Tests

Tests

Uh oh!

hiyuchang commented Oct 27, 2025

Uh oh!

github-actions bot commented Oct 27, 2025

Summary

Tests

Uh oh!

pan-x-c commented Oct 27, 2025

Uh oh!

github-actions bot commented Oct 27, 2025

Summary

Skipped

Tests

Uh oh!

pan-x-c commented Oct 27, 2025

Uh oh!

github-actions bot commented Oct 27, 2025

Summary

Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants