Update the example of human in the loop #247

HYLcool · 2025-09-01T11:53:28Z

Description

As the title says.

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist

Summary of Changes

Hello @HYLcool, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the human-in-the-loop capabilities by introducing a comprehensive and runnable example for Direct Preference Optimization (DPO). It provides all necessary components, including data, configurations, and updated documentation, to demonstrate how human preferences can be integrated into the DPO training pipeline. Additionally, it improves the flexibility of the task pipeline and adds robustness to data processing utilities.

Highlights

New Human-in-the-Loop DPO Example: A complete example for DPO (Direct Preference Optimization) with human-in-the-loop data processing has been added, including a dedicated README, sample data, and configuration files.
Documentation Update: The tutorial documentation has been updated to reflect the new human-in-the-loop example, providing clearer guidance and links to relevant configuration files.
Task Pipeline Flexibility: The task pipeline logic has been refined to remove the restriction that it's only available in 'explore' or 'both' modes, and its output can now be derived from the trainer's experience buffer.
Robustness Improvements: Checks have been added in data processing utilities to ensure the presence of 'Fields.stats' before attempting to access it, preventing potential errors.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request updates the 'human in the loop' example by adding a new DPO example with necessary configuration, data, and documentation. The changes also include modifications to the task pipeline and configuration handling to support this new workflow. My review focuses on ensuring the new example is correct and runnable. I've found a few issues, mainly incorrect paths and links in the documentation and configuration files, which could prevent users from running the example successfully. I've also suggested some improvements to comments for better clarity.

examples/dpo_human_in_the_loop/dpo.yaml

docs/sphinx_doc/source/tutorial/example_data_functionalities.md

examples/dpo_human_in_the_loop/README.md

examples/dpo_human_in_the_loop/train_dpo.yaml

pan-x-c · 2025-09-02T11:34:30Z

/unittest-module-buffer

github-actions · 2025-09-02T11:36:21Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
22	22	0	0	0	0	59ms

Tests

Test Name	Status	Duration
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	11ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage	✅	5ms
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	1ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	2ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter	✅	1ms
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter	✅	1ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	7ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	3ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	4ms
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	4ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	1ms
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_buffer_read_write	✅	3ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0	✅	1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1	✅	2ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2	✅	1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3	✅	2ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4	✅	1ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5	✅	3ms

Github Test Reporter by CTRF 💚

pan-x-c · 2025-09-02T11:37:23Z

/unittest-module-service

github-actions · 2025-09-02T11:39:20Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
4	4	0	0	0	0	63ms

Tests

Test Name	Status	Duration
tests/service/data_juicer_test.py::TestDataJuicer::test_config	✅	1ms
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	✅	21ms
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators	✅	24ms
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline	✅	14ms

Github Test Reporter by CTRF 💚

trinity/common/config.py

pan-x-c · 2025-09-03T10:06:02Z

/unittest-module-common

github-actions · 2025-09-03T10:12:27Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
30	29	1	0	0	0	331ms

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/common/config_test.py::TestConfig::test_all_examples_are_valid	The test failed in the call phase due to an exception

Tests

Test Name	Status	Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	❌	1ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	4ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	38ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	16ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	16ms
tests/common/vllm_test.py::ModelWrapperTest_3::test_generate	✅	53ms
tests/common/vllm_test.py::ModelWrapperTest_4::test_generate	✅	49ms
tests/common/vllm_test.py::ModelWrapperTest_5::test_generate	✅	35ms
tests/common/vllm_test.py::ModelWrapperTest_6::test_generate	✅	45ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_assistant_token_mask	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	21ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	19ms

Github Test Reporter by CTRF 💚

HYLcool · 2025-09-03T11:23:13Z

/unittest-module-common

github-actions · 2025-09-03T11:29:48Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
31	31	0	0	0	0	340ms

Tests

Test Name	Status	Duration
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	4ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	1ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	1ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	4ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	1ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	40ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	15ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	16ms
tests/common/vllm_test.py::ModelWrapperTest_3::test_generate	✅	55ms
tests/common/vllm_test.py::ModelWrapperTest_4::test_generate	✅	49ms
tests/common/vllm_test.py::ModelWrapperTest_5::test_generate	✅	36ms
tests/common/vllm_test.py::ModelWrapperTest_6::test_generate	✅	46ms
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	1ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	1ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	22ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	20ms

Github Test Reporter by CTRF 💚

HYLcool added 2 commits September 1, 2025 19:51

+ add & update example of human in the loop

141294f

* after pre-commit

dc599d9

HYLcool requested review from pan-x-c and yxdyc September 1, 2025 11:53

HYLcool self-assigned this Sep 1, 2025

HYLcool added documentation Improvements or additions to documentation enhancement New feature or request labels Sep 1, 2025

gemini-code-assist bot reviewed Sep 1, 2025

View reviewed changes

* modified according to gemini's comments

07191d5

pan-x-c reviewed Sep 3, 2025

View reviewed changes

trinity/common/config.py Outdated Show resolved Hide resolved

* modified according to xuchen's comments

eb90807

Merge branch 'refs/heads/main' into example/human_in_the_loop

22db0c7

pan-x-c approved these changes Sep 3, 2025

View reviewed changes

pan-x-c merged commit d6d3be4 into modelscope:main Sep 3, 2025
2 checks passed

HYLcool deleted the example/human_in_the_loop branch September 3, 2025 11:30

yaochaorui pushed a commit to yaochaorui/Trinity-RFT that referenced this pull request Sep 19, 2025

Update the example of human in the loop (modelscope#247)

352e96e

Update the example of human in the loop #247

Update the example of human in the loop #247

Uh oh!

Conversation

HYLcool commented Sep 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pan-x-c commented Sep 2, 2025

Uh oh!

github-actions bot commented Sep 2, 2025

Summary

Tests

Uh oh!

pan-x-c commented Sep 2, 2025

Uh oh!

github-actions bot commented Sep 2, 2025

Summary

Tests

Uh oh!

Uh oh!

pan-x-c commented Sep 3, 2025

Uh oh!

github-actions bot commented Sep 3, 2025

Summary

Failed Tests

Tests

Uh oh!

HYLcool commented Sep 3, 2025

Uh oh!

github-actions bot commented Sep 3, 2025

Summary

Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

HYLcool commented Sep 1, 2025 •

edited

Loading