Add max_prompt_tokens
#202
Conversation
Summary of Changes
Hello @chenyushuo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request reintroduces and properly implements the max_prompt_tokens configuration to manage the length of input prompts for language models. The changes ensure that prompts are truncated to fit within the model's context window, improving stability and preventing errors related to overly long inputs. It also establishes a default maximum model length and intelligently derives related token limits, making the system more robust in handling various model configurations.
Highlights
- **Re-enabling `max_prompt_tokens`**: The `max_prompt_tokens` parameter, previously marked as deprecated, has been fully re-enabled and integrated into the model configuration and inference process.
- **Dynamic `max_model_len` determination**: New logic dynamically determines `max_model_len` by attempting to retrieve it from the model's tokenizer. If that fails, a new default `MAX_MODEL_LEN` constant of 4096 is used.
- **Automatic token limit adjustments**: The system now automatically adjusts `max_prompt_tokens` to `max_model_len - 1` and `max_response_tokens` to `max_model_len` if they are not explicitly set or exceed the model's maximum length, ensuring token limits always stay within bounds.
- **Explicit prompt truncation**: Prompts are now explicitly tokenized and truncated from the left based on the `max_prompt_tokens` setting before being passed to the vLLM engine for generation, preventing input prompts from exceeding the model's context window (see the sketch after this list).
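A minimal sketch of how these pieces could fit together, assuming a Hugging Face tokenizer. The helper names (`resolve_token_limits`, `truncate_prompt_left`) and the large-sentinel check are illustrative, not necessarily the PR's actual code:

```python
from transformers import AutoTokenizer

MAX_MODEL_LEN = 4096  # fallback when the tokenizer does not expose a usable limit


def resolve_token_limits(model_path, max_prompt_tokens=None, max_response_tokens=None):
    """Derive max_model_len from the tokenizer and clamp the other limits to it."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    max_model_len = getattr(tokenizer, "model_max_length", None)
    # Some tokenizers report a huge sentinel value when no limit is set; fall back.
    if max_model_len is None or max_model_len > 1_000_000:
        max_model_len = MAX_MODEL_LEN

    if max_prompt_tokens is None or max_prompt_tokens >= max_model_len:
        max_prompt_tokens = max_model_len - 1
    if max_response_tokens is None or max_response_tokens > max_model_len:
        max_response_tokens = max_model_len
    return max_model_len, max_prompt_tokens, max_response_tokens


def truncate_prompt_left(tokenizer, prompt, max_prompt_tokens):
    """Tokenize the prompt and keep only the rightmost max_prompt_tokens tokens."""
    token_ids = tokenizer.encode(prompt, add_special_tokens=False)
    return token_ids[-max_prompt_tokens:]
```

The truncated token IDs, rather than the raw prompt string, would then be handed to the vLLM engine for generation.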
Code Review
This pull request introduces support for `max_prompt_tokens` to handle prompt truncation. The changes are mostly in the configuration handling and the vLLM model wrapper. My review identifies a high-severity issue with silent error handling that could mask problems, and a couple of medium-severity issues related to code duplication and a potentially unused parameter, which could affect maintainability and clarity.
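On the silent error handling point: a broad `try/except` around the tokenizer lookup can hide unrelated failures. A hedged sketch of a narrower alternative, using an illustrative helper name rather than the PR's actual code:

```python
import logging

logger = logging.getLogger(__name__)

MAX_MODEL_LEN = 4096


def get_max_model_len(tokenizer) -> int:
    """Read the model's context length, logging (not swallowing) failures."""
    try:
        max_len = int(tokenizer.model_max_length)
    except (AttributeError, TypeError, ValueError) as exc:
        logger.warning(
            "Could not read model_max_length (%s); using default %d", exc, MAX_MODEL_LEN
        )
        return MAX_MODEL_LEN
    # Treat the huge sentinel value some tokenizers report as "no limit set".
    return max_len if max_len < 1_000_000 else MAX_MODEL_LEN
```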
/unittest-all
GitHub Test Reporter by CTRF 💚
Description
[Please describe the background, purpose, changes made, and how to test this PR]
Checklist
Please check the following items before the code is ready for review.