daixuancheng/Qwen3-VL-8B-Thinking_stage1_MixAllRL • 9B • Updated • 3
daixuancheng/Qwen3-VL-8B-Thinking_stage3_MixAllRL_and_dataMixRatio_and_easy2hard • 9B • Updated
daixuancheng/Qwen3-VL-8B-Thinking_stage2_MixAllRL_and_dataMixRatio • 9B • Updated • 2
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step20 • 9B • Updated • 2
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step36 • 9B • Updated • 17
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step4 • 9B • Updated • 2
daixuancheng/Qwen3-VL-8B-Thinking_multisub_kaiyuanTiankong_resplen8192_sp2_gentp2_step10 • 9B • Updated
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-80_actor • Text Generation • 8B • Updated • 1
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-20_actor • Text Generation • 8B • Updated
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-80_critic • Text Generation • 8B • Updated
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-20_critic • Text Generation • 8B • Updated
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-60_critic • Text Generation • 8B • Updated
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-120_critic • Text Generation • 8B • Updated • 1
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step80_crtic • Text Generation • 8B • Updated
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step20_crtic • Text Generation • 8B • Updated
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step120_crtic • Text Generation • 8B • Updated
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step60_crtic • Text Generation • 8B • Updated
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-120_actor • Text Generation • 8B • Updated
daixuancheng/ppo_sac_static0.1_constrainbyadv_step-60_actor • Text Generation • 8B • Updated
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step120_actor • Text Generation • 8B • Updated
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step60_actor • Text Generation • 8B • Updated
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step20_actor • Text Generation • 8B • Updated
daixuancheng/ppo_sample8_critic-warm10-lr2e-6_step80_actor • Text Generation • 8B • Updated
daixuancheng/sac_static0.4_constrainbyAdv_step80 • Text Generation • 8B • Updated
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step80 • Text Generation • 8B • Updated
daixuancheng/sac_static0.1_constrainbyAdv_step80 • Text Generation • 8B • Updated
daixuancheng/sac_static0.1_constrainbyAdv_step120 • Text Generation • 8B • Updated
daixuancheng/sac_static0.4_constrainbyAdv_step60 • Text Generation • 8B • Updated
daixuancheng/zero_7b_base_useTokenLoss_clipHigh_KLcoeff0_step60 • Text Generation • 8B • Updated
daixuancheng/sac_static0.1_constrainbyAdv_step60 • Text Generation • 8B • Updated