AI & ML interests
Deep RL, RL fine-tuning
Datasets
skandermoalla/qrpo-paper-llama-sft-leetcode-sandbox-temp1-ref50-offpolicy10random-sandbox
27k rows • 14 downloads
skandermoalla/qrpo-paper-llama-sft-leetcode-sandbox-temp1-ref50-offline-sandbox
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
62.1k rows • 28 downloads
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
62.1k rows • 17 downloads
skandermoalla/qrpo-paper-mistral-sft-ultrafeedback-armorm-temp1-ref50-offline-armorm
62.1k rows • 30 downloads
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
91.9k rows • 14 downloads
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
91.9k rows • 120 downloads
skandermoalla/qrpo-paper-mistral-sft-magpieair-armorm-temp1-ref50-offline-armorm
91.9k rows • 21 downloads
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
62.5k rows • 61 downloads
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
62.5k rows • 42 downloads
skandermoalla/qrpo-paper-mistral-nosft-ultrafeedback-armorm-temp1-ref50-offline-armorm
62.5k rows • 16 downloads
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
99k rows • 66 downloads
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
99k rows • 36 downloads
skandermoalla/qrpo-paper-mistral-nosft-magpieair-armorm-temp1-ref50-offline-armorm
99.1k rows • 14 downloads
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
62k rows • 24 downloads
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
62k rows • 21 downloads
skandermoalla/qrpo-paper-llama-sft-ultrafeedback-armorm-temp1-ref50-offline-armorm
62k rows • 15 downloads
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
100k rows • 32 downloads
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
100k rows • 10 downloads
skandermoalla/qrpo-paper-llama-sft-magpieair-armorm-temp1-ref50-offline-armorm
100k rows • 22 downloads
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2random-armorm
61.6k rows • 23 downloads
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offpolicy2best-armorm
61.6k rows • 82 downloads
skandermoalla/qrpo-paper-llama-nosft-ultrafeedback-armorm-temp1-ref50-offline-armorm
61.6k rows • 29 downloads
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offpolicy2random-armorm
93.8k rows • 9 downloads
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offpolicy2best-armorm
93.8k rows • 93 downloads
skandermoalla/qrpo-paper-llama-nosft-magpieair-armorm-temp1-ref50-offline-armorm
96.6k rows • 20 downloads
skandermoalla/qrpo-paper-llama-nosft-leetcode-sandbox-temp1-ref50-offpolicy10random-sandbox
26.6k rows • 11 downloads
skandermoalla/qrpo-paper-llama-nosft-leetcode-sandbox-temp1-ref50-offline-sandbox