Move blog to root as BLOG.md (per Meta mentor guidance)
Mentor tip: 'Please push Blog.MD into your HF Space' — a separate
markdown file at the root, distinct from the README, makes it obvious
to judges where the writeup lives. Updated the README link accordingly.
Also removed training/WhatMakesAGoodSubmission.md (internal hackathon
notes, not meant for judges).
- docs/blog_post.md → BLOG.md +0 -0
- README.md +1 -1
- training/WhatMakesAGoodSubmission.md +0 -57
docs/blog_post.md → BLOG.md
RENAMED
File without changes
README.md
CHANGED

```diff
@@ -19,7 +19,7 @@ This is **meta-reinforcement learning** for personalization: the agent isn't tra
 
 - **HF Space (the environment)**: https://huggingface.co/spaces/InosLihka/rhythm_env
 - **Training notebook**: [training/RhythmEnv_GRPO_Training.ipynb](training/RhythmEnv_GRPO_Training.ipynb)
-- **Blog post**: [
+- **Blog post**: [BLOG.md](BLOG.md) — *Teaching an AI to Know You (Without Asking)*
 - **Headline results**: [docs/results.md](docs/results.md)
 - **Trained model (Algorithm Distillation)**: https://huggingface.co/InosLihka/rhythm-env-meta-trained-sft-v1
 - **Teacher trajectories dataset**: https://huggingface.co/datasets/InosLihka/rhythm-env-teacher-trajectories
```
training/WhatMakesAGoodSubmission.md
DELETED

```diff
@@ -1,57 +0,0 @@
-# What makes a submission stand out:
-Pick an ambitious, original problem
-The themes (problems) are deliberately open. Use them as launching pads, not boxes. Judges have seen a lot of chess, snake, tic-tac-toe, and grid-world clones. To score well on innovation,
-you need a genuinely fresh angle. Some questions to ask yourself:
-Does this environment exist to teach an LLM something it currently can't do well?
-Is the domain underexplored in RL/LLM training?
-Could a researcher write a paper about training on this?
-
-Design a reward signal that actually teaches
-A great environment has a reward function that:
-Provides a rich, informative signal (not just 0/1 at the end)
-Captures something hard to measure in a clever way
-Uses OpenEnv's Rubric system thoughtfully (composable rubrics > monolithic scoring)
-Is hard to game; an agent that exploits the reward without solving the task should not get high scores
-
-Show real training, end to end
-The bar isn't "training script exists." The bar is "training script runs against the environment, the
-agent learns, and you can show it." Concretely:
-Your training loop should connect to your environment (not a static dataset)
-Train long enough that the curves mean something
-Compare a trained agent vs. a random/untrained baseline; quantitative and/or qualitative
-Include the plots and numbers in your README and writeup
-
-Make your plots readable
-Reviewers spend seconds, not minutes, on each plot. Help them out:
-Label both axes (e.g. "training step" / "episode" on x, "reward" / "loss" on y) and include units where they apply
-Save plots as .png or .jpg and commit them to the repo (don't leave them only in a Colab cell or a deleted WandB run); if you ran via WandB, please include the link to that specific run of your plots
-Embed the key plots in your README with a one-line caption explaining what each one shows. If you have multiple runs (baseline vs. trained, ablations, etc.), put them on the same axes so the comparison is obvious
-
-
-Tell a story, not an API doc
-Your README, blog, and pitch should answer:
-Problem) what capability gap or interesting domain are you targeting?
-Environment) what does the agent see, do, and get rewarded for?
-Results) what changed after training? Show it.
-Why does it matter) who would care, and why?
-
-A reviewer should be able to read your README in 3~5 minutes and want to try your
-environment.
-
-NOTE: If you have a video, HF post, or anything else interesting, please make sure that it's linked
-from your README.
-
-
-Engineer it cleanly (table stakes)
-Engineering quality matters less than ambition, but sloppy work hurts. Make sure you:
-Use OpenEnv's Environment / MCPEnvironment base classes properly
-Respect the client / server separation (clients should never import server internals)
-Follow the standard Gym-style API (reset, step, state)
-Have a valid openenv.yaml manifest
-Don't use reserved tool names (reset, step, state, close) for MCP tools
-
-Final Note
-Judges are looking for environments that push the frontier of what we can train LLMs to do. Be
-ambitious. Pick a problem you find genuinely interesting; that almost always produces better
-work than chasing what you think judges want. Good luck.
-
```
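The deleted guidance's points about a Gym-style API (reset, step, state) and a dense, hard-to-game reward can be sketched in miniature. This is a hypothetical toy, not the real OpenEnv or RhythmEnv interface; the names (`ToyRhythmEnv`, `target`) are illustrative only:

```python
# Hypothetical sketch, assuming a Gym-style reset/step/state surface.
# The reward is dense (per-step partial credit) rather than 0/1 at episode end.
from dataclasses import dataclass, field

@dataclass
class ToyRhythmEnv:
    # Hidden per-user preference pattern the agent must infer from rewards.
    target: list = field(default_factory=lambda: [1, 0, 1, 1])
    state: dict = field(default_factory=dict)

    def reset(self) -> dict:
        self.state = {"t": 0, "history": []}
        return self.state

    def step(self, action: int):
        t = self.state["t"]
        # Per-step reward: credit for matching the hidden pattern at this step.
        reward = 1.0 if action == self.target[t % len(self.target)] else 0.0
        self.state["history"].append((action, reward))
        self.state["t"] = t + 1
        done = self.state["t"] >= len(self.target)
        return self.state, reward, done

env = ToyRhythmEnv()
env.reset()
total = 0.0
for a in [1, 0, 1, 1]:  # a policy that has inferred the hidden pattern
    _, r, done = env.step(a)
    total += r
print(total)  # 4.0
```

A random baseline would score about half of this on average, which is the kind of trained-vs-untrained comparison the deleted notes ask for.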