InosLihka committed
Commit eccca42 · 1 Parent(s): 1ba0d0e

Move blog to root as BLOG.md (per Meta mentor guidance)


Mentor tip: 'Please push Blog.MD into your HF Space' — a separate
markdown file at the root, distinct from the README, makes it obvious
to judges where the writeup lives. Updated README link accordingly.

Also removed training/WhatMakesAGoodSubmission.md (internal hackathon
notes, not for judges).

docs/blog_post.md → BLOG.md RENAMED
File without changes
README.md CHANGED
@@ -19,7 +19,7 @@ This is **meta-reinforcement learning** for personalization: the agent isn't tra
 
 - **HF Space (the environment)**: https://huggingface.co/spaces/InosLihka/rhythm_env
 - **Training notebook**: [training/RhythmEnv_GRPO_Training.ipynb](training/RhythmEnv_GRPO_Training.ipynb)
-- **Blog post**: [docs/blog_post.md](docs/blog_post.md) — *Teaching an AI to Know You (Without Asking)*
+- **Blog post**: [BLOG.md](BLOG.md) — *Teaching an AI to Know You (Without Asking)*
 - **Headline results**: [docs/results.md](docs/results.md)
 - **Trained model (Algorithm Distillation)**: https://huggingface.co/InosLihka/rhythm-env-meta-trained-sft-v1
 - **Teacher trajectories dataset**: https://huggingface.co/datasets/InosLihka/rhythm-env-teacher-trajectories
training/WhatMakesAGoodSubmission.md DELETED
@@ -1,57 +0,0 @@
- # What makes a submission stand out:
- Pick an ambitious, original problem
- The themes (problems) are deliberately open. Use them as launching pads, not boxes. Judges have seen a lot of chess, snake, tic-tac-toe, and grid-world clones. To score well on innovation,
- you need a genuinely fresh angle. Some questions to ask yourself:
- Does this environment exist to teach an LLM something it currently can’t do well?
- Is the domain underexplored in RL/LLM training?
- Could a researcher write a paper about training on this?
-
- Design a reward signal that actually teaches
- A great environment has a reward function that:
- Provides a rich, informative signal (not just 0/1 at the end)
- Captures something hard to measure in a clever way
- Uses OpenEnv’s Rubric system thoughtfully (composable rubrics > monolithic scoring)
- Is hard to game; an agent that exploits the reward without solving the task should not get high scores
-
- Show real training, end to end
- The bar isn’t “training script exists.” The bar is “training script runs against the environment, the
- agent learns, and you can show it.” Concretely:
- Your training loop should connect to your environment (not a static dataset)
- Train long enough that the curves mean something
- Compare a trained agent vs. a random/untrained baseline; quantitative and/or qualitative
- Include the plots and numbers in your README and writeup
-
- Make your plots readable
- Reviewers spend seconds, not minutes, on each plot. Help them out:
- Label both axes (e.g. “training step” / “episode” on x, “reward” / “loss” on y) and include units where they apply
- Save plots as .png or .jpg and commit them to the repo (don’t leave them only in a Colab cell or a deleted WandB run; if you ran via WandB, please include the link to that specific run of your plots)
- Embed the key plots in your README with a one-line caption explaining what each one shows. If you have multiple runs (baseline vs. trained, ablations, etc.), put them on the same axes so the comparison is obvious
-
-
- Tell a story, not an API doc
- Your README, blog, and pitch should answer:
- Problem) what capability gap or interesting domain are you targeting?
- Environment) what does the agent see, do, and get rewarded for?
- Results) what changed after training? Show it.
- Why does it matter) who would care, and why?
-
- A reviewer should be able to read your README in 3–5 minutes and want to try your
- environment.
-
- NOTE: If you have a video, HF post, or anything else interesting, please make sure that it’s linked
- from your README.
-
-
- Engineer it cleanly (table stakes)
- Engineering quality matters less than ambition, but sloppy work hurts. Make sure you:
- Use OpenEnv’s Environment / MCPEnvironment base classes properly
- Respect the client / server separation (clients should never import server internals)
- Follow the standard Gym-style API (reset, step, state)
- Have a valid openenv.yaml manifest
- Don’t use reserved tool names (reset, step, state, close) for MCP tools
-
- Final Note
- Judges are looking for environments that push the frontier of what we can train LLMs to do. Be
- ambitious. Pick a problem you find genuinely interesting; that almost always produces better
- work than chasing what you think judges want. Good luck.
-
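The deleted notes argue for composable rubrics over monolithic scoring and for reward signals richer than a terminal 0/1. The sketch below illustrates that idea in plain Python; it is not OpenEnv's actual Rubric API, and every class, field, and weight in it is an illustrative assumption.

```python
# Illustrative sketch only; OpenEnv's real Rubric system may look different.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Criterion:
    """One small, inspectable piece of the reward."""
    name: str
    weight: float
    score: Callable[[Dict], float]  # maps an episode record to a value in [0, 1]

def combined_reward(criteria: List[Criterion], episode: Dict) -> float:
    """Weighted average of per-criterion scores: a dense signal instead of a single 0/1."""
    total_weight = sum(c.weight for c in criteria)
    return sum(c.weight * c.score(episode) for c in criteria) / total_weight

# Hypothetical criteria for an episode record with solved/steps/violations fields.
criteria = [
    Criterion("task_success", 0.5, lambda ep: float(ep["solved"])),
    Criterion("step_efficiency", 0.3, lambda ep: max(0.0, 1.0 - ep["steps"] / ep["step_budget"])),
    Criterion("constraint_respect", 0.2, lambda ep: 1.0 - ep["violations"] / max(1, ep["steps"])),
]

episode = {"solved": True, "steps": 12, "step_budget": 30, "violations": 1}
print(round(combined_reward(criteria, episode), 3))  # ~0.863, not just 0 or 1
```

Keeping each criterion separate also makes reward hacking easier to spot: an agent that games one term still scores poorly on the others.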
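The notes also call for the standard Gym-style API (reset, step, state) and for comparing a trained agent against a random baseline. A minimal evaluation loop under those conventions might look like the following sketch; the five-tuple step return is the Gymnasium convention, and `env`, `random_policy`, and `trained_policy` are placeholders rather than objects from this repo.

```python
# Minimal sketch assuming a Gymnasium-style environment; the real RhythmEnv client may differ.
import statistics

def run_episode(env, policy, max_steps=50):
    """One reset/step rollout; returns the episode's total reward."""
    obs, _info = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            break
    return total

def evaluate(env, policy, episodes=20):
    """Average return over several episodes, so baseline and trained numbers are comparable."""
    return statistics.mean(run_episode(env, policy) for _ in range(episodes))

# Usage (placeholders): report both numbers side by side, as the notes recommend.
# print("random baseline:", evaluate(env, random_policy))
# print("trained agent:  ", evaluate(env, trained_policy))
```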