Akhil Soni committed
Commit 025774a · 0 Parent(s)

Initial commit: RhythmEnv daily planning RL environment

A deterministic RL environment simulating daily planning and scheduling
under energy, stress, deadline, and importance constraints.

- 3 graded tasks (easy/medium/hard) with real-world scenarios
- Multi-component reward function with partial-progress signals
- Baseline inference script with heuristic + LLM agent
- OpenEnv spec compliant, Docker ready

.dockerignore ADDED
```text
.venv
.git
.gitignore
.env
__pycache__/
*.pyc
*.pyo
*.pyd
*.pyw
*.pyz
```
.gitignore ADDED
```text
.venv/
__pycache__/
*.pyc
*.pyo
*.pyd
.env
*.egg-info/
dist/
build/
```
README.md ADDED
---
title: RhythmEnv
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 8000
tags:
  - openenv
---

# RhythmEnv — Daily Planning RL Environment

A deterministic reinforcement learning environment that simulates daily planning and execution under constraints like time, energy, deadlines, and task importance.

## Motivation

Real-world productivity requires balancing competing priorities: urgent vs. important tasks, energy management, meeting interruptions, and deadline pressure. RhythmEnv provides a clean, deterministic simulation of these trade-offs so RL agents can learn prioritization, scheduling, and resource management skills.

## Quick Start

```bash
pip install openenv-core
pip install git+https://huggingface.co/spaces/openenv/rhythm_env
```

```python
import asyncio

from rhythm_env import RhythmEnv, RhythmAction, ActionType


async def main():
    async with RhythmEnv(base_url="https://openenv-rhythm-env.hf.space") as env:
        result = await env.reset(task="easy")
        print(f"Energy: {result.observation.energy}")
        print(f"Tasks: {[t.name for t in result.observation.tasks]}")

        result = await env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
        print(f"Reward: {result.reward}")


asyncio.run(main())
```

## Action Space

| Action | Parameters | Description |
|--------|-----------|-------------|
| `START_TASK` | `task_id: int` | Begin working on a new task |
| `CONTINUE_TASK` | — | Continue working on the current task |
| `SWITCH_TASK` | `task_id: int` | Switch to a different task (energy penalty) |
| `TAKE_BREAK` | — | Rest to recover energy and reduce stress |

## Observation Space

| Field | Type | Description |
|-------|------|-------------|
| `timestep` | `int` | Current 30-minute slot (0-19) |
| `energy` | `float` | Energy level (0-1) |
| `stress` | `float` | Stress level (0-1) |
| `current_task_id` | `int?` | Task being worked on, or null |
| `tasks` | `List[TaskInfo]` | All tasks with id, name, effort, progress, deadline, importance |
| `meetings` | `List[int]` | Timesteps blocked by meetings |
| `remaining_steps` | `int` | Steps left in the episode |
| `reward_breakdown` | `Dict` | Component-wise reward details |

## Episode Design

- **1 episode = 1 workday** (20 steps of 30 minutes each)
- The agent starts with initial energy and must manage it throughout the day
- Meetings block specific timesteps (no task progress during meetings)
- Tasks have deadlines — missing them increases stress and incurs penalties

## Environment Dynamics

**Energy** (0-1):
- Working: −0.05 per step
- Break: +0.12 per step
- Meeting: −0.03 per step
- Task switch: −0.02 penalty

**Stress** (0-1):
- Missed deadline: +0.15
- Approaching deadline (≤2 steps): +0.03
- Break: −0.08
- Task completion: −0.10

**Task Progress**: `progress_delta = 0.15 × energy` per step when working.
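The per-step rules above can be sketched in code. This is an illustrative reimplementation of the documented numbers, not the server's actual implementation: the update order (progress before energy drain) is an assumption, and the deadline-driven stress changes are omitted.

```python
def clamp01(x: float) -> float:
    """Clamp a value to the [0, 1] range used by energy and stress."""
    return max(0.0, min(1.0, x))


def step_dynamics(energy: float, stress: float, progress: float,
                  action: str, in_meeting: bool) -> tuple[float, float, float]:
    """Apply one step of the documented energy/stress/progress updates."""
    if in_meeting:
        energy = clamp01(energy - 0.03)      # meeting drain
    elif action == "take_break":
        energy = clamp01(energy + 0.12)      # break recovery
        stress = clamp01(stress - 0.08)      # break reduces stress
    else:                                    # working on a task
        if action == "switch_task":
            energy = clamp01(energy - 0.02)  # task-switch penalty
        progress += 0.15 * energy            # progress scales with energy
        energy = clamp01(energy - 0.05)      # work drain
    return energy, stress, progress
```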
## Reward Design

Multi-component reward per step (clamped to [−1, 1]):

| Component | Formula | Signal |
|-----------|---------|--------|
| Progress | `+delta × importance × 2.0` | Encourages productive work |
| Completion bonus | `+importance × 1.5` | Rewards finishing tasks |
| Stress penalty | `−stress × 0.1` | Penalizes high stress |
| Deadline miss | `−0.3` per miss | Penalizes missed deadlines |
| Switch penalty | `−0.1` | Discourages excessive switching |
| Idle penalty | `−0.05` | Penalizes doing nothing |
| Break spam | `−0.05 × max(0, consecutive−2)` | Diminishing returns on breaks |
| Mode bonus | `+0.05/0.02` | Hidden alignment bonus |
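As a sketch, the tabulated components might combine per step as below. This is illustrative only: the hidden mode bonus is left out, and the environment's actual aggregation may differ.

```python
def step_reward(progress_delta: float, importance: float, stress: float,
                completed: bool, missed_deadlines: int, switched: bool,
                idle: bool, consecutive_breaks: int) -> float:
    """Combine the documented per-step reward components (mode bonus omitted)."""
    reward = progress_delta * importance * 2.0       # progress signal
    if completed:
        reward += importance * 1.5                   # completion bonus
    reward -= stress * 0.1                           # stress penalty
    reward -= 0.3 * missed_deadlines                 # deadline misses
    if switched:
        reward -= 0.1                                # switch penalty
    if idle:
        reward -= 0.05                               # idle penalty
    reward -= 0.05 * max(0, consecutive_breaks - 2)  # break spam
    return max(-1.0, min(1.0, reward))               # clamp to [-1, 1]
```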
## Tasks (3 Scenarios)

### Task 1 — Easy (Single Priority)
- **3 tasks**: 1 high-importance (0.9), 2 low (0.3, 0.2)
- **2 meetings** (steps 3 and 11), energy starts at 0.75
- **Moderate deadlines** (steps 10-16)
- **Goal**: Complete the main task efficiently

### Task 2 — Medium (Deadline Pressure)
- **4 tasks** with varied importance
- **2 meetings** (steps 4 and 12)
- Energy starts at 0.7, **tight deadlines** (steps 8-18)
- **Goal**: Maximize completion before deadlines

### Task 3 — Hard (Energy Tradeoff)
- **5 tasks**: 1 deep work (effort 0.8), 4 small tasks
- **1 meeting** (step 6), energy starts at 0.4
- **Goal**: Balance rest, deep work, and small wins
## Grader

End-of-episode score in [0.0, 1.0]:

```
score = 0.45×completion + 0.20×deadline + 0.15×efficiency + 0.10×energy_mgmt + 0.10×stress_mgmt
```

| Component | Calculation |
|-----------|-------------|
| Completion | Importance-weighted fraction of tasks completed |
| Deadline | Fraction of deadlines met |
| Efficiency | optimal_steps / actual_steps |
| Energy mgmt | Average energy over episode |
| Stress mgmt | 1 − average stress |
+ **Expected score ranges:**
139
+ - Random agent: ~0.15–0.35
140
+ - Baseline heuristic: ~0.48–0.55
141
+ - Strong agent: ~0.70–0.85
142
+
143
+ ## Setup Instructions
144
+
145
+ ### Local Development
146
+
147
+ ```bash
148
+ cd rhythm_env
149
+ pip install -e .
150
+ uvicorn server.app:app --host 0.0.0.0 --port 8000
151
+ ```
152
+
153
+ ### Docker
154
+
155
+ ```bash
156
+ docker build -t rhythm-env:latest -f server/Dockerfile .
157
+ docker run -p 8000:8000 rhythm-env:latest
158
+ ```
159
+
160
+ ### Running the Baseline
161
+
162
+ ```bash
163
+ export API_BASE_URL="https://router.huggingface.co/v1"
164
+ export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
165
+ export HF_TOKEN="your-token"
166
+ python inference.py
167
+ ```
168
+
169
+ ## Validation
170
+
171
+ ```bash
172
+ openenv validate
173
+ ```
174
+
175
+ ## License
176
+
177
+ BSD 3-Clause License
__init__.py ADDED
```python
"""
RhythmEnv — Daily Planning RL Environment for OpenEnv.

A deterministic reinforcement learning environment that simulates daily
planning and execution under constraints like time, energy, deadlines,
and task importance.
"""

from .client import RhythmEnv
from .models import ActionType, RhythmAction, RhythmObservation, RhythmState, TaskInfo

__all__ = [
    "RhythmEnv",
    "RhythmAction",
    "RhythmObservation",
    "RhythmState",
    "ActionType",
    "TaskInfo",
]
```
client.py ADDED
```python
"""
RhythmEnv Client.

Provides the client for connecting to a RhythmEnv server.
"""

from __future__ import annotations

from typing import Any, Dict

from openenv.core.client_types import StepResult
from openenv.core.env_client import EnvClient

# Support both package and standalone imports
try:
    from .models import RhythmAction, RhythmObservation, RhythmState, TaskInfo
except ImportError:
    from models import RhythmAction, RhythmObservation, RhythmState, TaskInfo


class RhythmEnv(EnvClient[RhythmAction, RhythmObservation, RhythmState]):
    """
    Client for the RhythmEnv environment.

    Example:
        >>> async with RhythmEnv(base_url="http://localhost:8000") as client:
        ...     result = await client.reset(task="easy")
        ...     result = await client.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
    """

    def _step_payload(self, action: RhythmAction) -> Dict[str, Any]:
        """Serialize a RhythmAction to a JSON payload."""
        payload: Dict[str, Any] = {"action_type": action.action_type.value}
        if action.task_id is not None:
            payload["task_id"] = action.task_id
        return payload

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[RhythmObservation]:
        """Parse a server response into StepResult[RhythmObservation]."""
        obs_data = payload.get("observation", {})

        observation = RhythmObservation(
            timestep=obs_data.get("timestep", 0),
            energy=obs_data.get("energy", 1.0),
            stress=obs_data.get("stress", 0.0),
            current_task_id=obs_data.get("current_task_id"),
            tasks=[TaskInfo(**t) for t in obs_data.get("tasks", [])],
            meetings=obs_data.get("meetings", []),
            remaining_steps=obs_data.get("remaining_steps", 20),
            reward_breakdown=obs_data.get("reward_breakdown", {}),
            done=payload.get("done", False),
            reward=payload.get("reward", 0.0),
            metadata=obs_data.get("metadata", {}),
        )

        return StepResult(
            observation=observation,
            reward=payload.get("reward", 0.0),
            done=payload.get("done", False),
        )

    def _parse_state(self, payload: Dict[str, Any]) -> RhythmState:
        """Parse a server response into RhythmState."""
        return RhythmState(
            episode_id=payload.get("episode_id", ""),
            task_name=payload.get("task_name", ""),
            timestep=payload.get("timestep", 0),
            energy=payload.get("energy", 1.0),
            stress=payload.get("stress", 0.0),
            current_task_id=payload.get("current_task_id"),
            step_count=payload.get("step_count", 0),
        )
```
inference.py ADDED
```python
"""
RhythmEnv Inference Script
==========================

MANDATORY
- Before submitting, ensure the following variables are defined in your
  environment configuration:
      API_BASE_URL      The API endpoint for the LLM.
      MODEL_NAME        The model identifier to use for inference.
      HF_TOKEN          Your Hugging Face / API key.
      LOCAL_IMAGE_NAME  The name of the local image to use for the environment
                        if you are using from_docker_image().

- Defaults are set only for API_BASE_URL and MODEL_NAME
  (and should reflect your active inference setup):
      API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
      MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")

- The inference script must be named `inference.py` and placed in the root
  directory of the project.
- Participants must use the OpenAI client for all LLM calls using the above
  variables.

STDOUT FORMAT
- The script must emit exactly three line types to stdout, in this order:

      [START] task=<task_name> env=<benchmark> model=<model_name>
      [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
      [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>

Rules:
- One [START] line at episode begin.
- One [STEP] line per step, immediately after env.step() returns.
- One [END] line after env.close(), always emitted (even on exception).
- reward and rewards are formatted to 2 decimal places.
- done and success are lowercase booleans: true or false.
- error is the raw last_action_error string, or null if none.
- All fields are on a single line, with no newlines within a line.
- Each task should return a score in [0, 1].
"""

import asyncio
import os
import sys
import textwrap
from typing import List, Optional

from openai import OpenAI

# Add current directory to path for local imports
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from client import RhythmEnv
from models import ActionType, RhythmAction

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

IMAGE_NAME = os.getenv("IMAGE_NAME")
API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
BASE_URL = os.getenv("RHYTHM_ENV_URL", "http://localhost:8000")
BENCHMARK = "rhythm_env"
TASKS = ["easy", "medium", "hard"]
MAX_STEPS = 20
SCORE_THRESHOLD = 0.1

SYSTEM_PROMPT = textwrap.dedent("""\
    You are a daily planning agent. You manage tasks across a workday.
    Each step is a 30-minute slot. You have energy (0-1) and stress (0-1).

    Available actions (respond with EXACTLY one line in this format):
        START_TASK <task_id>
        CONTINUE_TASK
        SWITCH_TASK <task_id>
        TAKE_BREAK

    Rules:
    - START_TASK/SWITCH_TASK require a task_id (integer).
    - CONTINUE_TASK continues your current task.
    - TAKE_BREAK recovers energy and reduces stress.
    - Take breaks when energy < 0.3.
    - Prioritize tasks by deadline urgency, then importance.
    - Avoid unnecessary switching (costs energy and reward).

    Respond with ONLY the action line, nothing else.""")


# ---------------------------------------------------------------------------
# Logging helpers
# ---------------------------------------------------------------------------

def log_start(task: str, env: str, model: str) -> None:
    print(f"[START] task={task} env={env} model={model}", flush=True)


def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
    error_val = error if error else "null"
    done_val = str(done).lower()
    print(
        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
        flush=True,
    )


def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    print(
        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
        flush=True,
    )


# ---------------------------------------------------------------------------
# Heuristic action selection (enhanced by LLM)
# ---------------------------------------------------------------------------

def choose_action_heuristic(obs) -> RhythmAction:
    """Greedy heuristic: prioritize by deadline, then importance."""
    energy = obs.energy
    current_task_id = obs.current_task_id
    tasks = obs.tasks
    timestep = obs.timestep
    meetings = obs.meetings

    # During meeting slots, just take a break
    if timestep in meetings:
        return RhythmAction(action_type=ActionType.TAKE_BREAK)

    # Take a break if energy is low
    if energy < 0.3:
        return RhythmAction(action_type=ActionType.TAKE_BREAK)

    # Get uncompleted tasks
    uncompleted = [t for t in tasks if t.progress < t.effort]
    if not uncompleted:
        return RhythmAction(action_type=ActionType.TAKE_BREAK)

    # Sort by deadline (ascending), then importance (descending)
    uncompleted.sort(key=lambda t: (t.deadline, -t.importance))

    # Check for urgent tasks (deadline within 3 steps)
    urgent = [t for t in uncompleted if t.deadline - timestep <= 3]
    best = urgent[0] if urgent else uncompleted[0]

    if current_task_id is not None and current_task_id == best.id:
        return RhythmAction(action_type=ActionType.CONTINUE_TASK)
    elif current_task_id is not None:
        return RhythmAction(action_type=ActionType.SWITCH_TASK, task_id=best.id)
    else:
        return RhythmAction(action_type=ActionType.START_TASK, task_id=best.id)


def choose_action_llm(obs, llm_client: OpenAI) -> RhythmAction:
    """Use the LLM to pick an action, falling back to the heuristic on failure."""
    tasks_desc = "\n".join(
        f"  Task {t.id}: {t.name} — {t.description}\n"
        f"    (effort={t.effort:.2f}, progress={t.progress:.2f}, "
        f"deadline=step {t.deadline}, importance={t.importance})"
        for t in obs.tasks
    )
    user_prompt = textwrap.dedent(f"""\
        Step: {obs.timestep}/{MAX_STEPS}
        Energy: {obs.energy:.2f}
        Stress: {obs.stress:.2f}
        Current task: {obs.current_task_id}
        Meetings at steps: {obs.meetings}
        Remaining steps: {obs.remaining_steps}

        Tasks:
        {tasks_desc}

        Choose your action:""")

    try:
        completion = llm_client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_prompt},
            ],
            temperature=0.3,
            max_tokens=30,
            stream=False,
        )
        text = (completion.choices[0].message.content or "").strip()
        return parse_llm_action(text, obs)
    except Exception:
        return choose_action_heuristic(obs)


def parse_llm_action(text: str, obs) -> RhythmAction:
    """Parse LLM response text into a RhythmAction."""
    text = text.strip().upper()

    if text.startswith("TAKE_BREAK"):
        return RhythmAction(action_type=ActionType.TAKE_BREAK)

    if text.startswith("CONTINUE_TASK"):
        if obs.current_task_id is not None:
            return RhythmAction(action_type=ActionType.CONTINUE_TASK)
        return choose_action_heuristic(obs)

    for prefix, action_type in [
        ("START_TASK", ActionType.START_TASK),
        ("SWITCH_TASK", ActionType.SWITCH_TASK),
    ]:
        if text.startswith(prefix):
            rest = text[len(prefix):].strip()
            try:
                task_id = int(rest)
                if 0 <= task_id < len(obs.tasks):
                    return RhythmAction(action_type=action_type, task_id=task_id)
            except ValueError:
                pass

    # Fallback
    return choose_action_heuristic(obs)


# ---------------------------------------------------------------------------
# Main loop
# ---------------------------------------------------------------------------

async def run_task(task_name: str, llm_client: OpenAI) -> float:
    """Run a single task and return the score."""
    if IMAGE_NAME:
        env = await RhythmEnv.from_docker_image(IMAGE_NAME)
    else:
        env = RhythmEnv(base_url=BASE_URL)

    rewards: List[float] = []
    steps_taken = 0
    score = 0.0
    success = False

    log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)

    try:
        async with env:
            result = await env.reset(task=task_name)

            for step in range(1, MAX_STEPS + 1):
                if result.done:
                    break

                # Use the LLM if available, otherwise the heuristic
                if llm_client is not None:
                    action = choose_action_llm(result.observation, llm_client)
                else:
                    action = choose_action_heuristic(result.observation)

                action_str = action.action_type.value
                if action.task_id is not None:
                    action_str += f"({action.task_id})"

                result = await env.step(action)

                reward = result.reward or 0.0
                done = result.done
                rewards.append(reward)
                steps_taken = step

                log_step(step=step, action=action_str, reward=reward, done=done, error=None)

                if done:
                    break

            # Get the final score from the grader
            score = result.observation.reward_breakdown.get("final_score", 0.0)
            score = max(0.0, min(1.0, score))
            success = score >= SCORE_THRESHOLD

    except Exception as e:
        print(f"[DEBUG] Error running task {task_name}: {e}", flush=True)
    finally:
        try:
            await env.close()
        except Exception as e:
            print(f"[DEBUG] env.close() error: {e}", flush=True)
        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)

    return score


async def main() -> None:
    llm_client = None
    if API_KEY:
        llm_client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)

    scores = []
    for task_name in TASKS:
        s = await run_task(task_name, llm_client)
        scores.append(s)

    avg = sum(scores) / len(scores) if scores else 0.0
    print(f"\n[SUMMARY] avg_score={avg:.3f} scores={','.join(f'{s:.3f}' for s in scores)}", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
```
models.py ADDED
```python
"""
Data models for the RhythmEnv Environment.

Defines the Action, Observation, and State types for the daily planning
and scheduling RL environment.
"""

from __future__ import annotations

from enum import Enum
from typing import Dict, List, Optional

from openenv.core.env_server import Action, Observation, State
from pydantic import BaseModel, Field


class ActionType(str, Enum):
    """Available action types for the agent."""

    START_TASK = "start_task"
    CONTINUE_TASK = "continue_task"
    SWITCH_TASK = "switch_task"
    TAKE_BREAK = "take_break"


class RhythmAction(Action):
    """
    Action for RhythmEnv.

    Attributes:
        action_type: The type of action to perform.
        task_id: Task index (required for START_TASK and SWITCH_TASK).
    """

    action_type: ActionType
    task_id: Optional[int] = None


class TaskInfo(BaseModel):
    """
    Information about a single task visible to the agent.

    Attributes:
        id: Unique task identifier.
        name: Human-readable task name.
        description: Brief description of what the task involves.
        effort: Total work required (0-1 scale).
        progress: Work completed so far (0 to effort).
        deadline: Timestep by which the task should be done.
        importance: How important this task is (0-1).
    """

    id: int
    name: str
    description: str = ""
    effort: float
    progress: float
    deadline: int
    importance: float


class RhythmObservation(Observation):
    """
    Observation for RhythmEnv.

    Attributes:
        timestep: Current 30-minute slot (0-19).
        energy: Agent energy level (0-1).
        stress: Agent stress level (0-1).
        current_task_id: ID of the task currently being worked on, or None.
        tasks: List of all tasks with current progress.
        meetings: Timesteps blocked by meetings.
        remaining_steps: Steps left in the episode.
        reward_breakdown: Component-wise reward details.
    """

    timestep: int = 0
    energy: float = 1.0
    stress: float = 0.0
    current_task_id: Optional[int] = None
    tasks: List[TaskInfo] = Field(default_factory=list)
    meetings: List[int] = Field(default_factory=list)
    remaining_steps: int = 20
    reward_breakdown: Dict[str, float] = Field(default_factory=dict)


class RhythmState(State):
    """
    State for RhythmEnv.

    Attributes:
        task_name: Name of the current scenario (easy/medium/hard).
        timestep: Current 30-minute slot.
        energy: Agent energy level.
        stress: Agent stress level.
        current_task_id: ID of the task currently being worked on.
    """

    task_name: str = ""
    timestep: int = 0
    energy: float = 1.0
    stress: float = 0.0
    current_task_id: Optional[int] = None
```
openenv.yaml ADDED
```yaml
spec_version: 1
name: rhythm_env
type: space
runtime: fastapi
app: server.app:app
port: 8000
```
pyproject.toml ADDED
```toml
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

[build-system]
requires = ["setuptools>=45", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "openenv-rhythm-env"
version = "0.1.0"
description = "RhythmEnv - Daily Planning RL Environment for OpenEnv"
requires-python = ">=3.10"
dependencies = [
    "openenv-core[core]>=0.2.2",
    "fastapi>=0.115.0",
    "pydantic>=2.0.0",
    "uvicorn>=0.24.0",
    "requests>=2.31.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-cov>=4.0.0",
]

[project.scripts]
server = "rhythm_env.server.app:main"

[tool.setuptools]
include-package-data = true
packages = ["rhythm_env", "rhythm_env.server"]
package-dir = { "rhythm_env" = ".", "rhythm_env.server" = "server" }
```
server/Dockerfile ADDED
```dockerfile
ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
FROM ${BASE_IMAGE} AS builder

WORKDIR /app

COPY . /app/env

WORKDIR /app/env

RUN if ! command -v uv >/dev/null 2>&1; then \
        curl -LsSf https://astral.sh/uv/install.sh | sh && \
        mv /root/.local/bin/uv /usr/local/bin/uv && \
        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
    fi

RUN apt-get update && apt-get install -y --no-install-recommends \
        git \
    && rm -rf /var/lib/apt/lists/*

RUN --mount=type=cache,target=/root/.cache/uv \
    if [ -f uv.lock ]; then \
        uv sync --frozen --no-install-project --no-editable; \
    else \
        uv sync --no-install-project --no-editable; \
    fi

RUN --mount=type=cache,target=/root/.cache/uv \
    if [ -f uv.lock ]; then \
        uv sync --frozen --no-editable; \
    else \
        uv sync --no-editable; \
    fi

FROM ${BASE_IMAGE}

WORKDIR /app

COPY --from=builder /app/env/.venv /app/.venv
COPY --from=builder /app/env /app/env

ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONPATH="/app/env:$PYTHONPATH"

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
```
server/__init__.py ADDED
```python
"""RhythmEnv environment server components."""

from .rhythm_environment import RhythmEnvironment

__all__ = ["RhythmEnvironment"]
```
server/app.py ADDED
```python
"""
FastAPI application for the RhythmEnv Environment.

This module creates an HTTP server that exposes the RhythmEnvironment
over HTTP and WebSocket endpoints, compatible with EnvClient.

Endpoints:
    - POST /reset: Reset the environment
    - POST /step: Execute an action
    - GET /state: Get current environment state
    - GET /schema: Get action/observation schemas
    - WS /ws: WebSocket endpoint for persistent sessions

Usage:
    # Development (with auto-reload):
    uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

    # Production:
    uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4

    # Or run directly:
    python -m server.app
"""

try:
    from openenv.core.env_server.http_server import create_app
except Exception as e:  # pragma: no cover
    raise ImportError(
        "openenv is required for the web interface. Install dependencies with 'uv sync'."
    ) from e

try:
    from ..models import RhythmAction, RhythmObservation
    from .rhythm_environment import RhythmEnvironment
except (ImportError, ModuleNotFoundError):
    from models import RhythmAction, RhythmObservation
    from server.rhythm_environment import RhythmEnvironment


# Create the app with web interface and README integration
app = create_app(
    RhythmEnvironment,
    RhythmAction,
    RhythmObservation,
    env_name="rhythm_env",
)


def main(host: str = "0.0.0.0", port: int = 8000):
    """
    Entry point for direct execution via uv run or python -m.

    This function enables running the server without Docker:
        uv run --project . server
        uv run --project . server --port 8001
        python -m rhythm_env.server.app

    Args:
        host: Host address to bind to (default: "0.0.0.0")
        port: Port number to listen on (default: 8000)
    """
    import uvicorn

    uvicorn.run(app, host=host, port=port)


if __name__ == "__main__":
    main()
```
server/requirements.txt ADDED
```text
openenv-core[core]>=0.2.2
fastapi>=0.115.0
uvicorn>=0.24.0
pydantic>=2.0.0
```
server/rhythm_environment.py ADDED
@@ -0,0 +1,593 @@
+ """
+ RhythmEnv Environment Implementation.
+
+ A deterministic RL environment simulating daily planning and scheduling
+ under energy, stress, deadline, and importance constraints.
+
+ 1 episode = 1 day, 1 step = 30 minutes, 20 steps total.
+ """
+
+ from typing import Any, Dict, List, Optional, Set
+ from uuid import uuid4
+
+ from openenv.core.env_server import Environment
+ from openenv.core.env_server.types import EnvironmentMetadata
+
+ # Support both in-repo and standalone imports
+ try:
+     from ..models import (
+         ActionType,
+         RhythmAction,
+         RhythmObservation,
+         RhythmState,
+         TaskInfo,
+     )
+ except ImportError as e:
+     if "relative import" not in str(e) and "no known parent package" not in str(e):
+         raise
+     from models import (
+         ActionType,
+         RhythmAction,
+         RhythmObservation,
+         RhythmState,
+         TaskInfo,
+     )
+
+
+ # ---------------------------------------------------------------------------
+ # Task scenario configurations (all deterministic)
+ # ---------------------------------------------------------------------------
+
+ TASK_CONFIGS: Dict[str, Dict[str, Any]] = {
+     "easy": {
+         "scenario": "You are a marketing analyst preparing for a quarterly review. "
+         "Your manager needs the Q3 performance report by midday. "
+         "You also have routine emails and expense filing to handle.",
+         "tasks": [
+             {
+                 "id": 0,
+                 "name": "Q3 Performance Report",
+                 "description": "Compile sales data, create visualizations, and write executive summary for the quarterly business review.",
+                 "effort": 0.65,
+                 "progress": 0.0,
+                 "deadline": 10,
+                 "importance": 0.9,
+             },
+             {
+                 "id": 1,
+                 "name": "Client Emails",
+                 "description": "Respond to 12 pending client inquiries about pricing updates and contract renewals.",
+                 "effort": 0.45,
+                 "progress": 0.0,
+                 "deadline": 13,
+                 "importance": 0.3,
+             },
+             {
+                 "id": 2,
+                 "name": "Expense Filing",
+                 "description": "Submit last month's travel receipts and categorize team expenses in the accounting system.",
+                 "effort": 0.35,
+                 "progress": 0.0,
+                 "deadline": 16,
+                 "importance": 0.2,
+             },
+         ],
+         "meetings": [3, 11],
+         "initial_energy": 0.75,
+     },
+     "medium": {
+         "scenario": "You are a product manager with a client pitch tomorrow. "
+         "The proposal and presentation deck are top priority, but you also need to "
+         "review a teammate's design doc and prepare meeting notes for leadership.",
+         "tasks": [
+             {
+                 "id": 0,
+                 "name": "Client Proposal",
+                 "description": "Draft a 5-page proposal for the enterprise client including pricing tiers, timeline, and integration plan.",
+                 "effort": 0.40,
+                 "progress": 0.0,
+                 "deadline": 8,
+                 "importance": 0.7,
+             },
+             {
+                 "id": 1,
+                 "name": "Pitch Deck",
+                 "description": "Create a 15-slide presentation with product demos, ROI projections, and competitive analysis.",
+                 "effort": 0.35,
+                 "progress": 0.0,
+                 "deadline": 10,
+                 "importance": 0.8,
+             },
+             {
+                 "id": 2,
+                 "name": "Design Review",
+                 "description": "Review the UX team's redesign mockups for the dashboard. Provide written feedback on usability and alignment with product goals.",
+                 "effort": 0.25,
+                 "progress": 0.0,
+                 "deadline": 14,
+                 "importance": 0.5,
+             },
+             {
+                 "id": 3,
+                 "name": "Leadership Notes",
+                 "description": "Summarize this week's sprint outcomes and blockers for the Monday leadership sync.",
+                 "effort": 0.20,
+                 "progress": 0.0,
+                 "deadline": 18,
+                 "importance": 0.4,
+             },
+         ],
+         "meetings": [4, 12],
+         "initial_energy": 0.7,
+     },
+     "hard": {
+         "scenario": "You are a senior engineer on a critical release day. "
+         "The system architecture redesign is the highest priority, but two production "
+         "bugs are blocking users, docs need updating, and test coverage is behind.",
+         "tasks": [
+             {
+                 "id": 0,
+                 "name": "Architecture Redesign",
+                 "description": "Refactor the authentication service from monolith to microservice pattern. Requires deep focus: redesign API contracts, update database schema, and write migration scripts.",
+                 "effort": 0.80,
+                 "progress": 0.0,
+                 "deadline": 16,
+                 "importance": 0.9,
+             },
+             {
+                 "id": 1,
+                 "name": "Fix: Login Timeout",
+                 "description": "Users on slow connections get a 504 timeout during OAuth handshake. Root cause is likely the retry logic in the auth middleware.",
+                 "effort": 0.15,
+                 "progress": 0.0,
+                 "deadline": 6,
+                 "importance": 0.5,
+             },
+             {
+                 "id": 2,
+                 "name": "Fix: CSV Export",
+                 "description": "The data export endpoint crashes on records with Unicode characters in the notes field. Need to fix encoding in the serializer.",
+                 "effort": 0.15,
+                 "progress": 0.0,
+                 "deadline": 10,
+                 "importance": 0.4,
+             },
+             {
+                 "id": 3,
+                 "name": "API Documentation",
+                 "description": "Update the REST API docs to reflect the new v3 endpoints. Add request/response examples and deprecation notices for v2.",
+                 "effort": 0.20,
+                 "progress": 0.0,
+                 "deadline": 14,
+                 "importance": 0.3,
+             },
+             {
+                 "id": 4,
+                 "name": "Integration Tests",
+                 "description": "Write end-to-end tests for the payment flow covering Stripe webhook handling, refund processing, and receipt generation.",
+                 "effort": 0.20,
+                 "progress": 0.0,
+                 "deadline": 18,
+                 "importance": 0.6,
+             },
+         ],
+         "meetings": [6],
+         "initial_energy": 0.4,
+     },
+ }
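A quick feasibility check on the configs above: each work step adds at most `PROGRESS_RATE * energy` progress, so dividing each task's effort by the progress rate gives a lower bound on the work steps needed. A rough sketch for the hard scenario (efforts copied from the config; the bound optimistically assumes energy stays at 1.0, which the drain rules prevent in practice):

```python
import math

PROGRESS_RATE = 0.15  # progress gained per work step at full energy
MAX_STEPS = 20        # one simulated workday

hard_efforts = [0.80, 0.15, 0.15, 0.20, 0.20]  # from TASK_CONFIGS["hard"]
meeting_steps = 1  # the hard scenario blocks one step with a meeting

# Each task needs at least ceil(effort / rate) dedicated work steps.
min_work_steps = sum(math.ceil(e / PROGRESS_RATE) for e in hard_efforts)
print(min_work_steps)
print(min_work_steps <= MAX_STEPS - meeting_steps)
```

So even the hard scenario is completable in principle, but with little slack once energy decay and breaks are accounted for.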
+
+ # ---------------------------------------------------------------------------
+ # Constants
+ # ---------------------------------------------------------------------------
+
+ MAX_STEPS = 20
+ PROGRESS_RATE = 0.15
+ ENERGY_WORK_DRAIN = 0.05
+ ENERGY_BREAK_GAIN = 0.12
+ ENERGY_MEETING_DRAIN = 0.03
+ ENERGY_SWITCH_DRAIN = 0.02
+ STRESS_DEADLINE_MISS = 0.15
+ STRESS_APPROACHING = 0.03
+ STRESS_BREAK_RELIEF = 0.08
+ STRESS_COMPLETION_RELIEF = 0.1
+ APPROACHING_DEADLINE_WINDOW = 2
+ MAX_FREE_BREAKS = 2
+ BREAK_SPAM_PENALTY = 0.05
+ SWITCH_PENALTY = 0.1
+ IDLE_PENALTY = 0.05
+ DEADLINE_MISS_PENALTY = 0.3
+ STRESS_PENALTY_RATE = 0.1
+ PROGRESS_REWARD_SCALE = 2.0
+ COMPLETION_BONUS_SCALE = 1.5
+ DEEP_WORK_BONUS = 0.05
+ EXECUTION_BONUS = 0.02
+
+
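To get a feel for how these constants interact: progress scales with the agent's current energy, and each work step then drains energy, so sustained work slows down over time. A small sketch of five uninterrupted work steps (ignoring task effort caps, meetings, and stress; the starting energy of 0.75 matches the easy scenario):

```python
PROGRESS_RATE = 0.15
ENERGY_WORK_DRAIN = 0.05

energy, progress = 0.75, 0.0
for _ in range(5):
    progress += PROGRESS_RATE * energy  # progress uses energy before the drain
    energy = max(0.0, energy - ENERGY_WORK_DRAIN)

print(round(progress, 4), round(energy, 2))
```

Five steps at full energy would yield 0.75 progress; starting at 0.75 energy yields noticeably less, which is why well-timed breaks (ENERGY_BREAK_GAIN = 0.12) can pay for themselves.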
+ class RhythmEnvironment(Environment):
+     """
+     Daily planning and scheduling environment.
+
+     The agent manages a set of tasks over a simulated workday, balancing
+     energy, stress, deadlines, and task importance.
+     """
+
+     SUPPORTS_CONCURRENT_SESSIONS: bool = True
+
+     def __init__(self) -> None:
+         super().__init__()
+         self._state = RhythmState()
+         # Internal tracking
+         self._tasks: List[Dict[str, Any]] = []
+         self._meetings: List[int] = []
+         self._initial_energy: float = 1.0
+         self._energy: float = 1.0
+         self._stress: float = 0.0
+         self._current_task_id: Optional[int] = None
+         self._consecutive_breaks: int = 0
+         self._completed_tasks: Set[int] = set()
+         self._missed_deadlines: Set[int] = set()
+         self._total_energy: float = 0.0
+         self._total_stress: float = 0.0
+         self._steps_working: int = 0
+         self._switch_count: int = 0
+         self._timestep: int = 0
+
+     def get_metadata(self) -> EnvironmentMetadata:
+         return EnvironmentMetadata(
+             name="RhythmEnv",
+             description=(
+                 "A deterministic RL environment for daily planning and scheduling "
+                 "under energy, stress, deadline, and importance constraints."
+             ),
+             version="0.1.0",
+         )
+
+     # ------------------------------------------------------------------
+     # reset
+     # ------------------------------------------------------------------
+
+     def reset(
+         self,
+         seed: Optional[int] = None,
+         episode_id: Optional[str] = None,
+         **kwargs: Any,
+     ) -> RhythmObservation:
+         task_name = kwargs.get("task", "easy")
+         if task_name not in TASK_CONFIGS:
+             task_name = "easy"
+
+         config = TASK_CONFIGS[task_name]
+
+         # Copy each task dict so mutations don't affect the template
+         # (a shallow copy suffices because all task values are scalars)
+         self._tasks = [dict(t) for t in config["tasks"]]
+         self._meetings = list(config["meetings"])
+         self._initial_energy = config["initial_energy"]
+
+         # Reset state
+         self._energy = self._initial_energy
+         self._stress = 0.0
+         self._current_task_id = None
+         self._consecutive_breaks = 0
+         self._completed_tasks = set()
+         self._missed_deadlines = set()
+         self._total_energy = 0.0
+         self._total_stress = 0.0
+         self._steps_working = 0
+         self._switch_count = 0
+         self._timestep = 0
+
+         self._state = RhythmState(
+             episode_id=episode_id or str(uuid4()),
+             step_count=0,
+             task_name=task_name,
+             timestep=0,
+             energy=self._energy,
+             stress=self._stress,
+             current_task_id=None,
+         )
+
+         return self._make_observation(reward=0.0, done=False, reward_breakdown={})
+
+     # ------------------------------------------------------------------
+     # step
+     # ------------------------------------------------------------------
+
+     def step(
+         self,
+         action: RhythmAction,
+         timeout_s: Optional[float] = None,
+         **kwargs: Any,
+     ) -> RhythmObservation:
+         reward_breakdown: Dict[str, float] = {}
+         progress_delta = 0.0
+         completed_this_step: List[int] = []
+         switched = False
+         is_idle = False
+         is_meeting = self._timestep in self._meetings
+
+         # --- Meeting override ---
+         if is_meeting:
+             self._energy = max(0.0, self._energy - ENERGY_MEETING_DRAIN)
+             # During meetings, the agent cannot work; the action is ignored
+         else:
+             # --- Validate & process action ---
+             valid = self._validate_action(action)
+
+             if not valid:
+                 is_idle = True
+             elif action.action_type == ActionType.TAKE_BREAK:
+                 self._current_task_id = None
+                 self._consecutive_breaks += 1
+                 self._energy = min(1.0, self._energy + ENERGY_BREAK_GAIN)
+                 self._stress = max(0.0, self._stress - STRESS_BREAK_RELIEF)
+             else:
+                 # Reset break counter on any non-break action
+                 self._consecutive_breaks = 0
+
+                 if action.action_type == ActionType.START_TASK:
+                     if self._current_task_id is not None and self._current_task_id != action.task_id:
+                         switched = True
+                     self._current_task_id = action.task_id
+
+                 elif action.action_type == ActionType.SWITCH_TASK:
+                     if self._current_task_id is not None and self._current_task_id != action.task_id:
+                         switched = True
+                     self._current_task_id = action.task_id
+
+                 elif action.action_type == ActionType.CONTINUE_TASK:
+                     if self._current_task_id is None:
+                         is_idle = True
+
+             # Apply switch energy penalty
+             if switched:
+                 self._energy = max(0.0, self._energy - ENERGY_SWITCH_DRAIN)
+                 self._switch_count += 1
+
+             # Compute progress if working on a valid uncompleted task
+             if (
+                 self._current_task_id is not None
+                 and not is_idle
+                 and self._current_task_id not in self._completed_tasks
+             ):
+                 task = self._tasks[self._current_task_id]
+                 progress_delta = PROGRESS_RATE * self._energy
+                 task["progress"] = min(task["effort"], task["progress"] + progress_delta)
+
+                 # Check completion
+                 if task["progress"] >= task["effort"] and self._current_task_id not in self._completed_tasks:
+                     self._completed_tasks.add(self._current_task_id)
+                     completed_this_step.append(self._current_task_id)
+
+                 self._energy = max(0.0, self._energy - ENERGY_WORK_DRAIN)
+                 self._steps_working += 1
+             elif self._current_task_id is not None and self._current_task_id in self._completed_tasks:
+                 # Working on an already-completed task counts as idle
+                 is_idle = True
+
+         # --- Check deadlines ---
+         new_missed: List[int] = []
+         for t in self._tasks:
+             tid = t["id"]
+             if tid not in self._completed_tasks and tid not in self._missed_deadlines:
+                 if self._timestep > t["deadline"]:
+                     self._missed_deadlines.add(tid)
+                     new_missed.append(tid)
+                     self._stress = min(1.0, self._stress + STRESS_DEADLINE_MISS)
+
+         # --- Stress from approaching deadlines ---
+         for t in self._tasks:
+             tid = t["id"]
+             if tid not in self._completed_tasks and tid not in self._missed_deadlines:
+                 if 0 < t["deadline"] - self._timestep <= APPROACHING_DEADLINE_WINDOW:
+                     self._stress = min(1.0, self._stress + STRESS_APPROACHING)
+
+         # --- Stress relief from completion ---
+         for _ in completed_this_step:
+             self._stress = max(0.0, self._stress - STRESS_COMPLETION_RELIEF)
+
+         # --- Advance timestep ---
+         self._timestep += 1
+         self._state.step_count += 1
+
+         # --- Track averages ---
+         self._total_energy += self._energy
+         self._total_stress += self._stress
+
+         # --- Compute reward ---
+         reward = 0.0
+
+         # Progress reward
+         if progress_delta > 0 and self._current_task_id is not None:
+             task = self._tasks[self._current_task_id]
+             r = progress_delta * task["importance"] * PROGRESS_REWARD_SCALE
+             reward += r
+             reward_breakdown["progress_reward"] = round(r, 4)
+
+         # Completion bonus
+         for tid in completed_this_step:
+             bonus = self._tasks[tid]["importance"] * COMPLETION_BONUS_SCALE
+             reward += bonus
+             reward_breakdown["completion_bonus"] = round(
+                 reward_breakdown.get("completion_bonus", 0.0) + bonus, 4
+             )
+
+         # Stress penalty
+         stress_pen = -self._stress * STRESS_PENALTY_RATE
+         reward += stress_pen
+         reward_breakdown["stress_penalty"] = round(stress_pen, 4)
+
+         # Deadline miss penalty
+         if new_missed:
+             dp = -DEADLINE_MISS_PENALTY * len(new_missed)
+             reward += dp
+             reward_breakdown["deadline_penalty"] = round(dp, 4)
+
+         # Switch penalty
+         if switched:
+             reward -= SWITCH_PENALTY
+             reward_breakdown["switch_penalty"] = round(-SWITCH_PENALTY, 4)
+
+         # Idle penalty
+         if not is_meeting and is_idle:
+             reward -= IDLE_PENALTY
+             reward_breakdown["idle_penalty"] = round(-IDLE_PENALTY, 4)
+
+         # Break spam penalty
+         if not is_meeting and action.action_type == ActionType.TAKE_BREAK:
+             spam = -BREAK_SPAM_PENALTY * max(0, self._consecutive_breaks - MAX_FREE_BREAKS)
+             if spam < 0:
+                 reward += spam
+                 reward_breakdown["break_spam_penalty"] = round(spam, 4)
+
+         # Mode bonus
+         mode = self._compute_mode()
+         mode_bonus = 0.0
+         if mode == "deep_work":
+             mode_bonus = DEEP_WORK_BONUS
+         elif mode == "execution":
+             mode_bonus = EXECUTION_BONUS
+         if mode_bonus > 0:
+             reward += mode_bonus
+             reward_breakdown["mode_bonus"] = round(mode_bonus, 4)
+
+         # Clamp reward to [-1, 1]
+         reward = max(-1.0, min(1.0, round(reward, 4)))
+
+         # --- Done? ---
+         done = self._timestep >= MAX_STEPS
+
+         # --- Final grading ---
+         if done:
+             final_score = self._grade_episode()
+             reward_breakdown["final_score"] = round(final_score, 4)
+
+         # --- Update state ---
+         self._state.timestep = self._timestep
+         self._state.energy = round(self._energy, 4)
+         self._state.stress = round(self._stress, 4)
+         self._state.current_task_id = self._current_task_id
+
+         return self._make_observation(
+             reward=reward, done=done, reward_breakdown=reward_breakdown
+         )
+
+     # ------------------------------------------------------------------
+     # state property
+     # ------------------------------------------------------------------
+
+     @property
+     def state(self) -> RhythmState:
+         return self._state
+
+     # ------------------------------------------------------------------
+     # Helpers
+     # ------------------------------------------------------------------
+
+     def _validate_action(self, action: RhythmAction) -> bool:
+         """Return True if the action is valid given current state."""
+         if action.action_type in (ActionType.START_TASK, ActionType.SWITCH_TASK):
+             if action.task_id is None:
+                 return False
+             if action.task_id < 0 or action.task_id >= len(self._tasks):
+                 return False
+             if action.task_id in self._completed_tasks:
+                 return False
+         if action.action_type == ActionType.CONTINUE_TASK:
+             if self._current_task_id is None:
+                 return False
+             if self._current_task_id in self._completed_tasks:
+                 return False
+         return True
+
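The validation rules above reduce to a small pure predicate, which is easy to check in isolation. A standalone restatement (action-type strings, task ids, and the completed set here are hypothetical stand-ins for the real `ActionType` enum and environment state):

```python
def is_valid(action_type, task_id, current, completed, n_tasks):
    # START/SWITCH need an existing, not-yet-completed target task.
    if action_type in ("start_task", "switch_task"):
        if task_id is None or not (0 <= task_id < n_tasks):
            return False
        if task_id in completed:
            return False
    # CONTINUE needs an active, not-yet-completed current task.
    if action_type == "continue_task":
        if current is None or current in completed:
            return False
    return True

print(is_valid("start_task", 2, None, {0}, 3))        # existing, uncompleted target
print(is_valid("continue_task", None, None, set(), 3))  # nothing to continue
```

Note that TAKE_BREAK falls through both branches and is always valid, matching the method above.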
+     def _compute_mode(self) -> str:
+         """Compute hidden internal mode (not exposed to the agent)."""
+         if (
+             self._energy > 0.6
+             and self._stress < 0.3
+             and self._current_task_id is not None
+             and self._tasks[self._current_task_id]["effort"] > 0.5
+         ):
+             return "deep_work"
+         if (
+             self._energy > 0.3
+             and self._stress < 0.6
+             and self._current_task_id is not None
+         ):
+             return "execution"
+         return "balanced"
+
+     def _grade_episode(self) -> float:
+         """Compute final episode score in [0, 1]."""
+         # 1. Completion score (weighted by importance)
+         total_importance = sum(t["importance"] for t in self._tasks)
+         completed_importance = sum(
+             t["importance"]
+             for t in self._tasks
+             if t["id"] in self._completed_tasks
+         )
+         completion_score = (
+             completed_importance / total_importance if total_importance > 0 else 0.0
+         )
+
+         # 2. Deadline score
+         total_tasks = len(self._tasks)
+         deadlines_met = total_tasks - len(self._missed_deadlines)
+         deadline_score = deadlines_met / total_tasks if total_tasks > 0 else 0.0
+
+         # 3. Efficiency score
+         total_effort = sum(
+             t["effort"]
+             for t in self._tasks
+             if t["id"] in self._completed_tasks
+         )
+         optimal_steps = total_effort / PROGRESS_RATE if total_effort > 0 else 1.0
+         actual_steps = max(self._steps_working, 1)
+         efficiency_score = min(1.0, optimal_steps / actual_steps)
+
+         # 4. Energy management (average energy)
+         steps_elapsed = max(self._timestep, 1)
+         energy_management = self._total_energy / steps_elapsed
+
+         # 5. Stress management (1 - average stress)
+         stress_management = 1.0 - (self._total_stress / steps_elapsed)
+
+         score = (
+             0.45 * completion_score
+             + 0.20 * deadline_score
+             + 0.15 * efficiency_score
+             + 0.10 * energy_management
+             + 0.10 * stress_management
+         )
+         return max(0.0, min(1.0, score))
+
+     def _make_observation(
+         self,
+         reward: float,
+         done: bool,
+         reward_breakdown: Dict[str, float],
+     ) -> RhythmObservation:
+         """Build the observation returned to the agent."""
+         task_infos = [
+             TaskInfo(
+                 id=t["id"],
+                 name=t["name"],
+                 description=t.get("description", ""),
+                 effort=round(t["effort"], 4),
+                 progress=round(t["progress"], 4),
+                 deadline=t["deadline"],
+                 importance=t["importance"],
+             )
+             for t in self._tasks
+         ]
+         return RhythmObservation(
+             timestep=self._timestep,
+             energy=round(self._energy, 4),
+             stress=round(self._stress, 4),
+             current_task_id=self._current_task_id,
+             tasks=task_infos,
+             meetings=self._meetings,
+             remaining_steps=MAX_STEPS - self._timestep,
+             reward_breakdown=reward_breakdown,
+             reward=reward,
+             done=done,
+         )
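The weighting in `_grade_episode` can be sanity-checked in isolation. A minimal standalone restatement of the blend (the inputs here are hypothetical sub-scores; the real method derives each one from episode state):

```python
def grade(completion, deadline, efficiency, energy_mgmt, stress_mgmt):
    # Mirrors _grade_episode's weighting: importance-weighted completion
    # dominates at 45%, with deadlines, efficiency, energy, and stress
    # sharing the remainder.
    score = (
        0.45 * completion
        + 0.20 * deadline
        + 0.15 * efficiency
        + 0.10 * energy_mgmt
        + 0.10 * stress_mgmt
    )
    return max(0.0, min(1.0, score))

# A flawless day scores 1.0; completing everything late still forfeits
# the 20% deadline share even when every other component is perfect.
print(grade(1.0, 1.0, 1.0, 1.0, 1.0))
print(grade(1.0, 0.0, 1.0, 1.0, 1.0))
```

Because the weights sum to 1 and each sub-score is already in [0, 1], the outer clamp only matters as a guard against floating-point drift.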
uv.lock ADDED
The diff for this file is too large to render. See raw diff