Akhil Soni committed
Commit 025774a · 0 Parent(s)

Initial commit: RhythmEnv daily planning RL environment

A deterministic RL environment simulating daily planning and scheduling
under energy, stress, deadline, and importance constraints.

- 3 graded tasks (easy/medium/hard) with real-world scenarios
- Multi-component reward function with partial-progress signals
- Baseline inference script with heuristic + LLM agent
- OpenEnv spec compliant, Docker ready

.dockerignore ADDED
```text
.venv
.git
.gitignore
.env
__pycache__/
*.pyc
*.pyo
*.pyd
*.pyw
*.pyz
```
.gitignore ADDED
```text
.venv/
__pycache__/
*.pyc
*.pyo
*.pyd
.env
*.egg-info/
dist/
build/
```
README.md ADDED
---
title: RhythmEnv
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 8000
tags:
  - openenv
---

# RhythmEnv — Daily Planning RL Environment

A deterministic reinforcement learning environment that simulates daily planning and execution under constraints like time, energy, deadlines, and task importance.

## Motivation

Real-world productivity requires balancing competing priorities: urgent vs. important tasks, energy management, meeting interruptions, and deadline pressure. RhythmEnv provides a clean, deterministic simulation of these trade-offs so RL agents can learn prioritization, scheduling, and resource management skills.

## Quick Start

```bash
pip install openenv-core
pip install git+https://huggingface.co/spaces/openenv/rhythm_env
```

```python
import asyncio

from rhythm_env import RhythmEnv, RhythmAction, ActionType


async def main():
    async with RhythmEnv(base_url="https://openenv-rhythm-env.hf.space") as env:
        result = await env.reset(task="easy")
        print(f"Energy: {result.observation.energy}")
        print(f"Tasks: {[t.name for t in result.observation.tasks]}")

        result = await env.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
        print(f"Reward: {result.reward}")


asyncio.run(main())
```

## Action Space

| Action | Parameters | Description |
|--------|-----------|-------------|
| `START_TASK` | `task_id: int` | Begin working on a new task |
| `CONTINUE_TASK` | — | Continue working on the current task |
| `SWITCH_TASK` | `task_id: int` | Switch to a different task (energy penalty) |
| `TAKE_BREAK` | — | Rest to recover energy and reduce stress |

## Observation Space

| Field | Type | Description |
|-------|------|-------------|
| `timestep` | `int` | Current 30-minute slot (0-19) |
| `energy` | `float` | Energy level (0-1) |
| `stress` | `float` | Stress level (0-1) |
| `current_task_id` | `int?` | Task being worked on, or null |
| `tasks` | `List[TaskInfo]` | All tasks with id, name, effort, progress, deadline, importance |
| `meetings` | `List[int]` | Timesteps blocked by meetings |
| `remaining_steps` | `int` | Steps left in the episode |
| `reward_breakdown` | `Dict` | Component-wise reward details |

## Episode Design

- **1 episode = 1 workday** (20 steps of 30 minutes each)
- The agent starts with initial energy and must manage it throughout the day
- Meetings block specific timesteps (no task progress during meetings)
- Tasks have deadlines — missing them increases stress and incurs penalties

## Environment Dynamics

**Energy** (0-1):
- Working: −0.05 per step
- Break: +0.12 per step
- Meeting: −0.03 per step
- Task switch: −0.02 penalty

**Stress** (0-1):
- Missed deadline: +0.15
- Approaching deadline (≤2 steps): +0.03
- Break: −0.08
- Task completion: −0.10

**Task Progress**: `progress_delta = 0.15 × energy` per step when working.
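The per-step rules above can be sketched in code. This is an illustrative reimplementation of the documented numbers, not the server's actual implementation: the update order (progress before energy drain) is an assumption, and the deadline-driven stress changes are omitted.

```python
def clamp01(x: float) -> float:
    """Clamp a value to the [0, 1] range used by energy and stress."""
    return max(0.0, min(1.0, x))


def step_dynamics(energy: float, stress: float, progress: float,
                  action: str, in_meeting: bool) -> tuple[float, float, float]:
    """Apply one step of the documented energy/stress/progress updates."""
    if in_meeting:
        energy = clamp01(energy - 0.03)      # meeting drain
    elif action == "take_break":
        energy = clamp01(energy + 0.12)      # break recovery
        stress = clamp01(stress - 0.08)      # break reduces stress
    else:                                    # working on a task
        if action == "switch_task":
            energy = clamp01(energy - 0.02)  # task-switch penalty
        progress += 0.15 * energy            # progress scales with energy
        energy = clamp01(energy - 0.05)      # work drain
    return energy, stress, progress
```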
## Reward Design

Multi-component reward per step (clamped to [−1, 1]):

| Component | Formula | Signal |
|-----------|---------|--------|
| Progress | `+delta × importance × 2.0` | Encourages productive work |
| Completion bonus | `+importance × 1.5` | Rewards finishing tasks |
| Stress penalty | `−stress × 0.1` | Penalizes high stress |
| Deadline miss | `−0.3` per miss | Penalizes missed deadlines |
| Switch penalty | `−0.1` | Discourages excessive switching |
| Idle penalty | `−0.05` | Penalizes doing nothing |
| Break spam | `−0.05 × max(0, consecutive−2)` | Diminishing returns on breaks |
| Mode bonus | `+0.05/0.02` | Hidden alignment bonus |
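As a sketch, the tabulated components might combine per step as below. This is illustrative only: the hidden mode bonus is left out, and the environment's actual aggregation may differ.

```python
def step_reward(progress_delta: float, importance: float, stress: float,
                completed: bool, missed_deadlines: int, switched: bool,
                idle: bool, consecutive_breaks: int) -> float:
    """Combine the documented per-step reward components (mode bonus omitted)."""
    reward = progress_delta * importance * 2.0       # progress signal
    if completed:
        reward += importance * 1.5                   # completion bonus
    reward -= stress * 0.1                           # stress penalty
    reward -= 0.3 * missed_deadlines                 # deadline misses
    if switched:
        reward -= 0.1                                # switch penalty
    if idle:
        reward -= 0.05                               # idle penalty
    reward -= 0.05 * max(0, consecutive_breaks - 2)  # break spam
    return max(-1.0, min(1.0, reward))               # clamp to [-1, 1]
```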
## Tasks (3 Scenarios)

### Task 1 — Easy (Single Priority)
- **3 tasks**: 1 high-importance (0.9), 2 low (0.3, 0.2)
- **2 meetings** (steps 3 and 11), energy starts at 0.75
- **Moderate deadlines** (steps 10-16)
- **Goal**: Complete the main task efficiently

### Task 2 — Medium (Deadline Pressure)
- **4 tasks** with varied importance
- **2 meetings** (steps 4 and 12)
- Energy starts at 0.7, **tight deadlines** (steps 8-18)
- **Goal**: Maximize completion before deadlines

### Task 3 — Hard (Energy Tradeoff)
- **5 tasks**: 1 deep work (effort 0.8), 4 small tasks
- **1 meeting** (step 6), energy starts at 0.4
- **Goal**: Balance rest, deep work, and small wins
## Grader

End-of-episode score in [0.0, 1.0]:

```
score = 0.45×completion + 0.20×deadline + 0.15×efficiency + 0.10×energy_mgmt + 0.10×stress_mgmt
```

| Component | Calculation |
|-----------|-------------|
| Completion | Importance-weighted fraction of tasks completed |
| Deadline | Fraction of deadlines met |
| Efficiency | optimal_steps / actual_steps |
| Energy mgmt | Average energy over episode |
| Stress mgmt | 1 − average stress |
+ **Expected score ranges:**
139
+ - Random agent: ~0.15–0.35
140
+ - Baseline heuristic: ~0.48–0.55
141
+ - Strong agent: ~0.70–0.85
142
+
143
+ ## Setup Instructions
144
+
145
+ ### Local Development
146
+
147
+ ```bash
148
+ cd rhythm_env
149
+ pip install -e .
150
+ uvicorn server.app:app --host 0.0.0.0 --port 8000
151
+ ```
152
+
153
+ ### Docker
154
+
155
+ ```bash
156
+ docker build -t rhythm-env:latest -f server/Dockerfile .
157
+ docker run -p 8000:8000 rhythm-env:latest
158
+ ```
159
+
160
+ ### Running the Baseline
161
+
162
+ ```bash
163
+ export API_BASE_URL="https://router.huggingface.co/v1"
164
+ export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
165
+ export HF_TOKEN="your-token"
166
+ python inference.py
167
+ ```
168
+
169
+ ## Validation
170
+
171
+ ```bash
172
+ openenv validate
173
+ ```
174
+
175
+ ## License
176
+
177
+ BSD 3-Clause License
__init__.py ADDED
```python
"""
RhythmEnv — Daily Planning RL Environment for OpenEnv.

A deterministic reinforcement learning environment that simulates daily
planning and execution under constraints like time, energy, deadlines,
and task importance.
"""

from .client import RhythmEnv
from .models import ActionType, RhythmAction, RhythmObservation, RhythmState, TaskInfo

__all__ = [
    "RhythmEnv",
    "RhythmAction",
    "RhythmObservation",
    "RhythmState",
    "ActionType",
    "TaskInfo",
]
```
client.py ADDED
```python
"""
RhythmEnv Client.

Provides the client for connecting to a RhythmEnv server.
"""

from __future__ import annotations

from typing import Any, Dict

from openenv.core.client_types import StepResult
from openenv.core.env_client import EnvClient

# Support both package and standalone imports
try:
    from .models import RhythmAction, RhythmObservation, RhythmState, TaskInfo
except ImportError:
    from models import RhythmAction, RhythmObservation, RhythmState, TaskInfo


class RhythmEnv(EnvClient[RhythmAction, RhythmObservation, RhythmState]):
    """
    Client for the RhythmEnv environment.

    Example:
        >>> async with RhythmEnv(base_url="http://localhost:8000") as client:
        ...     result = await client.reset(task="easy")
        ...     result = await client.step(RhythmAction(action_type=ActionType.START_TASK, task_id=0))
    """

    def _step_payload(self, action: RhythmAction) -> Dict[str, Any]:
        """Serialize a RhythmAction to a JSON payload."""
        payload: Dict[str, Any] = {"action_type": action.action_type.value}
        if action.task_id is not None:
            payload["task_id"] = action.task_id
        return payload

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[RhythmObservation]:
        """Parse a server response into StepResult[RhythmObservation]."""
        obs_data = payload.get("observation", {})

        observation = RhythmObservation(
            timestep=obs_data.get("timestep", 0),
            energy=obs_data.get("energy", 1.0),
            stress=obs_data.get("stress", 0.0),
            current_task_id=obs_data.get("current_task_id"),
            tasks=[TaskInfo(**t) for t in obs_data.get("tasks", [])],
            meetings=obs_data.get("meetings", []),
            remaining_steps=obs_data.get("remaining_steps", 20),
            reward_breakdown=obs_data.get("reward_breakdown", {}),
            done=payload.get("done", False),
            reward=payload.get("reward", 0.0),
            metadata=obs_data.get("metadata", {}),
        )

        return StepResult(
            observation=observation,
            reward=payload.get("reward", 0.0),
            done=payload.get("done", False),
        )

    def _parse_state(self, payload: Dict[str, Any]) -> RhythmState:
        """Parse a server response into RhythmState."""
        return RhythmState(
            episode_id=payload.get("episode_id", ""),
            task_name=payload.get("task_name", ""),
            timestep=payload.get("timestep", 0),
            energy=payload.get("energy", 1.0),
            stress=payload.get("stress", 0.0),
            current_task_id=payload.get("current_task_id"),
            step_count=payload.get("step_count", 0),
        )
```
inference.py ADDED
```python
"""
RhythmEnv Inference Script
==========================

MANDATORY
- Before submitting, ensure the following variables are defined in your
  environment configuration:
      API_BASE_URL      The API endpoint for the LLM.
      MODEL_NAME        The model identifier to use for inference.
      HF_TOKEN          Your Hugging Face / API key.
      LOCAL_IMAGE_NAME  The name of the local image to use for the environment
                        if you are using from_docker_image().

- Defaults are set only for API_BASE_URL and MODEL_NAME
  (and should reflect your active inference setup):
      API_BASE_URL = os.getenv("API_BASE_URL", "<your-active-endpoint>")
      MODEL_NAME = os.getenv("MODEL_NAME", "<your-active-model>")

- The inference script must be named `inference.py` and placed in the root
  directory of the project.
- Participants must use the OpenAI client for all LLM calls using the above
  variables.

STDOUT FORMAT
- The script must emit exactly three line types to stdout, in this order:

      [START] task=<task_name> env=<benchmark> model=<model_name>
      [STEP] step=<n> action=<action_str> reward=<0.00> done=<true|false> error=<msg|null>
      [END] success=<true|false> steps=<n> score=<score> rewards=<r1,r2,...,rn>

Rules:
- One [START] line at episode begin.
- One [STEP] line per step, immediately after env.step() returns.
- One [END] line after env.close(), always emitted (even on exception).
- reward and rewards are formatted to 2 decimal places.
- done and success are lowercase booleans: true or false.
- error is the raw last_action_error string, or null if none.
- All fields are on a single line, with no newlines within a line.
- Each task should return a score in [0, 1].
"""

import asyncio
import os
import sys
import textwrap
from typing import List, Optional

from openai import OpenAI

# Add current directory to path for local imports
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from client import RhythmEnv
from models import ActionType, RhythmAction

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

IMAGE_NAME = os.getenv("IMAGE_NAME")
API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
BASE_URL = os.getenv("RHYTHM_ENV_URL", "http://localhost:8000")
BENCHMARK = "rhythm_env"
TASKS = ["easy", "medium", "hard"]
MAX_STEPS = 20
SCORE_THRESHOLD = 0.1

SYSTEM_PROMPT = textwrap.dedent("""\
    You are a daily planning agent. You manage tasks across a workday.
    Each step is a 30-minute slot. You have energy (0-1) and stress (0-1).

    Available actions (respond with EXACTLY one line in this format):
        START_TASK <task_id>
        CONTINUE_TASK
        SWITCH_TASK <task_id>
        TAKE_BREAK

    Rules:
    - START_TASK/SWITCH_TASK require a task_id (integer).
    - CONTINUE_TASK continues your current task.
    - TAKE_BREAK recovers energy and reduces stress.
    - Take breaks when energy < 0.3.
    - Prioritize tasks by deadline urgency, then importance.
    - Avoid unnecessary switching (costs energy and reward).

    Respond with ONLY the action line, nothing else.""")


# ---------------------------------------------------------------------------
# Logging helpers
# ---------------------------------------------------------------------------

def log_start(task: str, env: str, model: str) -> None:
    print(f"[START] task={task} env={env} model={model}", flush=True)


def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
    error_val = error if error else "null"
    done_val = str(done).lower()
    print(
        f"[STEP] step={step} action={action} reward={reward:.2f} done={done_val} error={error_val}",
        flush=True,
    )


def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    print(
        f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}",
        flush=True,
    )


# ---------------------------------------------------------------------------
# Heuristic action selection (enhanced by LLM)
# ---------------------------------------------------------------------------

def choose_action_heuristic(obs) -> RhythmAction:
    """Greedy heuristic: prioritize by deadline, then importance."""
    energy = obs.energy
    current_task_id = obs.current_task_id
    tasks = obs.tasks
    timestep = obs.timestep
    meetings = obs.meetings

    # During meeting slots, just take a break
    if timestep in meetings:
        return RhythmAction(action_type=ActionType.TAKE_BREAK)

    # Take a break if energy is low
    if energy < 0.3:
        return RhythmAction(action_type=ActionType.TAKE_BREAK)

    # Get uncompleted tasks
    uncompleted = [t for t in tasks if t.progress < t.effort]
    if not uncompleted:
        return RhythmAction(action_type=ActionType.TAKE_BREAK)

    # Sort by deadline (ascending), then importance (descending)
    uncompleted.sort(key=lambda t: (t.deadline, -t.importance))

    # Check for urgent tasks (deadline within 3 steps)
    urgent = [t for t in uncompleted if t.deadline - timestep <= 3]
    best = urgent[0] if urgent else uncompleted[0]

    if current_task_id is not None and current_task_id == best.id:
        return RhythmAction(action_type=ActionType.CONTINUE_TASK)
    elif current_task_id is not None:
        return RhythmAction(action_type=ActionType.SWITCH_TASK, task_id=best.id)
    else:
        return RhythmAction(action_type=ActionType.START_TASK, task_id=best.id)


def choose_action_llm(obs, llm_client: OpenAI) -> RhythmAction:
    """Use the LLM to pick an action, falling back to the heuristic on failure."""
    tasks_desc = "\n".join(
        f"  Task {t.id}: {t.name} — {t.description}\n"
        f"    (effort={t.effort:.2f}, progress={t.progress:.2f}, "
        f"deadline=step {t.deadline}, importance={t.importance})"
        for t in obs.tasks
    )
    user_prompt = textwrap.dedent(f"""\
        Step: {obs.timestep}/{MAX_STEPS}
        Energy: {obs.energy:.2f}
        Stress: {obs.stress:.2f}
        Current task: {obs.current_task_id}
        Meetings at steps: {obs.meetings}
        Remaining steps: {obs.remaining_steps}

        Tasks:
        {tasks_desc}

        Choose your action:""")

    try:
        completion = llm_client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_prompt},
            ],
            temperature=0.3,
            max_tokens=30,
            stream=False,
        )
        text = (completion.choices[0].message.content or "").strip()
        return parse_llm_action(text, obs)
    except Exception:
        return choose_action_heuristic(obs)


def parse_llm_action(text: str, obs) -> RhythmAction:
    """Parse LLM response text into a RhythmAction."""
    text = text.strip().upper()

    if text.startswith("TAKE_BREAK"):
        return RhythmAction(action_type=ActionType.TAKE_BREAK)

    if text.startswith("CONTINUE_TASK"):
        if obs.current_task_id is not None:
            return RhythmAction(action_type=ActionType.CONTINUE_TASK)
        return choose_action_heuristic(obs)

    for prefix, action_type in [
        ("START_TASK", ActionType.START_TASK),
        ("SWITCH_TASK", ActionType.SWITCH_TASK),
    ]:
        if text.startswith(prefix):
            rest = text[len(prefix):].strip()
            try:
                task_id = int(rest)
                if 0 <= task_id < len(obs.tasks):
                    return RhythmAction(action_type=action_type, task_id=task_id)
            except ValueError:
                pass

    # Fallback
    return choose_action_heuristic(obs)


# ---------------------------------------------------------------------------
# Main loop
# ---------------------------------------------------------------------------

async def run_task(task_name: str, llm_client: OpenAI) -> float:
    """Run a single task and return the score."""
    if IMAGE_NAME:
        env = await RhythmEnv.from_docker_image(IMAGE_NAME)
    else:
        env = RhythmEnv(base_url=BASE_URL)

    rewards: List[float] = []
    steps_taken = 0
    score = 0.0
    success = False

    log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)

    try:
        async with env:
            result = await env.reset(task=task_name)

            for step in range(1, MAX_STEPS + 1):
                if result.done:
                    break

                # Use the LLM if available, otherwise the heuristic
                if llm_client is not None:
                    action = choose_action_llm(result.observation, llm_client)
                else:
                    action = choose_action_heuristic(result.observation)

                action_str = action.action_type.value
                if action.task_id is not None:
                    action_str += f"({action.task_id})"

                result = await env.step(action)

                reward = result.reward or 0.0
                done = result.done
                rewards.append(reward)
                steps_taken = step

                log_step(step=step, action=action_str, reward=reward, done=done, error=None)

                if done:
                    break

            # Get the final score from the grader
            score = result.observation.reward_breakdown.get("final_score", 0.0)
            score = max(0.0, min(1.0, score))
            success = score >= SCORE_THRESHOLD

    except Exception as e:
        print(f"[DEBUG] Error running task {task_name}: {e}", flush=True)
    finally:
        try:
            await env.close()
        except Exception as e:
            print(f"[DEBUG] env.close() error: {e}", flush=True)
        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)

    return score


async def main() -> None:
    llm_client = None
    if API_KEY:
        llm_client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)

    scores = []
    for task_name in TASKS:
        s = await run_task(task_name, llm_client)
        scores.append(s)

    avg = sum(scores) / len(scores) if scores else 0.0
    print(f"\n[SUMMARY] avg_score={avg:.3f} scores={','.join(f'{s:.3f}' for s in scores)}", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
```
models.py ADDED
```python
"""
Data models for the RhythmEnv Environment.

Defines the Action, Observation, and State types for the daily planning
and scheduling RL environment.
"""

from __future__ import annotations

from enum import Enum
from typing import Dict, List, Optional

from openenv.core.env_server import Action, Observation, State
from pydantic import BaseModel, Field


class ActionType(str, Enum):
    """Available action types for the agent."""

    START_TASK = "start_task"
    CONTINUE_TASK = "continue_task"
    SWITCH_TASK = "switch_task"
    TAKE_BREAK = "take_break"


class RhythmAction(Action):
    """
    Action for RhythmEnv.

    Attributes:
        action_type: The type of action to perform.
        task_id: Task index (required for START_TASK and SWITCH_TASK).
    """

    action_type: ActionType
    task_id: Optional[int] = None


class TaskInfo(BaseModel):
    """
    Information about a single task visible to the agent.

    Attributes:
        id: Unique task identifier.
        name: Human-readable task name.
        description: Brief description of what the task involves.
        effort: Total work required (0-1 scale).
        progress: Work completed so far (0 to effort).
        deadline: Timestep by which the task should be done.
        importance: How important this task is (0-1).
    """

    id: int
    name: str
    description: str = ""
    effort: float
    progress: float
    deadline: int
    importance: float


class RhythmObservation(Observation):
    """
    Observation for RhythmEnv.

    Attributes:
        timestep: Current 30-minute slot (0-19).
        energy: Agent energy level (0-1).
        stress: Agent stress level (0-1).
        current_task_id: ID of the task currently being worked on, or None.
        tasks: List of all tasks with current progress.
        meetings: Timesteps blocked by meetings.
        remaining_steps: Steps left in the episode.
        reward_breakdown: Component-wise reward details.
    """

    timestep: int = 0
    energy: float = 1.0
    stress: float = 0.0
    current_task_id: Optional[int] = None
    tasks: List[TaskInfo] = Field(default_factory=list)
    meetings: List[int] = Field(default_factory=list)
    remaining_steps: int = 20
    reward_breakdown: Dict[str, float] = Field(default_factory=dict)


class RhythmState(State):
    """
    State for RhythmEnv.

    Attributes:
        task_name: Name of the current scenario (easy/medium/hard).
        timestep: Current 30-minute slot.
        energy: Agent energy level.
        stress: Agent stress level.
        current_task_id: ID of the task currently being worked on.
    """

    task_name: str = ""
    timestep: int = 0
    energy: float = 1.0
    stress: float = 0.0
    current_task_id: Optional[int] = None
```
openenv.yaml ADDED
```yaml
spec_version: 1
name: rhythm_env
type: space
runtime: fastapi
app: server.app:app
port: 8000
```
pyproject.toml ADDED
```toml
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

[build-system]
requires = ["setuptools>=45", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "openenv-rhythm-env"
version = "0.1.0"
description = "RhythmEnv - Daily Planning RL Environment for OpenEnv"
requires-python = ">=3.10"
dependencies = [
    "openenv-core[core]>=0.2.2",
    "fastapi>=0.115.0",
    "pydantic>=2.0.0",
    "uvicorn>=0.24.0",
    "requests>=2.31.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-cov>=4.0.0",
]

[project.scripts]
server = "rhythm_env.server.app:main"

[tool.setuptools]
include-package-data = true
packages = ["rhythm_env", "rhythm_env.server"]
package-dir = { "rhythm_env" = ".", "rhythm_env.server" = "server" }
```
server/Dockerfile ADDED
```dockerfile
ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
FROM ${BASE_IMAGE} AS builder

WORKDIR /app

COPY . /app/env

WORKDIR /app/env

RUN if ! command -v uv >/dev/null 2>&1; then \
        curl -LsSf https://astral.sh/uv/install.sh | sh && \
        mv /root/.local/bin/uv /usr/local/bin/uv && \
        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
    fi

RUN apt-get update && apt-get install -y --no-install-recommends \
        git \
    && rm -rf /var/lib/apt/lists/*

RUN --mount=type=cache,target=/root/.cache/uv \
    if [ -f uv.lock ]; then \
        uv sync --frozen --no-install-project --no-editable; \
    else \
        uv sync --no-install-project --no-editable; \
    fi

RUN --mount=type=cache,target=/root/.cache/uv \
    if [ -f uv.lock ]; then \
        uv sync --frozen --no-editable; \
    else \
        uv sync --no-editable; \
    fi

FROM ${BASE_IMAGE}

WORKDIR /app

COPY --from=builder /app/env/.venv /app/.venv
COPY --from=builder /app/env /app/env

ENV PATH="/app/.venv/bin:$PATH"
ENV PYTHONPATH="/app/env:$PYTHONPATH"

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
```
server/__init__.py ADDED
```python
"""RhythmEnv environment server components."""

from .rhythm_environment import RhythmEnvironment

__all__ = ["RhythmEnvironment"]
```
server/app.py ADDED
```python
"""
FastAPI application for the RhythmEnv Environment.

This module creates an HTTP server that exposes the RhythmEnvironment
over HTTP and WebSocket endpoints, compatible with EnvClient.

Endpoints:
    - POST /reset: Reset the environment
    - POST /step: Execute an action
    - GET /state: Get current environment state
    - GET /schema: Get action/observation schemas
    - WS /ws: WebSocket endpoint for persistent sessions

Usage:
    # Development (with auto-reload):
    uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

    # Production:
    uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4

    # Or run directly:
    python -m server.app
"""

try:
    from openenv.core.env_server.http_server import create_app
except Exception as e:  # pragma: no cover
    raise ImportError(
        "openenv is required for the web interface. Install dependencies with 'uv sync'."
    ) from e

try:
    from ..models import RhythmAction, RhythmObservation
    from .rhythm_environment import RhythmEnvironment
except (ImportError, ModuleNotFoundError):
    from models import RhythmAction, RhythmObservation
    from server.rhythm_environment import RhythmEnvironment


# Create the app with web interface and README integration
app = create_app(
    RhythmEnvironment,
    RhythmAction,
    RhythmObservation,
    env_name="rhythm_env",
)


def main(host: str = "0.0.0.0", port: int = 8000):
    """
    Entry point for direct execution via uv run or python -m.

    This function enables running the server without Docker:
        uv run --project . server
        uv run --project . server --port 8001
        python -m rhythm_env.server.app

    Args:
        host: Host address to bind to (default: "0.0.0.0")
        port: Port number to listen on (default: 8000)
    """
    import uvicorn

    uvicorn.run(app, host=host, port=port)


if __name__ == "__main__":
    main()
```
server/requirements.txt ADDED
```text
openenv-core[core]>=0.2.2
fastapi>=0.115.0
uvicorn>=0.24.0
pydantic>=2.0.0
```
server/rhythm_environment.py ADDED
@@ -0,0 +1,593 @@
+ """
+ RhythmEnv Environment Implementation.
+
+ A deterministic RL environment simulating daily planning and scheduling
+ under energy, stress, deadline, and importance constraints.
+
+ 1 episode = 1 day, 1 step = 30 minutes, 20 steps total.
+ """
+
+ from typing import Any, Dict, List, Optional, Set
+ from uuid import uuid4
+
+ from openenv.core.env_server import Environment
+ from openenv.core.env_server.types import EnvironmentMetadata
+
+ # Support both in-repo and standalone imports
+ try:
+     from ..models import (
+         ActionType,
+         RhythmAction,
+         RhythmObservation,
+         RhythmState,
+         TaskInfo,
+     )
+ except ImportError as e:
+     if "relative import" not in str(e) and "no known parent package" not in str(e):
+         raise
+     from models import (
+         ActionType,
+         RhythmAction,
+         RhythmObservation,
+         RhythmState,
+         TaskInfo,
+     )
+
+
+ # ---------------------------------------------------------------------------
+ # Task scenario configurations (all deterministic)
+ # ---------------------------------------------------------------------------
+
+ TASK_CONFIGS: Dict[str, Dict[str, Any]] = {
+     "easy": {
+         "scenario": "You are a marketing analyst preparing for a quarterly review. "
+         "Your manager needs the Q3 performance report by midday. "
+         "You also have routine emails and expense filing to handle.",
+         "tasks": [
+             {
+                 "id": 0,
+                 "name": "Q3 Performance Report",
+                 "description": "Compile sales data, create visualizations, and write executive summary for the quarterly business review.",
+                 "effort": 0.65,
+                 "progress": 0.0,
+                 "deadline": 10,
+                 "importance": 0.9,
+             },
+             {
+                 "id": 1,
+                 "name": "Client Emails",
+                 "description": "Respond to 12 pending client inquiries about pricing updates and contract renewals.",
+                 "effort": 0.45,
+                 "progress": 0.0,
+                 "deadline": 13,
+                 "importance": 0.3,
+             },
+             {
+                 "id": 2,
+                 "name": "Expense Filing",
+                 "description": "Submit last month's travel receipts and categorize team expenses in the accounting system.",
+                 "effort": 0.35,
+                 "progress": 0.0,
+                 "deadline": 16,
+                 "importance": 0.2,
+             },
+         ],
+         "meetings": [3, 11],
+         "initial_energy": 0.75,
+     },
+     "medium": {
+         "scenario": "You are a product manager with a client pitch tomorrow. "
+         "The proposal and presentation deck are top priority, but you also need to "
+         "review a teammate's design doc and prepare meeting notes for leadership.",
+         "tasks": [
+             {
+                 "id": 0,
+                 "name": "Client Proposal",
+                 "description": "Draft a 5-page proposal for the enterprise client including pricing tiers, timeline, and integration plan.",
+                 "effort": 0.40,
+                 "progress": 0.0,
+                 "deadline": 8,
+                 "importance": 0.7,
+             },
+             {
+                 "id": 1,
+                 "name": "Pitch Deck",
+                 "description": "Create a 15-slide presentation with product demos, ROI projections, and competitive analysis.",
+                 "effort": 0.35,
+                 "progress": 0.0,
+                 "deadline": 10,
+                 "importance": 0.8,
+             },
+             {
+                 "id": 2,
+                 "name": "Design Review",
+                 "description": "Review the UX team's redesign mockups for the dashboard. Provide written feedback on usability and alignment with product goals.",
+                 "effort": 0.25,
+                 "progress": 0.0,
+                 "deadline": 14,
+                 "importance": 0.5,
+             },
+             {
+                 "id": 3,
+                 "name": "Leadership Notes",
+                 "description": "Summarize this week's sprint outcomes and blockers for the Monday leadership sync.",
+                 "effort": 0.20,
+                 "progress": 0.0,
+                 "deadline": 18,
+                 "importance": 0.4,
+             },
+         ],
+         "meetings": [4, 12],
+         "initial_energy": 0.7,
+     },
+     "hard": {
+         "scenario": "You are a senior engineer on a critical release day. "
+         "The system architecture redesign is the highest priority, but two production "
+         "bugs are blocking users, docs need updating, and test coverage is behind.",
+         "tasks": [
+             {
+                 "id": 0,
+                 "name": "Architecture Redesign",
+                 "description": "Refactor the authentication service from monolith to microservice pattern. Requires deep focus: redesign API contracts, update database schema, and write migration scripts.",
+                 "effort": 0.80,
+                 "progress": 0.0,
+                 "deadline": 16,
+                 "importance": 0.9,
+             },
+             {
+                 "id": 1,
+                 "name": "Fix: Login Timeout",
+                 "description": "Users on slow connections get a 504 timeout during OAuth handshake. Root cause is likely the retry logic in the auth middleware.",
+                 "effort": 0.15,
+                 "progress": 0.0,
+                 "deadline": 6,
+                 "importance": 0.5,
+             },
+             {
+                 "id": 2,
+                 "name": "Fix: CSV Export",
+                 "description": "The data export endpoint crashes on records with Unicode characters in the notes field. Need to fix encoding in the serializer.",
+                 "effort": 0.15,
+                 "progress": 0.0,
+                 "deadline": 10,
+                 "importance": 0.4,
+             },
+             {
+                 "id": 3,
+                 "name": "API Documentation",
+                 "description": "Update the REST API docs to reflect the new v3 endpoints. Add request/response examples and deprecation notices for v2.",
+                 "effort": 0.20,
+                 "progress": 0.0,
+                 "deadline": 14,
+                 "importance": 0.3,
+             },
+             {
+                 "id": 4,
+                 "name": "Integration Tests",
+                 "description": "Write end-to-end tests for the payment flow covering Stripe webhook handling, refund processing, and receipt generation.",
+                 "effort": 0.20,
+                 "progress": 0.0,
+                 "deadline": 18,
+                 "importance": 0.6,
+             },
+         ],
+         "meetings": [6],
+         "initial_energy": 0.4,
+     },
+ }
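A quick feasibility check on the configs above: each work step adds at most `PROGRESS_RATE * energy` progress, so dividing each task's effort by the progress rate gives a lower bound on the work steps needed. A rough sketch for the hard scenario (efforts copied from the config; the bound optimistically assumes energy stays at 1.0, which the drain rules prevent in practice):

```python
import math

PROGRESS_RATE = 0.15  # progress gained per work step at full energy
MAX_STEPS = 20        # one simulated workday

hard_efforts = [0.80, 0.15, 0.15, 0.20, 0.20]  # from TASK_CONFIGS["hard"]
meeting_steps = 1  # the hard scenario blocks one step with a meeting

# Each task needs at least ceil(effort / rate) dedicated work steps.
min_work_steps = sum(math.ceil(e / PROGRESS_RATE) for e in hard_efforts)
print(min_work_steps)
print(min_work_steps <= MAX_STEPS - meeting_steps)
```

So even the hard scenario is completable in principle, but with little slack once energy decay and breaks are accounted for.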
+
+ # ---------------------------------------------------------------------------
+ # Constants
+ # ---------------------------------------------------------------------------
+
+ MAX_STEPS = 20
+ PROGRESS_RATE = 0.15
+ ENERGY_WORK_DRAIN = 0.05
+ ENERGY_BREAK_GAIN = 0.12
+ ENERGY_MEETING_DRAIN = 0.03
+ ENERGY_SWITCH_DRAIN = 0.02
+ STRESS_DEADLINE_MISS = 0.15
+ STRESS_APPROACHING = 0.03
+ STRESS_BREAK_RELIEF = 0.08
+ STRESS_COMPLETION_RELIEF = 0.1
+ APPROACHING_DEADLINE_WINDOW = 2
+ MAX_FREE_BREAKS = 2
+ BREAK_SPAM_PENALTY = 0.05
+ SWITCH_PENALTY = 0.1
+ IDLE_PENALTY = 0.05
+ DEADLINE_MISS_PENALTY = 0.3
+ STRESS_PENALTY_RATE = 0.1
+ PROGRESS_REWARD_SCALE = 2.0
+ COMPLETION_BONUS_SCALE = 1.5
+ DEEP_WORK_BONUS = 0.05
+ EXECUTION_BONUS = 0.02
+
+
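To get a feel for how these constants interact: progress scales with the agent's current energy, and each work step then drains energy, so sustained work slows down over time. A small sketch of five uninterrupted work steps (ignoring task effort caps, meetings, and stress; the starting energy of 0.75 matches the easy scenario):

```python
PROGRESS_RATE = 0.15
ENERGY_WORK_DRAIN = 0.05

energy, progress = 0.75, 0.0
for _ in range(5):
    progress += PROGRESS_RATE * energy  # progress uses energy before the drain
    energy = max(0.0, energy - ENERGY_WORK_DRAIN)

print(round(progress, 4), round(energy, 2))
```

Five steps at full energy would yield 0.75 progress; starting at 0.75 energy yields noticeably less, which is why well-timed breaks (ENERGY_BREAK_GAIN = 0.12) can pay for themselves.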
+ class RhythmEnvironment(Environment):
+     """
+     Daily planning and scheduling environment.
+
+     The agent manages a set of tasks over a simulated workday, balancing
+     energy, stress, deadlines, and task importance.
+     """
+
+     SUPPORTS_CONCURRENT_SESSIONS: bool = True
+
+     def __init__(self) -> None:
+         super().__init__()
+         self._state = RhythmState()
+         # Internal tracking
+         self._tasks: List[Dict[str, Any]] = []
+         self._meetings: List[int] = []
+         self._initial_energy: float = 1.0
+         self._energy: float = 1.0
+         self._stress: float = 0.0
+         self._current_task_id: Optional[int] = None
+         self._consecutive_breaks: int = 0
+         self._completed_tasks: Set[int] = set()
+         self._missed_deadlines: Set[int] = set()
+         self._total_energy: float = 0.0
+         self._total_stress: float = 0.0
+         self._steps_working: int = 0
+         self._switch_count: int = 0
+         self._timestep: int = 0
+
+     def get_metadata(self) -> EnvironmentMetadata:
+         return EnvironmentMetadata(
+             name="RhythmEnv",
+             description=(
+                 "A deterministic RL environment for daily planning and scheduling "
+                 "under energy, stress, deadline, and importance constraints."
+             ),
+             version="0.1.0",
+         )
+
+     # ------------------------------------------------------------------
+     # reset
+     # ------------------------------------------------------------------
+
+     def reset(
+         self,
+         seed: Optional[int] = None,
+         episode_id: Optional[str] = None,
+         **kwargs: Any,
+     ) -> RhythmObservation:
+         task_name = kwargs.get("task", "easy")
+         if task_name not in TASK_CONFIGS:
+             task_name = "easy"
+
+         config = TASK_CONFIGS[task_name]
+
+         # Copy each task dict so mutations don't affect the template
+         # (a shallow copy suffices because all task values are scalars)
+         self._tasks = [dict(t) for t in config["tasks"]]
+         self._meetings = list(config["meetings"])
+         self._initial_energy = config["initial_energy"]
+
+         # Reset state
+         self._energy = self._initial_energy
+         self._stress = 0.0
+         self._current_task_id = None
+         self._consecutive_breaks = 0
+         self._completed_tasks = set()
+         self._missed_deadlines = set()
+         self._total_energy = 0.0
+         self._total_stress = 0.0
+         self._steps_working = 0
+         self._switch_count = 0
+         self._timestep = 0
+
+         self._state = RhythmState(
+             episode_id=episode_id or str(uuid4()),
+             step_count=0,
+             task_name=task_name,
+             timestep=0,
+             energy=self._energy,
+             stress=self._stress,
+             current_task_id=None,
+         )
+
+         return self._make_observation(reward=0.0, done=False, reward_breakdown={})
+
+     # ------------------------------------------------------------------
+     # step
+     # ------------------------------------------------------------------
+
+     def step(
+         self,
+         action: RhythmAction,
+         timeout_s: Optional[float] = None,
+         **kwargs: Any,
+     ) -> RhythmObservation:
+         reward_breakdown: Dict[str, float] = {}
+         progress_delta = 0.0
+         completed_this_step: List[int] = []
+         switched = False
+         is_idle = False
+         is_meeting = self._timestep in self._meetings
+
+         # --- Meeting override ---
+         if is_meeting:
+             self._energy = max(0.0, self._energy - ENERGY_MEETING_DRAIN)
+             # During meetings, the agent cannot work; the action is ignored
+         else:
+             # --- Validate & process action ---
+             valid = self._validate_action(action)
+
+             if not valid:
+                 is_idle = True
+             elif action.action_type == ActionType.TAKE_BREAK:
+                 self._current_task_id = None
+                 self._consecutive_breaks += 1
+                 self._energy = min(1.0, self._energy + ENERGY_BREAK_GAIN)
+                 self._stress = max(0.0, self._stress - STRESS_BREAK_RELIEF)
+             else:
+                 # Reset break counter on any non-break action
+                 self._consecutive_breaks = 0
+
+                 if action.action_type == ActionType.START_TASK:
+                     if self._current_task_id is not None and self._current_task_id != action.task_id:
+                         switched = True
+                     self._current_task_id = action.task_id
+
+                 elif action.action_type == ActionType.SWITCH_TASK:
+                     if self._current_task_id is not None and self._current_task_id != action.task_id:
+                         switched = True
+                     self._current_task_id = action.task_id
+
+                 elif action.action_type == ActionType.CONTINUE_TASK:
+                     if self._current_task_id is None:
+                         is_idle = True
+
+             # Apply switch energy penalty
+             if switched:
+                 self._energy = max(0.0, self._energy - ENERGY_SWITCH_DRAIN)
+                 self._switch_count += 1
+
+             # Compute progress if working on a valid uncompleted task
+             if (
+                 self._current_task_id is not None
+                 and not is_idle
+                 and self._current_task_id not in self._completed_tasks
+             ):
+                 task = self._tasks[self._current_task_id]
+                 progress_delta = PROGRESS_RATE * self._energy
+                 task["progress"] = min(task["effort"], task["progress"] + progress_delta)
+
+                 # Check completion
+                 if task["progress"] >= task["effort"] and self._current_task_id not in self._completed_tasks:
+                     self._completed_tasks.add(self._current_task_id)
+                     completed_this_step.append(self._current_task_id)
+
+                 self._energy = max(0.0, self._energy - ENERGY_WORK_DRAIN)
+                 self._steps_working += 1
+             elif self._current_task_id is not None and self._current_task_id in self._completed_tasks:
+                 # Working on an already-completed task counts as idle
+                 is_idle = True
+
+         # --- Check deadlines ---
+         new_missed: List[int] = []
+         for t in self._tasks:
+             tid = t["id"]
+             if tid not in self._completed_tasks and tid not in self._missed_deadlines:
+                 if self._timestep > t["deadline"]:
+                     self._missed_deadlines.add(tid)
+                     new_missed.append(tid)
+                     self._stress = min(1.0, self._stress + STRESS_DEADLINE_MISS)
+
+         # --- Stress from approaching deadlines ---
+         for t in self._tasks:
+             tid = t["id"]
+             if tid not in self._completed_tasks and tid not in self._missed_deadlines:
+                 if 0 < t["deadline"] - self._timestep <= APPROACHING_DEADLINE_WINDOW:
+                     self._stress = min(1.0, self._stress + STRESS_APPROACHING)
+
+         # --- Stress relief from completion ---
+         for _ in completed_this_step:
+             self._stress = max(0.0, self._stress - STRESS_COMPLETION_RELIEF)
+
+         # --- Advance timestep ---
+         self._timestep += 1
+         self._state.step_count += 1
+
+         # --- Track averages ---
+         self._total_energy += self._energy
+         self._total_stress += self._stress
+
+         # --- Compute reward ---
+         reward = 0.0
+
+         # Progress reward
+         if progress_delta > 0 and self._current_task_id is not None:
+             task = self._tasks[self._current_task_id]
+             r = progress_delta * task["importance"] * PROGRESS_REWARD_SCALE
+             reward += r
+             reward_breakdown["progress_reward"] = round(r, 4)
+
+         # Completion bonus
+         for tid in completed_this_step:
+             bonus = self._tasks[tid]["importance"] * COMPLETION_BONUS_SCALE
+             reward += bonus
+             reward_breakdown["completion_bonus"] = round(
+                 reward_breakdown.get("completion_bonus", 0.0) + bonus, 4
+             )
+
+         # Stress penalty
+         stress_pen = -self._stress * STRESS_PENALTY_RATE
+         reward += stress_pen
+         reward_breakdown["stress_penalty"] = round(stress_pen, 4)
+
+         # Deadline miss penalty
+         if new_missed:
+             dp = -DEADLINE_MISS_PENALTY * len(new_missed)
+             reward += dp
+             reward_breakdown["deadline_penalty"] = round(dp, 4)
+
+         # Switch penalty
+         if switched:
+             reward -= SWITCH_PENALTY
+             reward_breakdown["switch_penalty"] = round(-SWITCH_PENALTY, 4)
+
+         # Idle penalty
+         if not is_meeting and is_idle:
+             reward -= IDLE_PENALTY
+             reward_breakdown["idle_penalty"] = round(-IDLE_PENALTY, 4)
+
+         # Break spam penalty
+         if not is_meeting and action.action_type == ActionType.TAKE_BREAK:
+             spam = -BREAK_SPAM_PENALTY * max(0, self._consecutive_breaks - MAX_FREE_BREAKS)
+             if spam < 0:
+                 reward += spam
+                 reward_breakdown["break_spam_penalty"] = round(spam, 4)
+
+         # Mode bonus
+         mode = self._compute_mode()
+         mode_bonus = 0.0
+         if mode == "deep_work":
+             mode_bonus = DEEP_WORK_BONUS
+         elif mode == "execution":
+             mode_bonus = EXECUTION_BONUS
+         if mode_bonus > 0:
+             reward += mode_bonus
+             reward_breakdown["mode_bonus"] = round(mode_bonus, 4)
+
+         # Clamp reward to [-1, 1]
+         reward = max(-1.0, min(1.0, round(reward, 4)))
+
+         # --- Done? ---
+         done = self._timestep >= MAX_STEPS
+
+         # --- Final grading ---
+         if done:
+             final_score = self._grade_episode()
+             reward_breakdown["final_score"] = round(final_score, 4)
+
+         # --- Update state ---
+         self._state.timestep = self._timestep
+         self._state.energy = round(self._energy, 4)
+         self._state.stress = round(self._stress, 4)
+         self._state.current_task_id = self._current_task_id
+
+         return self._make_observation(
+             reward=reward, done=done, reward_breakdown=reward_breakdown
+         )
+
+     # ------------------------------------------------------------------
+     # state property
+     # ------------------------------------------------------------------
+
+     @property
+     def state(self) -> RhythmState:
+         return self._state
+
+     # ------------------------------------------------------------------
+     # Helpers
+     # ------------------------------------------------------------------
+
+     def _validate_action(self, action: RhythmAction) -> bool:
+         """Return True if the action is valid given current state."""
+         if action.action_type in (ActionType.START_TASK, ActionType.SWITCH_TASK):
+             if action.task_id is None:
+                 return False
+             if action.task_id < 0 or action.task_id >= len(self._tasks):
+                 return False
+             if action.task_id in self._completed_tasks:
+                 return False
+         if action.action_type == ActionType.CONTINUE_TASK:
+             if self._current_task_id is None:
+                 return False
+             if self._current_task_id in self._completed_tasks:
+                 return False
+         return True
+
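The validation rules above reduce to a small pure predicate, which is easy to check in isolation. A standalone restatement (action-type strings, task ids, and the completed set here are hypothetical stand-ins for the real `ActionType` enum and environment state):

```python
def is_valid(action_type, task_id, current, completed, n_tasks):
    # START/SWITCH need an existing, not-yet-completed target task.
    if action_type in ("start_task", "switch_task"):
        if task_id is None or not (0 <= task_id < n_tasks):
            return False
        if task_id in completed:
            return False
    # CONTINUE needs an active, not-yet-completed current task.
    if action_type == "continue_task":
        if current is None or current in completed:
            return False
    return True

print(is_valid("start_task", 2, None, {0}, 3))        # existing, uncompleted target
print(is_valid("continue_task", None, None, set(), 3))  # nothing to continue
```

Note that TAKE_BREAK falls through both branches and is always valid, matching the method above.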
+     def _compute_mode(self) -> str:
+         """Compute hidden internal mode (not exposed to the agent)."""
+         if (
+             self._energy > 0.6
+             and self._stress < 0.3
+             and self._current_task_id is not None
+             and self._tasks[self._current_task_id]["effort"] > 0.5
+         ):
+             return "deep_work"
+         if (
+             self._energy > 0.3
+             and self._stress < 0.6
+             and self._current_task_id is not None
+         ):
+             return "execution"
+         return "balanced"
+
+     def _grade_episode(self) -> float:
+         """Compute final episode score in [0, 1]."""
+         # 1. Completion score (weighted by importance)
+         total_importance = sum(t["importance"] for t in self._tasks)
+         completed_importance = sum(
+             t["importance"]
+             for t in self._tasks
+             if t["id"] in self._completed_tasks
+         )
+         completion_score = (
+             completed_importance / total_importance if total_importance > 0 else 0.0
+         )
+
+         # 2. Deadline score
+         total_tasks = len(self._tasks)
+         deadlines_met = total_tasks - len(self._missed_deadlines)
+         deadline_score = deadlines_met / total_tasks if total_tasks > 0 else 0.0
+
+         # 3. Efficiency score
+         total_effort = sum(
+             t["effort"]
+             for t in self._tasks
+             if t["id"] in self._completed_tasks
+         )
+         optimal_steps = total_effort / PROGRESS_RATE if total_effort > 0 else 1.0
+         actual_steps = max(self._steps_working, 1)
+         efficiency_score = min(1.0, optimal_steps / actual_steps)
+
+         # 4. Energy management (average energy)
+         steps_elapsed = max(self._timestep, 1)
+         energy_management = self._total_energy / steps_elapsed
+
+         # 5. Stress management (1 - average stress)
+         stress_management = 1.0 - (self._total_stress / steps_elapsed)
+
+         score = (
+             0.45 * completion_score
+             + 0.20 * deadline_score
+             + 0.15 * efficiency_score
+             + 0.10 * energy_management
+             + 0.10 * stress_management
+         )
+         return max(0.0, min(1.0, score))
+
+     def _make_observation(
+         self,
+         reward: float,
+         done: bool,
+         reward_breakdown: Dict[str, float],
+     ) -> RhythmObservation:
+         """Build the observation returned to the agent."""
+         task_infos = [
+             TaskInfo(
+                 id=t["id"],
+                 name=t["name"],
+                 description=t.get("description", ""),
+                 effort=round(t["effort"], 4),
+                 progress=round(t["progress"], 4),
+                 deadline=t["deadline"],
+                 importance=t["importance"],
+             )
+             for t in self._tasks
+         ]
+         return RhythmObservation(
+             timestep=self._timestep,
+             energy=round(self._energy, 4),
+             stress=round(self._stress, 4),
+             current_task_id=self._current_task_id,
+             tasks=task_infos,
+             meetings=self._meetings,
+             remaining_steps=MAX_STEPS - self._timestep,
+             reward_breakdown=reward_breakdown,
+             reward=reward,
+             done=done,
+         )
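The weighting in `_grade_episode` can be sanity-checked in isolation. A minimal standalone restatement of the blend (the inputs here are hypothetical sub-scores; the real method derives each one from episode state):

```python
def grade(completion, deadline, efficiency, energy_mgmt, stress_mgmt):
    # Mirrors _grade_episode's weighting: importance-weighted completion
    # dominates at 45%, with deadlines, efficiency, energy, and stress
    # sharing the remainder.
    score = (
        0.45 * completion
        + 0.20 * deadline
        + 0.15 * efficiency
        + 0.10 * energy_mgmt
        + 0.10 * stress_mgmt
    )
    return max(0.0, min(1.0, score))

# A flawless day scores 1.0; completing everything late still forfeits
# the 20% deadline share even when every other component is perfect.
print(grade(1.0, 1.0, 1.0, 1.0, 1.0))
print(grade(1.0, 0.0, 1.0, 1.0, 1.0))
```

Because the weights sum to 1 and each sub-score is already in [0, 1], the outer clamp only matters as a guard against floating-point drift.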
uv.lock ADDED
The diff for this file is too large to render. See raw diff