High-quality synthetic educational datasets for LLM pretraining, with structured pedagogical content spanning 9 knowledge domains.