🤏 Smol-Data (Collection): Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated 2 days ago
Sutra Pedagogical Datasets (Collection): High-quality synthetic educational datasets designed for LLM pretraining with structured pedagogical content across 9 knowledge domains. • 6 items • Updated 1 day ago
Dhara Foundational Models (Collection): Diffusion Language Models combining deep narrow networks, Canon layers (depthwise causal convolutions), and WSD (Warmup-Stable-Decay) training. • 2 items • Updated 8 days ago