HuggingFaceFW/fineweb-edu
3.5B rows • 348k downloads • 919 likes
mlfoundations/dclm-baseline-1.0
155k downloads • 251 likes
4.48B rows • 109k downloads • 725 likes
Note: only multimodal data =(
48.3M rows • 8.22k downloads • 348 likes
5.45B rows • 5.43k downloads • 447 likes
Note: doesn't contain the text directly =(
HuggingFaceTB/issues-kaggle-notebooks
16.1M rows • 250 downloads • 13 likes
Note: only 500k rows
7.89M rows • 9.39k downloads • 182 likes
Note: 1.6M rows with web-0.5-to-1.0
Locutusque/UltraTextbooks
5.52M rows • 651 downloads • 196 likes
tokyotech-llm/swallow-math-v2
17.4M rows • 15.2k downloads • 18 likes
tokyotech-llm/swallow-code-v2
147M rows • 14.2k downloads • 26 likes
HuggingFaceFW/finepdfs-edu
49.5M rows • 8.55k downloads • 75 likes
HuggingFaceTB/smollm-corpus
237M rows • 22.7k downloads • 417 likes