NVIDIA GR00T N1 humanoid robotics foundation models for embodied AI, manipulation, and robot learning
Karsten Kuhnke PRO
mindchain
AI & ML interests
Industry-grade Humanoid Synthetic Motion Data Generation, Mechanistic Interpretability Data Generation, Sparse Autoencoders, Edge IoT, Gemma Scope 2, RLHF, Edge AI, AlpaSim, Alpamayo-R1, Cosmos, Isaac Sim, Isaac Lab, GR00T N1.6, Unreal Engine
Recent Activity
Upvoted a collection about 5 hours ago: Google TranslateGemma - 55 Language Translation Models
Upvoted a collection about 5 hours ago: NVIDIA GR00T N1 - Humanoid Robotics Foundation Models
Published a Space about 22 hours ago: haddockaihamburg/README
Collections
NVIDIA Nemotron PII - Privacy & Data Protection Dataset
NVIDIA Nemotron PII dataset for personally identifiable information detection and privacy-aware NLP
Unitree G1 Dex1 - Humanoid Robot Dexterity Datasets
Unitree G1 humanoid robot Dex1 dexterity datasets with mounted camera for manipulation learning
Unitree G1 BrainCo - Grasping & Manipulation Data
Unitree G1 humanoid robot BrainCo datasets for grasping, object manipulation, and dexterous hand training
unitreerobotics/G1_Brainco_GraspOreo_Dataset
Viewer • Updated • 235k • 533
unitreerobotics/G1_Brainco_GraspRubiksCube_Dataset
Viewer • Updated • 221k • 541 • 1
unitreerobotics/G1_Brainco_PickApple_Dataset
Viewer • Updated • 154k • 140
unitreerobotics/G1_Brainco_PickCharger_Dataset
Viewer • Updated • 217k • 527
Hugging Face - LeRobot - Pi0 (Old Version)
Hugging Face LeRobot Pi0 legacy version - archived robotics model for reference and compatibility
Hugging Face - LeRobot - Open X-Embodiment
Open X-Embodiment robotics datasets: cross-platform robot learning for DROID, Kuka, TACO, JACO
LeRobot Pi0 - Hugging Face Robotics Foundation Model
Hugging Face LeRobot Pi0 foundation model for robotics: manipulation, navigation, and embodied AI
LeRobot XVLA - Cross-Embodiment Vision-Language-Action
Hugging Face LeRobot XVLA cross-embodiment vision-language-action models for universal robot control
Hyper Graph Reasoning - Knowledge Graphs for AI Agents
Higher-order knowledge representations and hypergraph reasoning for agentic AI and scientific discovery
NVIDIA Nemotron Orchestrator - Multi-Model Routing
NVIDIA Nemotron Orchestrator 8B for multi-model coordination, task routing, and agentic workflows
Meta RoBERTa - Pretrained NLP & Text Classification
Meta RoBERTa pretrained language models for NLP tasks: classification, NER, sentiment analysis
FacebookAI/roberta-base
Fill-Mask • 0.1B • Updated • 9.73M • 551
FacebookAI/xlm-roberta-large-finetuned-conll03-german
Token Classification • Updated • 4.6k • 14
FacebookAI/xlm-roberta-large-finetuned-conll02-spanish
Fill-Mask • 0.6B • Updated • 78 • 2
FacebookAI/xlm-roberta-large
Fill-Mask • 0.6B • Updated • 3.48M • 487
NVIDIA Physical AI - Autonomous Vehicles & Robotics
NVIDIA Physical AI models for robotics, embodied intelligence, and real-world interaction
Qwen3 VL Reranker - Multimodal RAG Ranking Models
Qwen3 Vision-Language reranker models for RAG pipelines and multimodal document retrieval
Facebook/Meta - Research Plan Dataset
Meta Research Plan datasets for AI research planning, scientific reasoning, and agent workflows
NVIDIA Clara Medical - Healthcare & Clinical NLP
NVIDIA Clara medical AI models for healthcare, clinical NLP, and medical imaging analysis
NVIDIA Clara Molecular - Drug Discovery & Chemistry
NVIDIA Clara molecular models for drug discovery, molecular property prediction, and computational chemistry
NVIDIA Nemotron RAG - Reranking
NVIDIA Nemotron reranking models for RAG pipelines, search result optimization, and document ranking
NVIDIA Alpamayo-R1 - Reasoning & Physical AI Models
NVIDIA Alpamayo-R1 reasoning vision-language-action model for autonomous driving, combining chain-of-thought reasoning with action prediction to generalize to long-tail scenarios
nvidia/Alpamayo-R1-10B
Robotics • 11B • Updated • 25.7k • 305
nvidia/PhysicalAI-Autonomous-Vehicles
Updated • 156k • 695
nvidia/PhysicalAI-Autonomous-Vehicles-NuRec
Updated • 11.8k • 110
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
Paper • 2511.00088 • Published • 3
NVIDIA Nemotron Speech - ASR & Text-to-Speech
NVIDIA Nemotron speech models for ASR, text-to-speech, and voice AI applications
nvidia/nemotron-speech-streaming-en-0.6b
Automatic Speech Recognition • Updated • 5.78k • 392
nvidia/parakeet-tdt-0.6b-v3
Automatic Speech Recognition • Updated • 72.1k • 554
nvidia/parakeet_realtime_eou_120m-v1
Updated • 575 • 106
nvidia/multitalker-parakeet-streaming-0.6b-v1
Automatic Speech Recognition • Updated • 615 • 63
OpenAI GPT-OSS - Steering Vectors & SAE Research
OpenAI's open-weight GPT-OSS models with steering vectors for controllable generation and behavior modification
NVIDIA Cosmos Reason 2 - World Model Reasoning
NVIDIA Cosmos Reason 2 models for world model reasoning, physics simulation, and causal understanding
NVIDIA Cosmos 2 - Cosmos-Predict 2.5
NVIDIA Cosmos Predict 2.5 models for world simulation, future frame prediction, and physical AI
Edge & Smartphone - On-Device Mobile AI Models
On-device AI models optimized for smartphone deployment: mobile LLMs, edge inference, and efficient architectures
NVIDIA Nemotron Safety - AI Alignment Datasets
NVIDIA Nemotron safety datasets for AI alignment, content moderation, and responsible AI training
nvidia/Nemotron-AIQ-Agentic-Safety-Dataset-1.0
Viewer • Updated • 10.8k • 428 • 10
nvidia/Nemotron-Content-Safety-Reasoning-Dataset
Preview • Updated • 77 • 5
nvidia/Aegis-AI-Content-Safety-Dataset-2.0
Viewer • Updated • 33.4k • 3k • 72
nvidia/Nemotron-Content-Safety-Audio-Dataset
Viewer • Updated • 1.93k • 1.34k • 3
NVIDIA Nemotron VLM - Vision-Language Training Data
NVIDIA Nemotron vision-language datasets for multimodal training, image understanding, and VLM fine-tuning
Deep Thinking - Extended Chain-of-Thought Reasoning
Deep thinking and reasoning models for extended chain-of-thought, deliberative alignment, and complex problem solving
Small Coders - Lightweight Code Generation Models
Lightweight code generation models for edge deployment, IDE integration, and fast code completion
YOLO - Real-Time Object Detection Models
YOLO object detection models for real-time computer vision, autonomous systems, and video analytics
Affordable Coding APIs - Cost-Effective LLM Endpoints
Cost-effective coding API providers and affordable LLM endpoints for development and prototyping
RLM - Neuro-Symbolic Architecture - Reasoning Traces
Inference wrapper combining a root LLM (the Architect), a Python REPL (the Engine), and sub-LLMs (the Workers) spawned by queries
NVIDIA Nemotron Post-Training - RLHF & SFT Data
NVIDIA Nemotron post-training datasets for RLHF, instruction tuning, and alignment fine-tuning
NVIDIA Nemotron Pre-Training - Foundation Model Data
NVIDIA Nemotron pre-training datasets for large language model training and foundation model development
Topological Transformer - DeepSeek
Manifold-Constrained Hyper-Connections
Deep Research - Autonomous AI Literature Review
Deep research AI agents and models for autonomous literature review, scientific reasoning, and knowledge synthesis
PP-StructureV3 - Document Analysis & Table OCR
PaddlePaddle PP-StructureV3 for document analysis, table recognition, and intelligent document processing
Circuit Sparsity - Neural Network Interpretability
Circuit sparsity research for neural network pruning, mechanistic interpretability, and efficient model compression
Text to Motion - Human Animation & Gesture AI
Text-to-motion generation models for human animation, gesture synthesis, and motion capture AI
TTS - Text-to-Speech & Voice Synthesis Models
Text-to-Speech models for voice synthesis, neural TTS, and natural language audio generation
Audio Segmenting - Meta SAM 3 Audio
Audio segmentation models based on Meta SAM architecture for sound separation and audio understanding
NVIDIA Nemotron V3 - Post-Training Datasets
Mamba/Transformer hybrid
Open Source AI - Fully Open Weights & Training Data
Fully open-source AI models with permissive licenses for commercial use and research
Image to 3D - Single-Image 3D Reconstruction
Image-to-3D generation models for single-image 3D reconstruction, mesh generation, and 3D asset creation
Deep Research Agents - Specialized Search & Reasoning
Specialized deep research models for domain-specific scientific reasoning and literature analysis
IBM Granite - Enterprise AI & Code Generation
IBM Granite foundation models for enterprise AI, code generation, and multilingual NLP tasks
Small OCR - Lightweight Text Recognition for Edge
Lightweight OCR models for edge deployment, mobile text recognition, and efficient document processing
Hierarchical RL - Multi-Level Decision Making
Hierarchical reinforcement learning models for multi-level decision making and complex task decomposition
Bread & Butter - Top Production-Ready LLMs 2025
Top LLMs of 2025: Z.ai GLM-4.7 (358B) and Moonshot Kimi-K2-Thinking, selected for reasoning, coding, and multilingual tasks in production.
Haddock Custom Sparse Autoencoders
Custom JumpReLU Sparse Autoencoders for mechanistic interpretability. T5Gemma-2 SAEs across all layers. AI safety & interpretability research.
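A JumpReLU activation zeroes every pre-activation that does not exceed a learned per-feature threshold θ and passes the rest through unchanged: JumpReLU(z) = z · H(z − θ), with H the Heaviside step. The following is a minimal, dependency-free sketch of a JumpReLU SAE forward pass, assuming a plain linear encoder and decoder. The helper names (`matvec`, `jumprelu`, `sae_forward`), the toy weights, and the tiny dimensions are hypothetical illustrations, not the Haddock or Gemma Scope 2 code, and training (straight-through gradients for θ, sparsity penalty) is omitted.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[i][j]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def jumprelu(z: Vector, theta: Vector) -> Vector:
    """Keep z_i unchanged if it is strictly above theta_i, otherwise output 0."""
    return [zi if zi > ti else 0.0 for zi, ti in zip(z, theta)]


def sae_forward(x: Vector, w_enc: Matrix, b_enc: Vector, theta: Vector,
                w_dec: Matrix, b_dec: Vector) -> Tuple[Vector, Vector]:
    """Encode x into sparse features, then reconstruct x linearly from them."""
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    features = jumprelu(pre, theta)
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


# Hypothetical toy SAE: 2-dimensional input, 3 dictionary features.
x = [1.0, 2.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # 3 x 2
b_enc = [0.0, 0.0, 0.0]
theta = [0.5, 2.5, 0.0]                         # per-feature thresholds
w_dec = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]     # 2 x 3
b_dec = [0.0, 0.0]

features, recon = sae_forward(x, w_enc, b_enc, theta, w_dec, b_dec)
print(features, recon)
```

Here the pre-activations are [1.0, 2.0, -1.0]; only the first clears its threshold, so a single feature is active. Unlike a plain ReLU, the second feature stays off even though its pre-activation is positive, which is what makes the learned thresholds control sparsity.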
NVIDIA NeMo Gym
NVIDIA Nemotron RL datasets for AI agent training. Web search, workplace tasks, instruction following, structured outputs. RLHF & alignment research.
nvidia/Nemotron-RL-knowledge-web_search-mcqa
Viewer • Updated • 2.93k • 263 • 7
nvidia/Nemotron-RL-agent-workplace_assistant
Viewer • Updated • 1.8k • 223 • 12
nvidia/Nemotron-RL-instruction_following
Preview • Updated • 168 • 9
nvidia/Nemotron-RL-instruction_following-structured_outputs
Viewer • Updated • 9.95k • 205 • 25
Trained
Custom-trained models by mindchain: reward models & SAEs. Haddock Reward Mini (8B), function-specific SAEs. AI interpretability & alignment research.
Sparse Autoencoders
JumpReLU SAEs for Gemma 3 interpretability. EleutherAI models, DeepSeek-R1, Pythia SAEs. Mechanistic interpretability & AI safety research.
Google FunctionGemma (Gemma 3)
Function calling: Google FunctionGemma-270m-IT & Mobile Actions dataset (9.65k). Efficient tool use in small LMs. AI agent development.
Google TranslateGemma - 55 Language Translation Models
Google TranslateGemma multilingual translation models supporting 55 languages for neural machine translation and cross-lingual NLP
Unitree Z1 Arm - Dual Dexterity Manipulation Data
Unitree Z1 robotic arm datasets for manipulation learning, grasp planning, and arm control training
unitreerobotics/Z1_Dual_Dex1_CleanupPencils_Dataset
Viewer • Updated • 133k • 308 • 2
unitreerobotics/Z1_Dual_Dex1_FoldClothes_Dataset
Viewer • Updated • 293k • 237 • 2
unitreerobotics/Z1_Dual_Dex1_PourCoffee_Dataset
Viewer • Updated • 443k • 531 • 1
unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset
Viewer • Updated • 117k • 320 • 3
Unitree Robotics - G1_Dex3_datasets
Unitree G1 humanoid robot Dex3 advanced dexterity datasets for fine-grained manipulation tasks
unitreerobotics/G1_Dex3_BlockStacking_Dataset
Viewer • Updated • 281k • 873 • 2
unitreerobotics/G1_Dex3_CameraPackaging_Dataset
Viewer • Updated • 256k • 656
unitreerobotics/G1_Dex3_GraspSquare_Dataset
Viewer • Updated • 281k • 876
unitreerobotics/G1_Dex3_ObjectPlacement_Dataset
Viewer • Updated • 98.3k • 681 • 4
Unitree UnifoLM WMA - World Model Agent for Robotics
Unitree UnifoLM World Model Agent for robot learning, action prediction, and embodied AI planning
LeRobot Pi0.5 - Robotics Foundation Model v0.5
Hugging Face LeRobot Pi0.5 intermediate robotics model with improved action generation capabilities
LeRobot SmolVLA - Compact Vision-Language-Action
Hugging Face LeRobot SmolVLA compact vision-language-action model for efficient robot control
Hugging Face - LeRobot - Behavior 1K
Hugging Face LeRobot Behavior-1K large-scale robotics benchmark for diverse manipulation tasks
Atlas RL - Intelligent Architecture Reinforcement Learning
Atlas reinforcement learning: how to put the intelligence into the architecture?
Dual RTX 6000 Build - 96GB VRAM Optimized LLMs
Optimized LLMs for dual NVIDIA RTX 6000 GPU setup - 96GB VRAM configurations for local inference
LeRobot Pi0Fast - Real-Time Robotics Inference
Hugging Face LeRobot Pi0Fast optimized robotics models for real-time inference and fast action generation
Google Embedding Gemma - Text Embeddings for RAG
Google Embedding Gemma models for semantic search, RAG applications, and text embeddings
NVIDIA Thor + Raspberry Pi + OAK 4D Dual Build
NVIDIA Thor SoC with Raspberry Pi and OAK-4D stereo camera for edge robotics and embodied AI
Qwen3 VL Embeddings - Multimodal Vector Search
Qwen3 Vision-Language embedding models for multimodal RAG, semantic search, and vector databases
NVIDIA Nemotron Content Safety - Toxicity Detection
NVIDIA Llama Nemotron content safety models for toxicity detection and safe AI deployment
NVIDIA Clara Biology - Genomics & Protein AI
NVIDIA Clara biology models for genomics, protein structure, and computational biology research
NVIDIA Clara Medical - Clinical AI & Radiology
NVIDIA Clara medical AI for clinical NLP, radiology analysis, and healthcare decision support
NVIDIA Nemotron Embeddings - RAG & Vector Search
NVIDIA Nemotron embedding models for RAG, semantic search, and vector database applications
nvidia/llama-nemotron-embed-vl-1b-v2
Feature Extraction • 2B • Updated • 15k • 17
nvidia/llama-nemotron-embed-1b-v2
Feature Extraction • 1B • Updated • 22k • 32
nvidia/llama-embed-nemotron-8b
Feature Extraction • 8B • Updated • 470k • 121
nvidia/NV-Embed-v2
Feature Extraction • 8B • Updated • 24.2k • 498
DiT - Diffusion Transformer for Video & Audio Gen
Diffusion Transformer models for multimodal video and audio generation, synthesis, and editing
Lightricks/LTX-2
Image-to-Video • Updated • 1.46M • 1.11k
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 44
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 121
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 43
NVIDIA Nemotron Cascade - Multi-Stage LLM Inference
NVIDIA Nemotron Cascade for multi-stage inference, model routing, and efficient LLM deployment
nvidia/Nemotron-Cascade-8B
Text Generation • 8B • Updated • 5.95k • 55
nvidia/Nemotron-Cascade-8B-Thinking
Text Generation • 8B • Updated • 1.09k • 34
nvidia/Nemotron-Cascade-14B-Thinking
Text Generation • 15B • Updated • 1.95k • 65
nvidia/Nemotron-Cascade-8B-Intermediate-ckpts
Text Generation • Updated • 10
Google Gemma 3 LiteRT - Mobile & Edge Optimized
Google Gemma 3 models packaged for LiteRT (formerly TensorFlow Lite) and optimized for mobile and edge deployment
NVIDIA Cosmos Transfer 2.5 - Style & Domain Transfer
NVIDIA Cosmos Transfer 2.5 models for domain adaptation, style transfer, and video generation
Robotics - Foundation Models for Embodied AI
Robotics foundation models, datasets, and research for embodied AI, manipulation, and autonomous systems
NVIDIA NeMo Gym - RL Agent Training Datasets
NVIDIA Nemotron reinforcement learning datasets from NeMo Gym for agent training and RLHF
nvidia/Nemotron-RL-knowledge-web_search-mcqa
Viewer • Updated • 2.93k • 263 • 7
nvidia/Nemotron-RL-agent-workplace_assistant
Viewer • Updated • 1.8k • 223 • 12
nvidia/Nemotron-RL-instruction_following
Preview • Updated • 168 • 9
nvidia/Nemotron-RL-instruction_following-structured_outputs
Viewer • Updated • 9.95k • 205 • 25
NVIDIA Nemotron RAG Datasets - Retrieval Training
NVIDIA Nemotron RAG datasets for retrieval-augmented generation, document QA, and knowledge grounding
Google Gemma 3n - Mobile Multimodal Edition
Google Gemma 3n mobile multimodal models for on-device vision-language tasks and efficient edge deployment
Small Thinking - Compact Reasoning Models for Edge
Compact reasoning models for efficient chain-of-thought inference on resource-constrained devices
Self-Correcting Delta Transformer - Adaptive LLMs
Self-Correcting Delta Transformer: DDL provides the hardware mechanism (the Eraser); NL solves the software problem.
Meta SAM - Segment Anything Models (Image & Audio)
Meta SAM Segment Anything models for zero-shot image segmentation, object detection, and visual understanding
Edge LLMs - Ultra-Compact High-Performance Models
Ultra-compact LLMs for edge deployment: sub-1B parameter models with strong performance for IoT and mobile
NVIDIA Nemotron Personas - Regional Character Data
NVIDIA Nemotron persona datasets for character AI, personality modeling, and conversational agent training
NVIDIA Nemotron Reward - RLHF & Alignment Models
LLM-as-a-judge
nvidia/Llama-3.3-Nemotron-70B-Reward-Principle
Text Generation • 71B • Updated • 97 • 6
nvidia/Qwen3-Nemotron-32B-GenRM-Principle
Text Generation • 33B • Updated • 315 • 11
nvidia/Qwen3-Nemotron-32B-RLBFF
Text Generation • 33B • Updated • 58 • 27
nvidia/Qwen3-Nemotron-8B-BRRM
Text Generation • Updated • 131 • 8
Embeddings - Semantic Search & RAG Vector Models
Text and multimodal embedding models for semantic search, RAG pipelines, and vector similarity applications
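In a RAG pipeline, embedding-based semantic search typically reduces to ranking stored vectors by cosine similarity to a query vector. The sketch below assumes the embeddings were already produced by some embedding model; the three-dimensional toy vectors and the document IDs are made up for illustration. It does an exact, brute-force top-k scan, which a vector database would replace with an approximate index at scale.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]],
          k: int) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical precomputed embeddings (real models emit hundreds of dimensions).
index = {
    "doc_robotics": [0.9, 0.1, 0.0],
    "doc_speech": [0.0, 1.0, 0.0],
    "doc_vision": [0.7, 0.0, 0.7],
}
query = [1.0, 0.0, 0.0]
results = top_k(query, index, k=2)
print(results)
```

Cosine similarity ignores vector length, so documents are compared by direction only; many embedding models normalize their outputs, in which case the plain dot product gives the same ranking.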
Edge Translation - On-Device Multilingual NLP
Edge-optimized translation models for on-device multilingual NLP and low-latency language translation
Qwen Long Reasoning - Extended Context CoT Models
Qwen long-context reasoning models for extended chain-of-thought, complex problem solving, and mathematical reasoning
OCR Models - Optical Character Recognition & Text Extraction
Optical Character Recognition models for text extraction, document digitization, and scene text detection
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text • 1.0B • Updated • 12.4k • 1.49k
baidu/ERNIE-4.5-0.3B-Paddle
Text Generation • 0.4B • Updated • 75 • 19
baidu/ERNIE-4.5-21B-A3B-Paddle
Text Generation • 22B • Updated • 65 • 13
ibm-granite/granite-docling-258M
Image-Text-to-Text • 0.3B • Updated • 209k • 1.09k
IQuest LoopCoder - Iterative Code Generation Models
IQuest LoopCoder models for iterative code generation, self-refinement, and automated debugging
ASR Models - Automatic Speech Recognition & Transcription
Automatic Speech Recognition models for transcription, voice AI, and multilingual speech-to-text
Mobile App AI - On-Device Agents & Function Calling
Mobile app engine models for on-device AI, app development automation, and mobile-first ML
Hybrid Attention - Efficient Transformer Architectures
Hybrid attention models combining local and global attention for efficient long-context processing
Datasets Pretraining - Nemotron V3
Mamba/Transformer hybrid
Byte Level Models - Tokenizer-Free Language Models
Byte-level language models for tokenizer-free NLP, multilingual text, and raw byte processing
Video Analysis - Action Recognition & Understanding
Video analysis models for action recognition, temporal understanding, and video content classification
Diffusion LLMs - Non-Autoregressive Text Generation
Diffusion-based language models for text generation, discrete diffusion, and non-autoregressive NLP
Video Generation - Text-to-Video & AI Synthesis
Video generation models for text-to-video, image-to-video, and AI video synthesis
Graphics AI - Visual Computing & Image Synthesis
Graphics and visual computing models for rendering, image synthesis, and computer graphics AI
Meta VL-JEPA - Vision-Language Prediction Models
Meta VL-JEPA Vision-Language Joint Embedding Predictive Architecture for video understanding
Google Gemma Scope 2 - Neuronpedia
Google Gemma Scope 2: JumpReLU SAEs for Gemma 3 interpretability. 270M PT/IT, 1B PT variants. Neuronpedia integration. Mechanistic analysis.
Google Gemma - Quantized
Quantized Gemma 3 models trained with quantization-aware training (QAT), e.g. Gemma-3-27B-IT in Q4, for low-memory, fast inference in edge and production deployments.
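Quantization-aware training inserts "fake quantization" (quantize, then immediately dequantize) into the forward pass, so the model learns weights that survive rounding to low precision. The sketch below shows symmetric per-tensor 4-bit fake quantization on a list of weights. The single per-tensor scale, the signed [-8, 7] range, and the function names are illustrative assumptions; this is not Google's Gemma 3 QAT recipe, production Q4 formats typically use per-block or per-channel scales, and the training loop itself is omitted.

```python
from typing import List, Tuple

QMIN, QMAX = -8, 7  # signed 4-bit integer range


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: w ~ scale * q with q in [QMIN, QMAX]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    q = [max(QMIN, min(QMAX, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map 4-bit integers back to floats."""
    return [qi * scale for qi in q]


def fake_quantize(weights: List[float]) -> List[float]:
    """Quantize then dequantize: the values a QAT forward pass computes with.
    The backward pass usually treats this step as identity
    (straight-through estimator), which is not shown here."""
    q, scale = quantize_int4(weights)
    return dequantize(q, scale)


weights = [0.7, -0.33, 0.12, 0.0]
q, scale = quantize_int4(weights)
print(q, scale)
print(fake_quantize(weights))
```

The printed fake-quantized weights are close to [0.7, -0.3, 0.1, 0.0]: -0.33 and 0.12 snap to the nearest multiple of the scale (about 0.1). During QAT the loss sees this rounding error, so training can compensate for it before the weights are exported in 4-bit form.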
Reward Models
NVIDIA Nemotron reward models: 340B, 8B BRRM, 70B/32B principle-based. RLHF training, preference learning, AI alignment research.
Google T5 Gemma 2
T5Gemma-2 encoder-decoder models: 270M, 1B, 4B sizes. Text-to-text, summarization, translation. Google's architecture for structured generation.
google/t5gemma-2-270m-270m
Image-Text-to-Text • 0.8B • Updated • 20.5k • 169
google/t5gemma-2-4b-4b
Image-Text-to-Text • 9B • Updated • 11.8k • 136
google/t5gemma-2-1b-1b
Image-Text-to-Text • 2B • Updated • 14.4k • 67
T5Gemma 2: Seeing, Reading, and Understanding Longer
Paper • 2512.14856 • Published • 1
NVIDIA Nemotron - Mamba/Transformer Hybrid
Hybrid Mamba-Transformer architectures: NVIDIA Nemotron-3-Nano-30B-A3B (32B) in BF16, FP8, and GGUF, for efficient long-context and in-context learning.
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
Text Generation • 32B • Updated • 311k • 574
bartowski/nvidia_Nemotron-3-Nano-30B-A3B-GGUF
Text Generation • 32B • Updated • 7.98k • 9
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16
Text Generation • 32B • Updated • 28.2k • 91
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8
Text Generation • 32B • Updated • 773k • 247
NVIDIA GR00T N1 - Humanoid Robotics Foundation Models
NVIDIA GR00T N1 humanoid robotics foundation models for embodied AI, manipulation, and robot learning
Google TranslateGemma - 55 Language Translation Models
Google TranslateGemma multilingual translation models supporting 55 languages for neural machine translation and cross-lingual NLP
NVIDIA Nemotron PII - Privacy & Data Protection Dataset
NVIDIA Nemotron PII dataset for personally identifiable information detection and privacy-aware NLP
Unitree Z1 Arm - Dual Dexterity Manipulation Data
Unitree Z1 robotic arm datasets for manipulation learning, grasp planning, and arm control training
-
unitreerobotics/Z1_Dual_Dex1_CleanupPencils_Dataset
Viewer • Updated • 133k • 308 • 2 -
unitreerobotics/Z1_Dual_Dex1_FoldClothes_Dataset
Viewer • Updated • 293k • 237 • 2 -
unitreerobotics/Z1_Dual_Dex1_PourCoffee_Dataset
Viewer • Updated • 443k • 531 • 1 -
unitreerobotics/Z1_Dual_Dex1_StackBox_Dataset
Viewer • Updated • 117k • 320 • 3
Unitree G1 Dex1 - Humanoid Robot Dexterity Datasets
Unitree G1 humanoid robot Dex1 dexterity datasets with mounted camera for manipulation learning
Unitree Robotics - G1_Dex3_datasets
Unitree G1 humanoid robot Dex3 advanced dexterity datasets for fine-grained manipulation tasks
-
unitreerobotics/G1_Dex3_BlockStacking_Dataset
Viewer • Updated • 281k • 873 • 2 -
unitreerobotics/G1_Dex3_CameraPackaging_Dataset
Viewer • Updated • 256k • 656 -
unitreerobotics/G1_Dex3_GraspSquare_Dataset
Viewer • Updated • 281k • 876 -
unitreerobotics/G1_Dex3_ObjectPlacement_Dataset
Viewer • Updated • 98.3k • 681 • 4
Unitree G1 BrainCo - Grasping & Manipulation Data
Unitree G1 humanoid robot BrainCo datasets for grasping, object manipulation, and dexterous hand training
-
unitreerobotics/G1_Brainco_GraspOreo_Dataset
Viewer • Updated • 235k • 533 -
unitreerobotics/G1_Brainco_GraspRubiksCube_Dataset
Viewer • Updated • 221k • 541 • 1 -
unitreerobotics/G1_Brainco_PickApple_Dataset
Viewer • Updated • 154k • 140 -
unitreerobotics/G1_Brainco_PickCharger_Dataset
Viewer • Updated • 217k • 527
Unitree UnifoLM WMA - World Model Agent for Robotics
Unitree UnifoLM World Model Agent for robot learning, action prediction, and embodied AI planning
Hugging Face - LeRobot - Pi0 (Old Version)
Hugging Face LeRobot Pi0 legacy version - archived robotics model for reference and compatibility
LeRobot Pi0.5 - Robotics Foundation Model v0.5
Hugging Face LeRobot Pi0.5 intermediate robotics model with improved action generation capabilities
Hugging Face - LeRobot - Open X-Embodiment
Open X-Embodiment robotics datasets: cross-platform robot learning for DROID, Kuka, TACO, JACO
LeRobot SmolVLA - Compact Vision-Language-Action
Hugging Face LeRobot SmolVLA compact vision-language-action model for efficient robot control
LeRobot Pi0 - HuggingFace Robotics Foundation Model
Hugging Face LeRobot Pi0 foundation model for robotics: manipulation, navigation, and embodied AI
Hugging Face - LeRobot - Behavior 1K
Hugging Face LeRobot Behavior-1K large-scale robotics benchmark for diverse manipulation tasks
LeRobot XVLA - Cross-Embodiment Vision-Language-Action
Hugging Face LeRobot XVLA cross-embodiment vision-language-action models for universal robot control
Atlas RL - Intelligent Architecture Reinforcement Learning
Atlas Reinforcement Learning - How to put the Intelligence into the Architecture?!
Hyper Graph Reasoning - Knowledge Graphs for AI Agents
Higher-order knowledge representations and hypergraph reasoning for agentic AI and scientific discovery
Dual RTX 6000 Build - 96GB VRAM Optimized LLMs
Optimized LLMs for dual NVIDIA RTX 6000 GPU setup - 96GB VRAM configurations for local inference
NVIDIA Nemotron Orchestrator - Multi-Model Routing
NVIDIA Nemotron Orchestrator 8B for multi-model coordination, task routing, and agentic workflows
LeRobot Pi0Fast - Real-Time Robotics Inference
Hugging Face LeRobot Pi0Fast optimized robotics models for real-time inference and fast action generation
Meta RoBERTa - Pretrained NLP & Text Classification
Meta RoBERTa pretrained language models for NLP tasks: classification, NER, sentiment analysis
-
FacebookAI/roberta-base
Fill-Mask • 0.1B • Updated • 9.73M • • 551 -
FacebookAI/xlm-roberta-large-finetuned-conll03-german
Token Classification • Updated • 4.6k • • 14 -
FacebookAI/xlm-roberta-large-finetuned-conll02-spanish
Fill-Mask • 0.6B • Updated • 78 • 2 -
FacebookAI/xlm-roberta-large
Fill-Mask • 0.6B • Updated • 3.48M • • 487
Google Embedding Gemma - Text Embeddings for RAG
Google Embedding Gemma models for semantic search, RAG applications, and text embeddings
NVIDIA Physical AI - Autonomous Vehicles & Robotics
NVIDIA Physical AI models for robotics, embodied intelligence, and real-world interaction
Nvidia Thor + Rasberry + Oak 4D Dual Build
NVIDIA Thor SoC with Raspberry Pi and OAK-4D stereo camera for edge robotics and embodied AI
Qwen3 VL Reranker - Multimodal RAG Ranking Models
Qwen3 Vision-Language reranker models for RAG pipelines and multimodal document retrieval
Qwen3 VL Embeddings - Multimodal Vector Search
Qwen3 Vision-Language embedding models for multimodal RAG, semantic search, and vector databases
Facebook/Meta - Research Plan Dataset
Meta Research Plan datasets for AI research planning, scientific reasoning, and agent workflows
NVIDIA Nemotron Content Safety - Toxicity Detection
NVIDIA Llama Nemotron content safety models for toxicity detection and safe AI deployment
NVIDIA Clara Medical - Healthcare & Clinical NLP
NVIDIA Clara medical AI models for healthcare, clinical NLP, and medical imaging analysis
NVIDIA Clara Biology - Genomics & Protein AI
NVIDIA Clara biology models for genomics, protein structure, and computational biology research
NVIDIA Clara Molecular - Drug Discovery & Chemistry
NVIDIA Clara molecular models for drug discovery, molecular property prediction, and computational chemistry
NVIDIA Clara Medical - Clinical AI & Radiology
NVIDIA Clara medical AI for clinical NLP, radiology analysis, and healthcare decision support
Nvidia Nemotron RAG - Reranking
NVIDIA Nemotron reranking models for RAG pipelines, search result optimization, and document ranking
NVIDIA Nemotron Embeddings - RAG & Vector Search
NVIDIA Nemotron embedding models for RAG, semantic search, and vector database applications
-
nvidia/llama-nemotron-embed-vl-1b-v2
Feature Extraction • 2B • Updated • 15k • 17 -
nvidia/llama-nemotron-embed-1b-v2
Feature Extraction • 1B • Updated • 22k • 32 -
nvidia/llama-embed-nemotron-8b
Feature Extraction • 8B • Updated • 470k • 121 -
nvidia/NV-Embed-v2
Feature Extraction • 8B • Updated • 24.2k • 498
NVIDIA Alpamayo-R1 - Reasoning & Physical AI Models
NVIDIA Alpamayo-R1 reasoning model for complex problem solving, mathematical reasoning, and chain-of-thought
-
nvidia/Alpamayo-R1-10B
Robotics • 11B • Updated • 25.7k • 305 -
nvidia/PhysicalAI-Autonomous-Vehicles
Updated • 156k • 695 -
nvidia/PhysicalAI-Autonomous-Vehicles-NuRec
Updated • 11.8k • 110 -
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
Paper • 2511.00088 • Published • 3
DiT - Diffusion Transformer for Video & Audio Gen
Diffusion Transformer models for multimodal video and audio generation, synthesis, and editing
-
Lightricks/LTX-2
Image-to-Video • Updated • 1.46M • • 1.11k -
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 44 -
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 121 -
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 43
NVIDIA Nemotron Speech - ASR & Text-to-Speech
NVIDIA Nemotron speech models for ASR, text-to-speech, and voice AI applications
-
nvidia/nemotron-speech-streaming-en-0.6b
Automatic Speech Recognition • Updated • 5.78k • 392 -
nvidia/parakeet-tdt-0.6b-v3
Automatic Speech Recognition • Updated • 72.1k • 554 -
nvidia/parakeet_realtime_eou_120m-v1
Updated • 575 • 106 -
nvidia/multitalker-parakeet-streaming-0.6b-v1
Automatic Speech Recognition • Updated • 615 • 63
NVIDIA Nemotron Cascade - Multi-Stage LLM Inference
NVIDIA Nemotron Cascade for multi-stage inference, model routing, and efficient LLM deployment
-
nvidia/Nemotron-Cascade-8B
Text Generation • 8B • Updated • 5.95k • 55 -
nvidia/Nemotron-Cascade-8B-Thinking
Text Generation • 8B • Updated • 1.09k • 34 -
nvidia/Nemotron-Cascade-14B-Thinking
Text Generation • 15B • Updated • 1.95k • 65 -
nvidia/Nemotron-Cascade-8B-Intermediate-ckpts
Text Generation • Updated • 10
OpenAI GPT-OSS - Steering Vectors & SAE Research
Open-source GPT models with steering vectors for controllable generation and behavior modification
Google Gemma 3 LiteRT - Mobile & Edge Optimized
Google Gemma 3 LiteRT models optimized for TensorFlow Lite runtime and mobile edge deployment
NVIDIA Cosmos Reason 2 - World Model Reasoning
NVIDIA Cosmos 2 Reason models for world model reasoning, physics simulation, and causal understanding
NVIDIA Cosmos Transfer 2.5 - Style & Domain Transfer
NVIDIA Cosmos 2.5 Transfer models for domain adaptation, style transfer, and video generation
NVIDIA Cosmos 2 - Cosmos-Predict 2.5
NVIDIA Cosmos 2.5 Predict models for world simulation, future frame prediction, and physical AI
Robotics - Foundation Models for Embodied AI
Robotics foundation models, datasets, and research for embodied AI, manipulation, and autonomous systems
Edge & Smartphone - On-Device Mobile AI Models
On-device AI models optimized for smartphone deployment: mobile LLMs, edge inference, and efficient architectures
NVIDIA NeMo Gym - RL Agent Training Datasets
NVIDIA Nemotron reinforcement learning datasets from NeMo Gym for agent training and RLHF
-
nvidia/Nemotron-RL-knowledge-web_search-mcqa
Viewer • Updated • 2.93k • 263 • 7 -
nvidia/Nemotron-RL-agent-workplace_assistant
Viewer • Updated • 1.8k • 223 • 12 -
nvidia/Nemotron-RL-instruction_following
Preview • Updated • 168 • 9 -
nvidia/Nemotron-RL-instruction_following-structured_outputs
Viewer • Updated • 9.95k • 205 • 25
NVIDIA Nemotron Safety - AI Alignment Datasets
NVIDIA Nemotron safety datasets for AI alignment, content moderation, and responsible AI training
-
nvidia/Nemotron-AIQ-Agentic-Safety-Dataset-1.0
Viewer • Updated • 10.8k • 428 • 10 -
nvidia/Nemotron-Content-Safety-Reasoning-Dataset
Preview • Updated • 77 • 5 -
nvidia/Aegis-AI-Content-Safety-Dataset-2.0
Viewer • Updated • 33.4k • 3k • 72 -
nvidia/Nemotron-Content-Safety-Audio-Dataset
Viewer • Updated • 1.93k • 1.34k • 3
NVIDIA Nemotron RAG Datasets - Retrieval Training
NVIDIA Nemotron RAG datasets for retrieval-augmented generation, document QA, and knowledge grounding
NVIDIA Nemotron VLM - Vision-Language Training Data
NVIDIA Nemotron vision-language datasets for multimodal training, image understanding, and VLM finetuning
Google Gemma 3N - Mobile multimordal Edition
Google Gemma 3N mobile multimodal models for on-device vision-language tasks and efficient edge deployment
Deep Thinking - Extended Chain-of-Thought Reasoning
Deep thinking and reasoning models for extended chain-of-thought, deliberative alignment, and complex problem solving
Small Thinking - Compact Reasoning Models for Edge
Compact reasoning models for efficient chain-of-thought inference on resource-constrained devices
Small Coders - Lightweight Code Generation Models
Lightweight code generation models for edge deployment, IDE integration, and fast code completion
Self-Correcting Delta Transformer - Adaptive LLMs
Self-Correcting Delta Transformer: DDL provides the hardware mechanism ("The Erazor"), while NL solves the software problem.
YOLO - Real-Time Object Detection Models
YOLO object detection models for real-time computer vision, autonomous systems, and video analytics
Meta SAM - Segment Anything Models (Image & Audio)
Meta SAM Segment Anything models for zero-shot image segmentation, object detection, and visual understanding
Affordable Coding APIs - Cost-Effective LLM Endpoints
Cost-effective coding API providers and affordable LLM endpoints for development and prototyping
Edge LLMs - Ultra-Compact High-Performance Models
Ultra-compact LLMs for edge deployment: sub-1B parameter models with strong performance for IoT and mobile
RLM - Neuro-Symbolic Architecture - Reasoning Traces
Inference wrapper combining a root LLM (The Architect), a Python REPL (The Engine), and sub-LLMs (The Workers) spawned by queries.
NVIDIA Nemotron Personas - Regional Character Data
NVIDIA Nemotron persona datasets for character AI, personality modeling, and conversational agent training
NVIDIA Nemotron Post-Training - RLHF & SFT Data
NVIDIA Nemotron post-training datasets for RLHF, instruction tuning, and alignment fine-tuning
NVIDIA Nemotron Reward - RLHF & Alignment Models
LLM-as-a-judge reward models
nvidia/Llama-3.3-Nemotron-70B-Reward-Principle
Text Generation • 71B • Updated • 97 • 6
nvidia/Qwen3-Nemotron-32B-GenRM-Principle
Text Generation • 33B • Updated • 315 • 11
nvidia/Qwen3-Nemotron-32B-RLBFF
Text Generation • 33B • Updated • 58 • 27
nvidia/Qwen3-Nemotron-8B-BRRM
Text Generation • Updated • 131 • 8
NVIDIA Nemotron Pre-Training - Foundation Model Data
NVIDIA Nemotron pre-training datasets for large language model training and foundation model development
Embeddings - Semantic Search & RAG Vector Models
Text and multimodal embedding models for semantic search, RAG pipelines, and vector similarity applications
Topological Transformer - DeepSeek
Manifold-Constrained Hyper-Connections
Edge Translation - On-Device Multilingual NLP
Edge-optimized translation models for on-device multilingual NLP and low-latency language translation
Deep Research - Autonomous AI Literature Review
Deep research AI agents and models for autonomous literature review, scientific reasoning, and knowledge synthesis
Qwen Long Reasoning - Extended Context CoT Models
Qwen long-context reasoning models for extended chain-of-thought, complex problem solving, and mathematical reasoning
PP-StructureV3 - Document Analysis & Table OCR
PaddlePaddle PP-StructureV3 for document analysis, table recognition, and intelligent document processing
OCR Models - Optical Character Recognition & Text Extraction
Optical Character Recognition models for text extraction, document digitization, and scene text detection
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text • 1.0B • Updated • 12.4k • 1.49k
baidu/ERNIE-4.5-0.3B-Paddle
Text Generation • 0.4B • Updated • 75 • 19
baidu/ERNIE-4.5-21B-A3B-Paddle
Text Generation • 22B • Updated • 65 • 13
ibm-granite/granite-docling-258M
Image-Text-to-Text • 0.3B • Updated • 209k • 1.09k
Circuit Sparsity - Neural Network Interpretability
Circuit sparsity research for neural network pruning, mechanistic interpretability, and efficient model compression
IQuest LoopCoder - Iterative Code Generation Models
IQuest LoopCoder models for iterative code generation, self-refinement, and automated debugging
Text to Motion - Human Animation & Gesture AI
Text-to-motion generation models for human animation, gesture synthesis, and motion capture AI
ASR Models - Automatic Speech Recognition & Transcription
Automatic Speech Recognition models for transcription, voice AI, and multilingual speech-to-text
TTS - Text-to-Speech & Voice Synthesis Models
Text-to-Speech models for voice synthesis, neural TTS, and natural language audio generation
Mobile App AI - On-Device Agents & Function Calling
Mobile app engine models for on-device AI, app development automation, and mobile-first ML
Audio Segmenting - Meta SAM 3 Audio
Audio segmentation models based on Meta SAM architecture for sound separation and audio understanding
Hybrid Attention - Efficient Transformer Architectures
Hybrid attention models combining local and global attention for efficient long-context processing
NVIDIA Nemotron V3 - Post-Training Datasets
Hybrid Mamba/Transformer architecture
NVIDIA Nemotron V3 - Pre-Training Datasets
Hybrid Mamba/Transformer architecture
Open Source AI - Fully Open Weights & Training Data
Fully open-source AI models with permissive licenses for commercial use and research
Byte Level Models - Tokenizer-Free Language Models
Byte-level language models for tokenizer-free NLP, multilingual text, and raw byte processing
Image to 3D - Single-Image 3D Reconstruction
Image-to-3D generation models for single-image 3D reconstruction, mesh generation, and 3D asset creation
Video Analysis - Action Recognition & Understanding
Video analysis models for action recognition, temporal understanding, and video content classification
Deep Research Agents - Specialized Search & Reasoning
Specialized deep research models for domain-specific scientific reasoning and literature analysis
Diffusion LLMs - Non-Autoregressive Text Generation
Diffusion-based language models for text generation, discrete diffusion, and non-autoregressive NLP
IBM Granite - Enterprise AI & Code Generation
IBM Granite foundation models for enterprise AI, code generation, and multilingual NLP tasks
Video Generation - Text-to-Video & AI Synthesis
Video generation models for text-to-video, image-to-video, and AI video synthesis
Small OCR - Lightweight Text Recognition for Edge
Lightweight OCR models for edge deployment, mobile text recognition, and efficient document processing
Graphics AI - Visual Computing & Image Synthesis
Graphics and visual computing models for rendering, image synthesis, and computer graphics AI
Hierarchical RL - Multi-Level Decision Making
Hierarchical reinforcement learning models for multi-level decision making and complex task decomposition
Meta VL-JEPA - Vision-Language Prediction Models
Meta VL-JEPA Vision-Language Joint Embedding Predictive Architecture for video understanding
Bread & Butter - Top Production-Ready LLMs 2025
Top LLMs of 2025: Z.ai GLM-4.7 (358B) and Moonshot Kimi-K2-Thinking, covering reasoning, code, and multilingual tasks. Production-ready.
Google Gemma Scope 2 - Neuronpedia
Google Gemma Scope 2: JumpReLU SAEs for Gemma 3 interpretability. 270M PT/IT, 1B PT variants. Neuronpedia integration. Mechanistic analysis.
Haddock Custom Sparse Autoencoders
Custom JumpReLU Sparse Autoencoders for mechanistic interpretability. T5Gemma-2 SAEs across all layers. AI safety & interpretability research.
Google Gemma - Quantized
Quantized Gemma 3 models: QAT for efficient deployment. Gemma-3-27B-IT Q4. Low memory, fast inference. Edge & production-ready LLMs.
NVIDIA NeMo Gym
NVIDIA Nemotron RL datasets for AI agent training. Web search, workplace tasks, instruction following, structured outputs. RLHF & alignment research.
nvidia/Nemotron-RL-knowledge-web_search-mcqa
Viewer • Updated • 2.93k • 263 • 7
nvidia/Nemotron-RL-agent-workplace_assistant
Viewer • Updated • 1.8k • 223 • 12
nvidia/Nemotron-RL-instruction_following
Preview • Updated • 168 • 9
nvidia/Nemotron-RL-instruction_following-structured_outputs
Viewer • Updated • 9.95k • 205 • 25
Reward Models
NVIDIA Nemotron reward models: 340B, 8B BRRM, 70B/32B principle-based. RLHF training, preference learning, AI alignment research.
Trained
Custom-trained models by mindchain: reward models & SAEs. Haddock Reward Mini (8B), function-specific SAEs. AI interpretability & alignment research.
Google T5 Gemma 2
T5Gemma-2 encoder-decoder models: 270M, 1B, 4B sizes. Text-to-text, summarization, translation. Google's architecture for structured generation.
google/t5gemma-2-270m-270m
Image-Text-to-Text • 0.8B • Updated • 20.5k • 169
google/t5gemma-2-4b-4b
Image-Text-to-Text • 9B • Updated • 11.8k • 136
google/t5gemma-2-1b-1b
Image-Text-to-Text • 2B • Updated • 14.4k • 67
T5Gemma 2: Seeing, Reading, and Understanding Longer
Paper • 2512.14856 • Published • 1
Sparse Autoencoders
JumpReLU SAEs for Gemma 3 interpretability. EleutherAI models, DeepSeek-R1, Pythia SAEs. Mechanistic interpretability & AI safety research.
NVIDIA Nemotron - Hybrid Mamba/Transformer
Hybrid Mamba + Transformer architectures. NVIDIA Nemotron-3-Nano-30B-A3B (32B). BF16 & GGUF. Efficient long-context & in-context learning.
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
Text Generation • 32B • Updated • 311k • 574
bartowski/nvidia_Nemotron-3-Nano-30B-A3B-GGUF
Text Generation • 32B • Updated • 7.98k • 9
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16
Text Generation • 32B • Updated • 28.2k • 91
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8
Text Generation • 32B • Updated • 773k • 247
Google FunctionGemma (Gemma 3)
Function calling: Google FunctionGemma-270m-IT & Mobile Actions dataset (9.65k). Efficient tool use in small LMs. AI agent development.