Skala Accurate and scalable exchange-correlation with deep learning microsoft/skala-1.1 Updated about 23 hours ago • 299 • 6 Accurate and scalable exchange-correlation with deep learning Paper • 2506.14665 • Published 2 days ago • 2 Accurate Chemistry Collection: Coupled cluster atomization energies for broad chemical space Paper • 2506.14492 • Published Jun 17, 2025 • 3 microsoft/skala-baselines Updated 9 days ago • 68 • 4
VibeVoice Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ microsoft/VibeVoice-1.5B Text-to-Speech • 3B • Updated Jan 22 • 147k • 2.34k microsoft/VibeVoice-Realtime-0.5B Text-to-Speech • 1B • Updated Dec 12, 2025 • 1.2M • 1.2k VibeVoice Technical Report Paper • 2508.19205 • Published Aug 26, 2025 • 165 microsoft/VibeVoice-ASR Automatic Speech Recognition • 9B • Updated Jan 27 • 734k • 1.05k
Dayhoff Atlas The models and datasets that comprise the Dayhoff Atlas microsoft/Dayhoff Viewer • Updated 21 days ago • 1.77B • 2.71k • 11 microsoft/Dayhoff-170m-UR50 Text Generation • 0.2B • Updated Jan 16 • 95 • 5 microsoft/Dayhoff-170m-UR90 Text Generation • 0.2B • Updated Jan 26 • 972 • 1 microsoft/Dayhoff-170m-GR Text Generation • 0.2B • Updated Jan 26 • 2.21k • 2
Paza Paza is a collection of speech models & benchmarks for low resource languages by the Microsoft Research Africa - Nairobi Lab PazaBench 🥇 ASR Leaderboard for low resource languages • 15 microsoft/paza-Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated Feb 4 • 123 • 3 microsoft/paza-whisper-large-v3-turbo Automatic Speech Recognition • 0.8B • Updated Feb 4 • 356 • 6
Phi-4 Phi-4 family of small language, multi-modal and reasoning models. microsoft/Phi-4-mini-flash-reasoning Text Generation • Updated Dec 10, 2025 • 866 • 275 microsoft/Phi-4-mini-reasoning Text Generation • Updated Dec 10, 2025 • 34.6k • 225 microsoft/Phi-4-reasoning Text Generation • Updated Nov 24, 2025 • 14.1k • 222 microsoft/Phi-4-reasoning-plus Text Generation • Updated Nov 24, 2025 • 21.9k • 338
Phi-1 Phi-1 family of small language models. microsoft/phi-1 Text Generation • 1B • Updated Nov 24, 2025 • 7.38k • 220 microsoft/phi-1_5 Text Generation • 1B • Updated Nov 24, 2025 • 82.6k • 1.36k Textbooks Are All You Need Paper • 2306.11644 • Published Jun 20, 2023 • 155 Textbooks Are All You Need II: phi-1.5 technical report Paper • 2309.05463 • Published Sep 11, 2023 • 91
BitNet 🔥BitNet family of large language models (1-bit LLMs). microsoft/bitnet-b1.58-2B-4T Text Generation • 0.8B • Updated Dec 17, 2025 • 15.5k • 1.44k microsoft/bitnet-b1.58-2B-4T-bf16 Text Generation • 2B • Updated Dec 17, 2025 • 6.37k • 41 microsoft/bitnet-b1.58-2B-4T-gguf Text Generation • 2B • Updated Dec 17, 2025 • 69.3k • 267 BitNet b1.58 2B4T Technical Report Paper • 2504.12285 • Published Apr 16, 2025 • 85
LLM2CLIP LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever. microsoft/LLM2CLIP-EVA02-L-14-336 Zero-Shot Image Classification • Updated Nov 22, 2024 • 83 • 61 microsoft/LLM2CLIP-Openai-L-14-336 Zero-Shot Classification • 0.6B • Updated Nov 24, 2024 • 3.85k • 44 microsoft/LLM2CLIP-EVA02-B-16 Updated Feb 8, 2025 • 42 • 11 microsoft/LLM2CLIP-Openai-B-16 Zero-Shot Classification • 0.4B • Updated Nov 24, 2024 • 113 • 19
TAPEX TAPEX is a family of state-of-the-art table pre-training models that can be used for table-based question answering and table-based fact verification. TAPEX: Table Pre-training via Learning a Neural SQL Executor Paper • 2107.07653 • Published Jul 16, 2021 • 3 microsoft/tapex-large-finetuned-wtq Table Question Answering • 0.4B • Updated Jan 12, 2024 • 779 • 78 microsoft/tapex-base-finetuned-wikisql Table Question Answering • Updated Jan 24, 2023 • 899k • 24 microsoft/tapex-large-sql-execution Table Question Answering • 0.4B • Updated Sep 15, 2023 • 66 • 18
LayoutLM The LayoutLM series are Transformer encoders useful for document AI tasks such as invoice parsing, document image classification and DocVQA. microsoft/layoutlmv3-base 0.1B • Updated Apr 10, 2024 • 564k • 481 microsoft/layoutlmv2-base-uncased Updated Sep 16, 2022 • 602k • 67 microsoft/layoutlm-base-uncased 0.1B • Updated Apr 16, 2024 • 175k • 62 microsoft/layoutxlm-base Updated Sep 16, 2022 • 7.16k • 74
Orca The Orca family of LMs developed by Microsoft. microsoft/Orca-2-7b Text Generation • Updated Nov 22, 2023 • 2.22k • 224 microsoft/Orca-2-13b Text Generation • Updated Nov 22, 2023 • 3.13k • 667
GIT GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering. GIT: A Generative Image-to-text Transformer for Vision and Language Paper • 2205.14100 • Published May 27, 2022 • 2 microsoft/git-base Image-to-Text • 0.2B • Updated Apr 24, 2023 • 19.5k • 110 microsoft/git-large Image-to-Text • Updated Feb 8, 2023 • 567 • 18 microsoft/git-base-vqav2 Visual Question Answering • 0.2B • Updated Mar 9, 2024 • 254 • 21
IFMs Industrial Foundation Models microsoft/LLaMA-2-7b-GTL-Delta Text Generation • 7B • Updated Aug 12, 2024 • 62 • 10 microsoft/LLaMA-2-13b-GTL-Delta Text Generation • 13B • Updated Aug 12, 2024 • 43 • 6
ChatBench ChatBench Datasets and Simulators (same prompt + fine-tuning set-up) from the ChatBench paper. microsoft/ChatBench Preview • Updated Apr 28, 2025 • 255 • 13 microsoft/chatbench-distilgpt2 Text Generation • 81.9M • Updated Aug 23, 2025 • 41 • 4 microsoft/chatbench-llama3-8b Updated Aug 23, 2025 • 17 • 6 microsoft/chatbench-mistral-7b Updated Aug 23, 2025 • 20 • 5
MediPhi A collection of SLMs based on Phi3.5-mini-instruct adapted to clinical natural language processing tasks: https://arxiv.org/abs/2505.10717 A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment Paper • 2505.10717 • Published May 15, 2025 • 5 microsoft/MediPhi-Instruct Text Generation • 4B • Updated Dec 15, 2025 • 2.96k • 66 microsoft/MediPhi Text Generation • 4B • Updated Dec 15, 2025 • 673 • 21 microsoft/MediPhi-PubMed Text Generation • 4B • Updated Dec 15, 2025 • 208 • 11
NatureLM microsoft/NatureLM-8x7B 47B • Updated Jun 20, 2025 • 34 • 20 microsoft/NatureLM-8x7B-Inst 47B • Updated Jun 20, 2025 • 79 • 25
NextCoder NextCoder family of code-editing LMs developed with Selective Knowledge Transfer and its training data. microsoft/NextCoder-7B Text Generation • 8B • Updated Jun 12, 2025 • 473 • 32 microsoft/NextCoder-14B Text Generation • 15B • Updated Jun 12, 2025 • 355 • 18 microsoft/NextCoder-32B Text Generation • 33B • Updated Jun 12, 2025 • 141 • 67 microsoft/NextCoderDataset Viewer • Updated Jul 8, 2025 • 381k • 542 • 55
Phi-3 Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. microsoft/Phi-3.5-mini-instruct Text Generation • 4B • Updated Dec 10, 2025 • 735k • 973 microsoft/Phi-3.5-MoE-instruct Text Generation • Updated Dec 10, 2025 • 105k • 574 microsoft/Phi-3.5-vision-instruct Image-Text-to-Text • Updated Dec 10, 2025 • 1.52M • 731 microsoft/Phi-3-mini-4k-instruct Text Generation • 4B • Updated Dec 10, 2025 • 736k • 1.42k
Controllable Safety Alignment Artifacts for the paper "Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements" (https://arxiv.org/abs/2410.08968) Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements Paper • 2410.08968 • Published Oct 11, 2024 • 14 microsoft/CoSApien Viewer • Updated Aug 1, 2025 • 200 • 169 • 3 microsoft/CoSAlign-Test Viewer • Updated May 5, 2025 • 3.2k • 170 • 3 microsoft/CoSAlign-Train Viewer • Updated Aug 1, 2025 • 125k • 84 • 4
MAI-DS-R1 MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team. microsoft/MAI-DS-R1 Text Generation • Updated Dec 15, 2025 • 116 • 293 microsoft/MAI-DS-R1-FP8 Text Generation • 671B • Updated Dec 15, 2025 • 179 • 26
SpeechT5 The SpeechT5 framework consists of a shared seq2seq and six modal-specific (speech/text) pre/post-nets that can address a few audio-related tasks. SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing Paper • 2110.07205 • Published Oct 14, 2021 • 6 microsoft/speecht5_tts Text-to-Speech • Updated Nov 8, 2023 • 223k • 826 SpeechT5 Speech Synthesis Demo 👩 • 220 microsoft/speecht5_vc Audio-to-Audio • Updated Mar 22, 2023 • 2.09k • 111
Table Transformer The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images. microsoft/table-transformer-detection Object Detection • 28.8M • Updated Sep 6, 2023 • 3.21M • 415 microsoft/table-transformer-structure-recognition Object Detection • 28.8M • Updated Sep 6, 2023 • 1.27M • 213 microsoft/table-transformer-structure-recognition-v1.1-all Object Detection • 28.8M • Updated Nov 18, 2023 • 938k • 82 microsoft/table-transformer-structure-recognition-v1.1-fin Object Detection • 28.8M • Updated Nov 27, 2023 • 372 • 2
Biomedical Models for biomedical research applications, such as radiology report generation and biomedical language understanding. microsoft/maira-2 Text Generation • 7B • Updated Aug 14, 2025 • 2.62k • 72 microsoft/rad-dino-maira-2 Image Feature Extraction • 86.6M • Updated Aug 22, 2024 • 23.3k • 24 microsoft/rad-dino Image Feature Extraction • 86.6M • Updated Oct 9, 2025 • 28.9k • 72 microsoft/radedit Updated Dec 8, 2025 • 30
UDOP UDOP is a general multimodal model for document AI Unifying Vision, Text, and Layout for Universal Document Processing Paper • 2212.02623 • Published Dec 5, 2022 • 12 microsoft/udop-large Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 80.7k • 124 microsoft/udop-large-512 Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 65 • 6 microsoft/udop-large-512-300k Image-Text-to-Text • 0.7B • Updated Dec 2, 2025 • 232 • 34
Florence Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 96 microsoft/Florence-2-large Image-Text-to-Text • 0.8B • Updated Aug 4, 2025 • 1.09M • 1.8k microsoft/Florence-2-base Image-Text-to-Text • 0.2B • Updated Aug 4, 2025 • 551k • 364 microsoft/Florence-2-large-ft Image-Text-to-Text • 0.8B • Updated Aug 4, 2025 • 34.1k • 384
MoCapAct Locomotion policies for hundreds of simulated humanoid locomotion clips and demonstration data for training them. microsoft/mocapact-models Updated Aug 17, 2024 • 10 microsoft/mocapact-data Updated Aug 17, 2024 • 115 • 5 MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control Paper • 2208.07363 • Published Aug 15, 2022 • 2