• MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-Task Learning (arXiv:2310.09478)
• Can GPT Models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on Mock CFA Exams (arXiv:2310.08678)
• Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
• LLaMA: Open and Efficient Foundation Language Models (arXiv:2302.13971)
• FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (arXiv:2205.14135)
• Baichuan 2: Open Large-scale Language Models (arXiv:2309.10305)
• Qwen Technical Report (arXiv:2309.16609)
• Code Llama: Open Foundation Models for Code (arXiv:2308.12950)
• Tuna: Instruction Tuning Using Feedback from Large Language Models (arXiv:2310.13385)
• Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca (arXiv:2309.08958)
• Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning (arXiv:2310.11716)
• AlpaGasus: Training A Better Alpaca with Fewer Data (arXiv:2307.08701)
• arXiv:2309.03450
• LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing (arXiv:2311.00571)
• LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents (arXiv:2311.05437)
• u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model (arXiv:2311.05348)
• To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning (arXiv:2311.07574)
• Trusted Source Alignment in Large Language Models (arXiv:2311.06697)
• Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases (arXiv:2312.15011)
• SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
• GPT-4V(ision) is a Generalist Web Agent, if Grounded (arXiv:2401.01614)
• Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action (arXiv:2312.17172)
• Mixtral of Experts (arXiv:2401.04088)
• MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts (arXiv:2401.04081)
• SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents (arXiv:2401.10935)
• Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception (arXiv:2401.16158)
• MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (arXiv:2401.15947)
• How to Train Data-Efficient LLMs (arXiv:2402.09668)
• Video ReCap: Recursive Captioning of Hour-Long Videos (arXiv:2402.13250)