Introducing 2.5 Flash-Lite, a thinking model built for low cost and low latency.
2.5 Flash-Lite excels at high-volume, latency-sensitive tasks like translation and classification.
- **Thinking, enabled.** Experience improved reasoning and output quality with thinking mode and thinking budgets (a minimal sketch follows this list).
- **Superior latency.** Benefit from faster response times.
- **Tool use.** Utilize key Gemini 2.5 features, including tools like Search as a tool and code execution (see the hands-on sketch below).
- **Cost-efficient.** 2.5 Flash-Lite is our most cost-efficient 2.5 model yet.
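Thinking budgets are set per request. Below is a minimal sketch of capping thinking for a latency-sensitive classification call, assuming the google-genai Python SDK; the model name string, prompt, and budget value are illustrative placeholders, not prescriptive settings.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# Assumed model identifier; check the current model list in AI Studio or the Gemini API docs.
MODEL = "gemini-2.5-flash-lite"

response = client.models.generate_content(
    model=MODEL,
    contents=(
        "Classify the sentiment of this review as positive, negative, or neutral: "
        "'Battery life is great, but the screen scratches easily.'"
    ),
    config=types.GenerateContentConfig(
        # thinking_budget caps the number of thinking tokens:
        # 0 disables thinking for the lowest latency; a larger value
        # trades some latency for better reasoning and output quality.
        thinking_config=types.ThinkingConfig(thinking_budget=512),
    ),
)
print(response.text)
```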
Hands-on with 2.5 Flash-Lite
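As a quick hands-on illustration of the tool-use features listed above, here is a hedged sketch of enabling Search as a tool and code execution, again assuming the google-genai Python SDK; the prompts and model name are placeholders.

```python
from google import genai
from google.genai import types

client = genai.Client()
MODEL = "gemini-2.5-flash-lite"  # assumed identifier

# Search as a tool: the model can ground its answer in Google Search results.
search_response = client.models.generate_content(
    model=MODEL,
    contents="Who won the most recent FIFA Women's World Cup?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(search_response.text)

# Code execution: the model can write and run Python to answer the prompt.
code_response = client.models.generate_content(
    model=MODEL,
    contents="Compute the sum of the first 50 prime numbers.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)
print(code_response.text)
```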
Benchmarks
2.5 Flash-Lite delivers significantly higher all-round performance than 2.0 Flash-Lite across coding, math, science, reasoning, and multimodal benchmarks.
| Benchmark | Gemini 2.0 Flash | Gemini 2.5 Flash-Lite (Non-thinking) | Gemini 2.5 Flash-Lite (Thinking) |
|---|---|---|---|
| Reasoning & knowledge: Humanity's Last Exam (no tools) | 5.1%* | 5.1% | 6.9% |
| Science: GPQA diamond | 65.2% | 64.6% | 66.7% |
| Mathematics: AIME 2025 | 29.7% | 49.8% | 63.1% |
| Code generation: LiveCodeBench (UI: 1/1/2025-5/1/2025) | 29.1% | 33.7% | 34.3% |
| Code editing: Aider Polyglot | 21.3% | 26.7% | 27.1% |
| Agentic coding: SWE-bench Verified, single attempt | 21.4% | 31.6% | 27.6% |
| Agentic coding: SWE-bench Verified, multiple attempts | 34.2% | 42.6% | 44.9% |
| Factuality: SimpleQA | 29.9% | 10.7% | 13.0% |
| Factuality: FACTS grounding | 84.6% | 84.1% | 86.8% |
| Visual reasoning: MMMU | 69.3% | 72.9% | 72.9% |
| Image understanding: Vibe-Eval (Reka) | 55.4% | 51.3% | 57.5% |
| Long context: MRCR v2 (8-needle), 128k (average) | 19.0% | 16.6% | 30.6% |
| Long context: MRCR v2 (8-needle), 1M (pointwise) | 5.3% | 4.1% | 5.4% |
| Multilingual performance: Global MMLU (Lite) | 83.4% | 81.1% | 84.5% |
Model information
| | 2.5 Flash-Lite |
|---|---|
| Model deployment status | General availability |
| Supported data types for input | Text, Image, Video, Audio, PDF |
| Supported data types for output | Text |
| Supported # tokens for input | 1M |
| Supported # tokens for output | 64k |
| Knowledge cutoff | January 2025 |
| Tool use | Search as a tool, code execution |
| Best for | High-volume, low-cost, low-latency tasks |
| Availability | Google AI Studio, Gemini API, Vertex AI |
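Since the table above lists image, video, audio, and PDF among the supported input types, here is a short sketch of a multimodal request, once more assuming the google-genai Python SDK; the file path, prompt, and model name are placeholders.

```python
from google import genai
from google.genai import types

client = genai.Client()

# Placeholder file; any of the listed input types (image, video, audio, PDF)
# can be passed as bytes with the matching MIME type.
with open("invoice_page.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # assumed identifier
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Extract the invoice number and total amount as JSON.",
    ],
    config=types.GenerateContentConfig(max_output_tokens=256),
)
print(response.text)
```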