Image, Audio, Video & Document Input
Some LLMs are now capable of understanding audio, video, image and document content.
Image Input
Info
Some models do not support image input. Please check the model's documentation to confirm whether it supports image input.
If you have a direct URL for the image, you can use ImageUrl:
from pydantic_ai import Agent, ImageUrl
agent = Agent(model='gateway/openai:gpt-5')
result = agent.run_sync(
[
'What company is this logo from?',
ImageUrl(url='https://iili.io/3Hs4FMg.png'),
]
)
print(result.output)
#> This is the logo for Pydantic, a data validation and settings management library in Python.
from pydantic_ai import Agent, ImageUrl
agent = Agent(model='openai:gpt-5')
result = agent.run_sync(
[
'What company is this logo from?',
ImageUrl(url='https://iili.io/3Hs4FMg.png'),
]
)
print(result.output)
#> This is the logo for Pydantic, a data validation and settings management library in Python.
If you have the image locally, you can also use BinaryContent:
import httpx
from pydantic_ai import Agent, BinaryContent
image_response = httpx.get('https://iili.io/3Hs4FMg.png') # Pydantic logo
agent = Agent(model='gateway/openai:gpt-5')
result = agent.run_sync(
[
'What company is this logo from?',
BinaryContent(data=image_response.content, media_type='image/png'), # (1)!
]
)
print(result.output)
#> This is the logo for Pydantic, a data validation and settings management library in Python.
- To ensure the example is runnable we download this image from the web, but you can also use
Path().read_bytes()to read a local file's contents.
import httpx
from pydantic_ai import Agent, BinaryContent
image_response = httpx.get('https://iili.io/3Hs4FMg.png') # Pydantic logo
agent = Agent(model='openai:gpt-5')
result = agent.run_sync(
[
'What company is this logo from?',
BinaryContent(data=image_response.content, media_type='image/png'), # (1)!
]
)
print(result.output)
#> This is the logo for Pydantic, a data validation and settings management library in Python.
- To ensure the example is runnable we download this image from the web, but you can also use
Path().read_bytes()to read a local file's contents.
Audio Input
Info
Some models do not support audio input. Please check the model's documentation to confirm whether it supports audio input.
You can provide audio input using either AudioUrl or BinaryContent. The process is analogous to the examples above.
Video Input
Info
Some models do not support video input. Please check the model's documentation to confirm whether it supports video input.
You can provide video input using either VideoUrl or BinaryContent. The process is analogous to the examples above.
Document Input
Info
Some models do not support document input. Please check the model's documentation to confirm whether it supports document input.
You can provide document input using either DocumentUrl or BinaryContent. The process is similar to the examples above.
If you have a direct URL for the document, you can use DocumentUrl:
from pydantic_ai import Agent, DocumentUrl
agent = Agent(model='gateway/anthropic:claude-sonnet-4-5')
result = agent.run_sync(
[
'What is the main content of this document?',
DocumentUrl(url='https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf'),
]
)
print(result.output)
#> This document is the technical report introducing Gemini 1.5, Google's latest large language model...
from pydantic_ai import Agent, DocumentUrl
agent = Agent(model='anthropic:claude-sonnet-4-5')
result = agent.run_sync(
[
'What is the main content of this document?',
DocumentUrl(url='https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf'),
]
)
print(result.output)
#> This document is the technical report introducing Gemini 1.5, Google's latest large language model...
The supported document formats vary by model.
You can also use BinaryContent to pass document data directly:
from pathlib import Path
from pydantic_ai import Agent, BinaryContent
pdf_path = Path('document.pdf')
agent = Agent(model='gateway/anthropic:claude-sonnet-4-5')
result = agent.run_sync(
[
'What is the main content of this document?',
BinaryContent(data=pdf_path.read_bytes(), media_type='application/pdf'),
]
)
print(result.output)
#> The document discusses...
from pathlib import Path
from pydantic_ai import Agent, BinaryContent
pdf_path = Path('document.pdf')
agent = Agent(model='anthropic:claude-sonnet-4-5')
result = agent.run_sync(
[
'What is the main content of this document?',
BinaryContent(data=pdf_path.read_bytes(), media_type='application/pdf'),
]
)
print(result.output)
#> The document discusses...
User-side download vs. direct file URL
When using one of ImageUrl, AudioUrl, VideoUrl or DocumentUrl, Pydantic AI will default to sending the URL to the model, so the file is downloaded on their side.
Support for file URLs varies depending on type and provider. Pydantic AI handles this as follows:
| Model | Supported URL types | Sends URL directly |
|---|---|---|
OpenAIChatModel |
ImageUrl, AudioUrl, DocumentUrl |
ImageUrl only |
OpenAIResponsesModel |
ImageUrl, AudioUrl, DocumentUrl |
Yes |
AnthropicModel |
ImageUrl, DocumentUrl |
Yes, except DocumentUrl (text/plain) |
GoogleModel (Vertex) |
All URL types | Yes |
GoogleModel (GLA) |
All URL types | YouTube and Files API URLs only |
MistralModel |
ImageUrl, DocumentUrl (PDF) |
Yes |
BedrockConverseModel |
ImageUrl, DocumentUrl, VideoUrl |
No, defaults to force_download |
A model API may be unable to download a file (e.g., because of crawling or access restrictions) even if it supports file URLs. For example, GoogleModel on Vertex AI limits YouTube video URLs to one URL per request. In such cases, you can instruct Pydantic AI to download the file content locally and send that instead of the URL by setting force_download on the URL object:
from pydantic_ai import ImageUrl, AudioUrl, VideoUrl, DocumentUrl
ImageUrl(url='https://example.com/image.png', force_download=True)
AudioUrl(url='https://example.com/audio.mp3', force_download=True)
VideoUrl(url='https://example.com/video.mp4', force_download=True)
DocumentUrl(url='https://example.com/doc.pdf', force_download=True)
Uploaded Files
Some model providers like Google's Gemini API support uploading files. You can upload a file using the provider's client and passing the resulting URL as input:
from pydantic_ai import Agent, DocumentUrl
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider
provider = GoogleProvider()
file = provider.client.files.upload(file='pydantic-ai-logo.png')
assert file.uri is not None
agent = Agent(GoogleModel('gemini-2.5-flash', provider=provider))
result = agent.run_sync(
[
'What company is this logo from?',
DocumentUrl(url=file.uri, media_type=file.mime_type),
]
)
print(result.output)