
GPT-4 Vision for Home Assistant


Image Analyzer for Home Assistant using GPT Vision

🌟 Features · 📖 Resources · ⬇️ Installation · ▶️ Usage · 🧠 Model Overview · 🪲 How to report Bugs




gpt4vision is a Home Assistant integration that allows you to analyze images and camera feeds using GPT-4 Vision.
Supported providers are OpenAI, Anthropic, Google Gemini, LocalAI and Ollama.

Features

  • Compatible with OpenAI, Anthropic Claude, Google Gemini, LocalAI and Ollama
  • Takes image and camera entities as well as local image files as input
  • Images can be downscaled for faster processing
  • Can be installed and updated through HACS and can be set up in the Home Assistant UI

Resources

Check the 📖 wiki for examples on how you can integrate gpt4vision into your Home Assistant setup or join the 🗨️ discussion in the Home Assistant Community.

Installation

Installation via HACS (recommended)

Open your Home Assistant instance, open this repository inside the Home Assistant Community Store (HACS) and install the integration. Then:

  1. Search for GPT-4 Vision in Home Assistant Settings/Devices & services
  2. Select your provider
  3. Follow the instructions to complete setup

Manual Installation

  1. Download and copy the gpt4vision folder into your custom_components folder (see the sketch after these steps).
  2. Add integration in Home Assistant Settings/Devices & services
  3. Provide your API key or IP address and port of your self-hosted server
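
A minimal sketch of step 1, assuming your Home Assistant configuration lives in /config and that the repository keeps the integration under custom_components/gpt4vision:

git clone https://github.com/valentinfrlch/ha-gpt4vision.git
# /config is assumed to be your Home Assistant configuration directory
cp -r ha-gpt4vision/custom_components/gpt4vision /config/custom_components/

Restart Home Assistant afterwards so the new integration is picked up.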

Provider specific setup

OpenAI

Simply obtain an API key from OpenAI and enter it in the Home Assistant UI during setup.
A pricing calculator is available here: https://openai.com/api/pricing/.

Anthropic

Obtain an API key from Anthropic and enter it in the Home Assistant UI during setup. Pricing is available here: Anthropic image cost. Images can be downscaled with the built-in downscaler.

Google

To use Google Gemini you need to have a Google account and obtain an API key from the AI Studio. Depending on your region, you may need to enable billing. Pricing is available here: Gemini Pricing

LocalAI

To use LocalAI you need to have a LocalAI server running. You can find the installation instructions here. During setup you'll need to provide the IP address of your machine and the port on which LocalAI is running (default is 8000).
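
Before adding the integration, you can check that the server is reachable from your network. LocalAI exposes an OpenAI-compatible API, so listing the installed models is a quick sanity check (the IP address and port below are placeholders for your own server):

# placeholder host and port; use the values you will enter during setup
curl http://192.168.1.100:8000/v1/models

A JSON list of models indicates that the server is ready to accept requests.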

Ollama


To use Ollama you need to have an Ollama server running. You can download it from here. Once installed you need to run the following command to download the llava model:

ollama run llava

If your Home Assistant is not running on the same computer as Ollama, you need to set the OLLAMA_HOST environment variable.

On Linux:

  1. Edit the systemd service by calling systemctl edit ollama.service. This will open an editor.
  2. For each environment variable, add an Environment line under the [Service] section:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
  3. Save and close the editor.
  4. Reload systemd and restart Ollama:
systemctl daemon-reload
systemctl restart ollama

On Windows:

  1. Quit Ollama from the system tray
  2. Open File Explorer
  3. Right click on This PC and select Properties
  4. Click on Advanced system settings
  5. Select Environment Variables
  6. Under User variables click New
  7. For variable name enter OLLAMA_HOST and for value enter 0.0.0.0
  8. Click OK and start Ollama again from the Start Menu

On macOS:

  1. Open Terminal
  2. Run the following command:
launchctl setenv OLLAMA_HOST "0.0.0.0"
  3. Restart Ollama
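
Whichever platform you are on, you can verify that Ollama is reachable from the machine running Home Assistant by querying its API (Ollama listens on port 11434 by default; the host below is a placeholder):

# placeholder host; use the IP of the machine running Ollama
curl http://192.168.1.101:11434/api/tags

If the response lists the llava model, the server is ready for the integration.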

Usage

After restarting, the gpt4vision.image_analyzer service will be available. You can test it in the Developer Tools section of Home Assistant. To get OpenAI gpt-4o's analysis of local images and camera feeds, use a service call like the following.

service: gpt4vision.image_analyzer
data:
  provider: OpenAI
  message: Describe what you see?
  max_tokens: 100
  model: gpt-4o
  image_file: |-
    /config/www/tmp/example.jpg
    /config/www/tmp/example2.jpg
  image_entity:
    - camera.garage
    - image.front_door_person
  target_width: 1280
  detail: low
  temperature: 0.5
  include_filename: true
| Parameter | Optional | Description | Default | Valid Values |
|---|---|---|---|---|
| provider | No | The AI provider to call. | OpenAI | OpenAI, Anthropic, Google, Ollama, LocalAI |
| model | Yes | Model used for processing the image(s). | | See table below |
| message | No | The prompt to send along with the image(s). | | String |
| image_file | Yes* | The path to the image file(s). Each path must be on a new line. | | Valid path to an image file |
| image_entity | Yes* | An alternative to image_file for providing image input. | | Any image or camera entity |
| include_filename | Yes | Whether to include the filename in the request. | false | true, false |
| target_width | Yes | Width to downscale the image to before encoding. | 1280 | Integer between 512 and 3840 |
| detail | Yes | Level of detail to use for image understanding. | auto | auto, low, high |
| max_tokens | No | The maximum number of response tokens to generate. | 100 | Integer between 10 and 1000 |
| temperature | No | Randomness of the output. | 0.5 | Float between 0.0 and 1.0 |

*At least one of image_file or image_entity must be provided.
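
To act on the analysis in an automation, the result can be fed into a notification. The sketch below is illustrative only: it assumes the service exposes its output through Home Assistant's response_variable mechanism under a response_text key (check the 📖 wiki for the exact response schema), and the trigger, camera and notify entities are placeholders.

alias: Describe garage camera on motion
trigger:
  - platform: state
    entity_id: binary_sensor.garage_motion  # placeholder entity
    to: "on"
action:
  - service: gpt4vision.image_analyzer
    data:
      provider: OpenAI
      model: gpt-4o
      message: Briefly describe what is happening in the garage.
      image_entity:
        - camera.garage
      max_tokens: 100
      detail: low
      temperature: 0.5
    response_variable: analysis  # assumes the service returns response data
  - service: notify.mobile_app_phone  # placeholder notify target
    data:
      title: Garage camera
      message: "{{ analysis.response_text }}"  # assumed response key, see wiki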

Additional information

Note

If you set include_filename to false (the default), the images are numbered sequentially starting from 1 and you can refer to them by their number in the prompt. Requests will look roughly like the following:

Image 1:
<base64 encoded image>
Image 2:
<base64 encoded image>
...
<Your prompt>
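
Because of this numbering, a prompt can address specific images directly. For example (the entities are placeholders):

service: gpt4vision.image_analyzer
data:
  provider: OpenAI
  model: gpt-4o
  message: Is the person in Image 1 also visible in Image 2?
  image_entity:
    - camera.front_door  # becomes Image 1
    - camera.driveway  # becomes Image 2
  include_filename: false
  max_tokens: 100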

Note

If you set include_filename to true, requests will look roughly like the following:

  • If the input is an image entity, the filename will be the entity's friendly_name attribute.
  • If the input is an image file, the filename will be the file's name without the extension.
  • Your prompt will be appended to the end of the request.
Front Door:
<base64 encoded image>
front_door_2024-12-31_23:59:59:
<base64 encoded image>
...
<Your prompt>

Model Overview

| Model Name | Hosting Options | Description | MMMU1 Score |
|---|---|---|---|
| GPT-4o | Cloud (OpenAI API key required) | Best all-round model | 69.1 |
| Claude 3.5 Sonnet | Cloud (Anthropic API key required) | Balance between performance and speed | 68.3 |
| Claude 3 Haiku | Cloud (Anthropic API key required) | Fast model optimized for speed | 50.2 |
| Claude 3 Sonnet | Cloud (Anthropic API key required) | Balance between performance and speed | 53.1 |
| Claude 3 Opus | Cloud (Anthropic API key required) | High-performance model for more accuracy | 59.4 |
| Gemini 1.5 Flash | Cloud (Google API key required) | Fast model optimized for speed | 56.1 |
| Gemini 1.5 Pro | Cloud (Google API key required) | High-performance model for more accuracy | 62.2 |
| LLaVA-1.6 | Self-hosted (LocalAI or Ollama) | Open-source alternative | 43.8 |

Data is based on the MMMU Leaderboard2

Choosing the right model for you

Note

Claude 3.5 Sonnet achieves performance comparable to GPT-4o on the Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark (MMMU1) while being 40% less expensive. This makes it the go-to model for most use cases.

gpt4vision is compatible with multiple providers, each of which has different models available. Some providers run in the cloud, while others are self-hosted.
To see which model is best for your use case, check the figure below. It visualizes the averaged MMMU1 scores of available cloud-based models. The higher the score, the better the model performs.

MMMU Benchmark visualization

The benchmark figures will be updated regularly to include new models.

1 MMMU stands for "Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark". It assesses multimodal capabilities including image understanding.
2 The data is based on the MMMU Leaderboard

Debugging

To enable debugging, add the following to your configuration.yaml:

logger:
  logs:
    custom_components.gpt4vision: debug
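
Alternatively, debug logging can be enabled at runtime with Home Assistant's logger.set_level service (Developer Tools / Services), without editing configuration.yaml:

service: logger.set_level
data:
  custom_components.gpt4vision: debug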

How to report a bug or request a feature

Important

Bugs: If you encounter any bugs and have followed the instructions carefully, feel free to file a bug report.
Feature Requests: If you have an idea for a feature, create a feature request.