Bash script which tags images with AI
Today I would like to show you how to use a bash script and AI to generate suitable tags for image files and write them directly to the files.
Requirements
Ollama
In order to better adapt the script to my own needs, I decided to install Ollama.
Ollama is a generic wrapper around many LLMs that can be run locally, such as Meta’s Llama 3 or LLaVA.
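If you want to follow along, the LLaVA model used later in this post can be pulled once Ollama is installed and its server is running (the exact model tag is up to you):
# download the multimodal LLaVA model used later in the script
ollama pull llava
# optional: a quick interactive test on the command line
ollama run llava "Say hello in one short sentence."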
jq
jq is a tool that can query and extract data from JSON strings. We need it to handle the response from the Ollama API.
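As a tiny illustration (not part of the final script), this is how jq extracts a single property from a JSON string, which is exactly what we will do with the API response later:
# extract the `response` property from an example JSON string
echo '{"model":"llava","response":"[\"garage\",\"tools\"]"}' | jq -r '.response'
# prints: ["garage","tools"]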
ExifTool
To update keywords in media files, we will use ExifTool, which runs on the command line.
Script
First step: Dry run mode
Dry run mode is a pattern that lets the user decide whether data should actually be written or changed.
This helps to verify that the operations would behave as expected before touching any files.
For this we check for a `-n` or `--dry-run` flag:
#!/bin/bash
# dry run mode
dry_run=0
if [[ "$1" == "-n" || "$1" == "--dry-run" ]]; then
    dry_run=1
    echo -e "\tℹ️ Dry run mode enabled. No changes will be made."
    shift # remove the dry run argument
fi
if [ -z "$1" ]; then
    # invalid number of arguments
    echo "Usage: $0 [-n|--dry-run] <glob-pattern> [<glob-pattern>...]"
    exit 1
fi
# ...
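Assuming the script is saved as `tag-files.sh` (the name is just an example), a dry run could look like this:
# preview only, nothing is written to the files
# note: quote the pattern so the script expands the glob itself
./tag-files.sh -n "photos/*.jpg"
# the same with the long flag
./tag-files.sh --dry-run "photos/*.jpg"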
Custom settings
To make the execution customizable, for example by defining another model or prompt, the script reads the following environment variables:
# ...
# if you run the API on another address, you
# can customize it with the `TGF_API_URL` environment variable
url="${TGF_API_URL:-http://localhost:11434/api/generate}"
# you can define a custom prompt
# with the `TGF_PROMPT` environment variable
prompt="${TGF_PROMPT:-What is in this picture? Answer with 10 keywords as a JSON array with format [\"keyword1\",\"keyword2\",\"keyword3\",\"keyword4\",\"keyword5\",\"keyword6\",\"keyword7\",\"keyword8\",\"keyword9\",\"keyword10\"].}"
# if you prefer another model, you
# can customize it with the `TGF_LLM_MODEL` environment variable
model="${TGF_LLM_MODEL:-llava}"
# if you prefer another temperature value, you
# can customize it with the `TGF_LLM_TEMPERATURE` environment variable
temperature="${TGF_LLM_TEMPERATURE:-0}"
# ...
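For example, to run the script once with a different model, a slightly higher temperature and a shorter prompt (again assuming it is saved as `tag-files.sh` and that the `llava:13b` model has been pulled):
TGF_LLM_MODEL="llava:13b" \
TGF_LLM_TEMPERATURE="0.2" \
TGF_PROMPT="What is in this picture? Answer with 5 keywords as a JSON array." \
./tag-files.sh "photos/*.jpg"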
Special bash settings
By default, a glob pattern that matches no files is passed to the loop unchanged, so the script would try to process a literal file name like `*.jpg`.
To avoid this, we have to set `nullglob`
with `shopt`, which makes unmatched patterns expand to an empty list:
# ...
# temporary activation of special options
shopt -s nullglob
# ...
# now deactivate temporary options
shopt -u nullglob
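You can see the difference directly in an interactive shell (assuming there are no `*.xyz` files in the current directory):
shopt -u nullglob
for f in *.xyz; do echo "got: $f"; done   # prints "got: *.xyz" - the literal pattern
shopt -s nullglob
for f in *.xyz; do echo "got: $f"; done   # prints nothing - the loop body never runs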
Prepare and send the request
By default, Ollama provides an endpoint to generate a completion at `http://localhost:11434/api/generate`.
Before we send the request with wget, which should be installed on nearly all Linux/UNIX systems, we prepare the POST data with `jq`. The image data will be sent as a Base64 string:
# ...
for pattern in "$@"; do # iterate over each pattern from the command line
    for file in $pattern; do # iterate over each existing file represented by the pattern
        full_file_path=$(realpath "$file")
        echo "Processing $full_file_path ..."
        # read the current file and save it as Base64
        # (strip the line breaks that GNU base64 inserts by default)
        base64string=$(base64 -i "$full_file_path" | tr -d '\n')
        # create the JSON payload for the Ollama API
        json_data=$(jq -n \
            --arg model "$model" \
            --arg prompt "$prompt" \
            --arg img "$base64string" \
            --argjson temperature "$temperature" \
            '{model: $model, prompt: $prompt, options: {temperature: $temperature}, stream: false, images: [$img]}')
        # call the Ollama API
        echo -e "\tℹ️ Sending POST request..."
        response=$(wget --header "Content-Type: application/json" \
            --post-data "$json_data" \
            "$url" \
            -q \
            -O - 2>&1) # capture the response body (and any error output)
        # ...
    done
done
# ...
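For reference, the JSON payload that `jq -n` builds for the `/api/generate` endpoint looks roughly like this (prompt and Base64 data shortened here):
{
  "model": "llava",
  "prompt": "What is in this picture? Answer with 10 keywords as a JSON array ...",
  "options": { "temperature": 0 },
  "stream": false,
  "images": ["iVBORw0KGgoAAAANSUhEUg..."]
}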
Extract keywords
If the request is successful, the returned JSON will contain a `response` property with the text from the LLM.
Unfortunately, models like LLaVA deliver markdown like this one:
```json
["mechanic", "garage", "workshop", "toolbox", "vehicle", "engine", "tools", "overalls", "smiling", "professional"]
```
Therefore, we have to extract the array from the text before we can pass it to `jq`:
# ...
for pattern in "$@"; do
    for file in $pattern; do
        # ...
        # if the execution was successful, there should
        # be a `response` property in the output
        # (`// empty` avoids getting the literal string "null" if it is missing)
        extracted_response=$(echo "$response" | jq -r '.response // empty')
        if [ -n "$extracted_response" ]; then
            # we have a `response` property, but it can happen,
            # especially with LLaVA, that the answer is wrapped in a
            # markdown code fence, so extract the plain JSON array first
            array_with_keywords=$(echo "$extracted_response" | sed -n '/```json/,/```/p' | sed -e '1d' -e '$d' | sed -e 's/^[[:space:]]*//g' -e 's/[[:space:]]*$//g' -e 's/[[:space:]]*```+.*$//')
            # check if we have a valid JSON array in `$array_with_keywords`
            if ! echo "$array_with_keywords" | jq -e . >/dev/null 2>&1; then
                echo -e "\t⚠️ Could not find JSON array in response!"
                continue
            fi
            # convert the JSON string to a sorted list of keywords
            sorted_keywords=$(echo "$array_with_keywords" | jq -r '.[]' | sort | uniq | paste -sd ',' -)
            # do not continue if we have no keywords from `jq`
            if [ -z "$sorted_keywords" ]; then
                echo -e "\t⚠️ No keywords found for '$full_file_path'!"
                continue
            fi
            # ...
        else
            echo -e "\t⚠️ No valid JSON found!"
        fi
        # ...
    done
done
# ...
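To see what this extraction does, here is the same pipeline applied to a hard-coded example answer outside of the script:
# a typical LLaVA answer, wrapped in a markdown code fence
extracted_response='```json
["workshop", "garage", "tools", "garage"]
```'
array_with_keywords=$(echo "$extracted_response" | sed -n '/```json/,/```/p' | sed -e '1d' -e '$d')
echo "$array_with_keywords" | jq -r '.[]' | sort | uniq | paste -sd ',' -
# prints: garage,tools,workshop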
Update keywords
Last but not least, we can use `exiftool` to update the list of keywords, which is currently stored in `sorted_keywords`:
# ...
for pattern in "$@"; do
    for file in $pattern; do
        # ...
        if [ -n "$extracted_response" ]; then
            # ...
            # the next steps will write to the file, so
            # check first if dry run mode is enabled
            if [[ $dry_run -eq 1 ]]; then
                echo -e "\tℹ️ Found following keywords for '$full_file_path': $sorted_keywords"
                continue
            fi
            # first, try to remove the old keywords
            if ! exiftool -keywords= "$full_file_path" -overwrite_original >/dev/null 2>&1; then
                echo -e "\t⚠️ Failed to remove old keywords from '$full_file_path'."
                continue
            fi
            # now write the final keywords
            # (`-sep ","` makes ExifTool split the comma separated string into individual keywords)
            if exiftool -sep "," -keywords="$sorted_keywords" "$full_file_path" -overwrite_original >/dev/null 2>&1; then
                echo -e "\t✅ Wrote following keywords to '$full_file_path': $sorted_keywords"
            else
                echo -e "\t⚠️ Failed to update keywords for '$full_file_path'."
            fi
        else
            # ...
        fi
    done
done
# ...
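After a real run (without `-n`), you can check the result with ExifTool itself; the file name below is only an example:
exiftool -Keywords "photos/example.jpg"
# Keywords  : engine, garage, mechanic, overalls, ...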
Conclusion
Ollama is a really nice alternative to the ChatGPT API: it lets you run LLMs locally behind a generic API and manage models with a Docker-like workflow.
This kind of script could later run on a NAS that allows executing custom scripts.
The final script can be found on GitHub as a Gist.
Have fun while trying it out! 🎉