Bash script which tags images with AI
Today I would like to show you how to use a bash script and AI to generate suitable tags for image files and write them directly to the files.
Requirements
Ollama
In order to better adapt the script to my own needs, I decided to install Ollama.
Ollama is a generic wrapper around many LLMs that can be run locally, such as Meta’s Llama 3 or LLaVA.
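If you want to follow along, the LLaVA model used later in this post can be pulled once Ollama is installed and its server is running (the exact model tag is up to you):
# download the multimodal LLaVA model used later in the script
ollama pull llava
# optional: a quick interactive test on the command line
ollama run llava "Say hello in one short sentence."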
jq
jq is a tool that can query and extract data from JSON strings. We need it to handle the response from the Ollama API.
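As a tiny illustration (not part of the final script), this is how jq extracts a single property from a JSON string, which is exactly what we will do with the API response later:
# extract the `response` property from an example JSON string
echo '{"model":"llava","response":"[\"garage\",\"tools\"]"}' | jq -r '.response'
# prints: ["garage","tools"]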
ExifTool
To update keywords in media files, we will use ExifTool, which runs on the command line.
Script
First step: Dry run mode
Dry run mode is a pattern that lets the user decide whether data should actually be written or changed.
This helps to verify that the operations would behave as expected before touching any files.
For this we check for a `-n` or `--dry-run` flag:
#!/bin/bash
# dry run mode
dry_run=0
if [[ "$1" == "-n" || "$1" == "--dry-run" ]]; then
    dry_run=1
    echo -e "\tℹ️ Dry run mode enabled. No changes will be made."
    shift # remove the dry run argument
fi
if [ -z "$1" ]; then
    # invalid number of arguments
    echo "Usage: $0 [-n|--dry-run] <glob-pattern> [<glob-pattern>...]"
    exit 1
fi
# ...
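Assuming the script is saved as `tag-files.sh` (the name is just an example), a dry run could look like this:
# preview only, nothing is written to the files
# note: quote the pattern so the script expands the glob itself
./tag-files.sh -n "photos/*.jpg"
# the same with the long flag
./tag-files.sh --dry-run "photos/*.jpg"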
Custom settings
To make the execution customizable, for example by defining another model or prompt, the script reads the following environment variables:
# ...
# if you run the API on another address, you
# can customize it with the `TGF_API_URL` environment variable
url="${TGF_API_URL:-http://localhost:11434/api/generate}"
# you can define a custom prompt
# with the `TGF_PROMPT` environment variable
prompt="${TGF_PROMPT:-What is in this picture? Answer with 10 keywords as a JSON array with format [\"keyword1\",\"keyword2\",\"keyword3\",\"keyword4\",\"keyword5\",\"keyword6\",\"keyword7\",\"keyword8\",\"keyword9\",\"keyword10\"].}"
# if you prefer another model, you
# can customize it with the `TGF_LLM_MODEL` environment variable
model="${TGF_LLM_MODEL:-llava}"
# if you prefer another temperature value, you
# can customize it with the `TGF_LLM_TEMPERATURE` environment variable
temperature="${TGF_LLM_TEMPERATURE:-0}"
# ...
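For example, to run the script once with a different model, a slightly higher temperature and a shorter prompt (again assuming it is saved as `tag-files.sh` and that the `llava:13b` model has been pulled):
TGF_LLM_MODEL="llava:13b" \
TGF_LLM_TEMPERATURE="0.2" \
TGF_PROMPT="What is in this picture? Answer with 5 keywords as a JSON array." \
./tag-files.sh "photos/*.jpg"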
Special bash settings
By default, a glob pattern that matches no files is passed to the loop unchanged, so the script would try to process a literal file name like `*.jpg`.
To avoid this, we have to set `nullglob`
with `shopt`, which makes unmatched patterns expand to an empty list:
# ...
# temporary activation of special options
shopt -s nullglob
# ...
# now deactivate temporary options
shopt -u nullglob
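You can see the difference directly in an interactive shell (assuming there are no `*.xyz` files in the current directory):
shopt -u nullglob
for f in *.xyz; do echo "got: $f"; done   # prints "got: *.xyz" - the literal pattern
shopt -s nullglob
for f in *.xyz; do echo "got: $f"; done   # prints nothing - the loop body never runs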
Prepare and send the request
By default, Ollama provides an endpoint to generate a completion at `http://localhost:11434/api/generate`.
Before we send the request with wget, which should be installed on nearly all Linux/UNIX systems, we prepare the POST data with `jq`. The image data will be sent as a Base64 string:
# ...
for pattern in "$@"; do # iterate over each pattern from the command line
    for file in $pattern; do # iterate over each existing file represented by the pattern
        full_file_path=$(realpath "$file")
        echo "Processing $full_file_path ..."
        # read the current file and save it as Base64
        # (strip the line breaks that GNU base64 inserts by default)
        base64string=$(base64 -i "$full_file_path" | tr -d '\n')
        # create the JSON payload for the Ollama API
        json_data=$(jq -n \
            --arg model "$model" \
            --arg prompt "$prompt" \
            --arg img "$base64string" \
            --argjson temperature "$temperature" \
            '{model: $model, prompt: $prompt, options: {temperature: $temperature}, stream: false, images: [$img]}')
        # call the Ollama API
        echo -e "\tℹ️ Sending POST request..."
        response=$(wget --header "Content-Type: application/json" \
            --post-data "$json_data" \
            "$url" \
            -q \
            -O - 2>&1) # capture the response body (and any error output)
        # ...
    done
done
# ...
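For reference, the JSON payload that `jq -n` builds for the `/api/generate` endpoint looks roughly like this (prompt and Base64 data shortened here):
{
  "model": "llava",
  "prompt": "What is in this picture? Answer with 10 keywords as a JSON array ...",
  "options": { "temperature": 0 },
  "stream": false,
  "images": ["iVBORw0KGgoAAAANSUhEUg..."]
}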
Extract keywords
If the request is successful, the returned JSON will contain a `response` property with the text from the LLM.
Unfortunately, models like LLaVA deliver markdown like this one:
```json
["mechanic", "garage", "workshop", "toolbox", "vehicle", "engine", "tools", "overalls", "smiling", "professional"]
```
Therefore, we have to extract the array from the text before we can pass it to `jq`:
# ...
for pattern in "$@"; do
    for file in $pattern; do
        # ...
        # if the execution was successful, there should
        # be a `response` property in the output
        # (`// empty` avoids getting the literal string "null" if it is missing)
        extracted_response=$(echo "$response" | jq -r '.response // empty')
        if [ -n "$extracted_response" ]; then
            # we have a `response` property, but it can happen,
            # especially with LLaVA, that the answer is wrapped in a
            # markdown code fence, so extract the plain JSON array first
            array_with_keywords=$(echo "$extracted_response" | sed -n '/```json/,/```/p' | sed -e '1d' -e '$d' | sed -e 's/^[[:space:]]*//g' -e 's/[[:space:]]*$//g' -e 's/[[:space:]]*```+.*$//')
            # check if we have a valid JSON array in `$array_with_keywords`
            if ! echo "$array_with_keywords" | jq -e . >/dev/null 2>&1; then
                echo -e "\t⚠️ Could not find JSON array in response!"
                continue
            fi
            # convert the JSON string to a sorted list of keywords
            sorted_keywords=$(echo "$array_with_keywords" | jq -r '.[]' | sort | uniq | paste -sd ',' -)
            # do not continue if we have no keywords from `jq`
            if [ -z "$sorted_keywords" ]; then
                echo -e "\t⚠️ No keywords found for '$full_file_path'!"
                continue
            fi
            # ...
        else
            echo -e "\t⚠️ No valid JSON found!"
        fi
        # ...
    done
done
# ...
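To see what this extraction does, here is the same pipeline applied to a hard-coded example answer outside of the script:
# a typical LLaVA answer, wrapped in a markdown code fence
extracted_response='```json
["workshop", "garage", "tools", "garage"]
```'
array_with_keywords=$(echo "$extracted_response" | sed -n '/```json/,/```/p' | sed -e '1d' -e '$d')
echo "$array_with_keywords" | jq -r '.[]' | sort | uniq | paste -sd ',' -
# prints: garage,tools,workshop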
Update keywords
Last but not least, we can use `exiftool` to update the list of keywords, which is currently stored in `sorted_keywords`:
# ...
for pattern in "$@"; do
    for file in $pattern; do
        # ...
        if [ -n "$extracted_response" ]; then
            # ...
            # the next steps will write to the file, so
            # check first if dry run mode is enabled
            if [[ $dry_run -eq 1 ]]; then
                echo -e "\tℹ️ Found following keywords for '$full_file_path': $sorted_keywords"
                continue
            fi
            # first, try to remove the old keywords
            if ! exiftool -keywords= "$full_file_path" -overwrite_original >/dev/null 2>&1; then
                echo -e "\t⚠️ Failed to remove old keywords from '$full_file_path'."
                continue
            fi
            # now write the final keywords
            # (`-sep ","` makes ExifTool split the comma separated string into individual keywords)
            if exiftool -sep "," -keywords="$sorted_keywords" "$full_file_path" -overwrite_original >/dev/null 2>&1; then
                echo -e "\t✅ Wrote following keywords to '$full_file_path': $sorted_keywords"
            else
                echo -e "\t⚠️ Failed to update keywords for '$full_file_path'."
            fi
        else
            # ...
        fi
    done
done
# ...
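After a real run (without `-n`), you can check the result with ExifTool itself; the file name below is only an example:
exiftool -Keywords "photos/example.jpg"
# Keywords  : engine, garage, mechanic, overalls, ...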
Conclusion
Ollama is a really nice alternative to the ChatGPT API: it lets you run LLMs locally behind a generic API and manage models with a Docker-like workflow.
This kind of script could later run on a NAS that allows executing custom scripts.
The final script can be found on GitHub as a Gist.
Have fun while trying it out! 🎉