Translate text and image files with bash script // The GitFather

I would like to follow up on my post from Wednesday and show how to translate text with a bash script using tools like Tesseract, pdftotext and Ollama.

Requirements

Ollama

Ollama is a generic wrapper for many LLMs that can be run locally, such as Meta’s Llama 3 or Microsoft’s Phi-3.

Tesseract

Tesseract is a free OCR tool for the command line that can be installed on nearly all operating systems, such as Linux, Windows and macOS.

We need it to extract text from GIF, JPEG or PNG image files.

Get the latest binaries from here or use a package manager like apt or brew/Homebrew to install it.

pdftotext

pdftotext is also a command-line tool. We need it to extract text from PDF files.

You can download or install it from here.
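
The functions shown later in this post also call jq and wget, so it can help to check up front that every tool is on the PATH. The following is only a minimal sketch: the helper name require_commands is hypothetical and not part of the original script.

```bash
# Hypothetical helper: prints every missing command to STDERR
# and returns 1 if at least one of them is not on the PATH.
require_commands() {
    local missing=0
    local cmd

    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "[ERROR] Required command not found: $cmd" >&2
            missing=1
        fi
    done

    return "$missing"
}

# Example: check the tools used in this post before doing any work
# require_commands tesseract pdftotext jq wget || exit 1
```

Reporting all missing tools at once, instead of stopping at the first one, saves a few install-and-retry rounds.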

Script

The script supports the following arguments:

File path: This is required and defines the source file or stream from which we want to extract text
Source language: Optional but explicit information, especially for Tesseract, about the language of the source, like deu or eng

The translated text will be written to STDOUT, so it can be piped to another command or redirected to a file.
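
A minimal sketch of how these two arguments could be validated is shown below. The helper name parse_args, the variables source_file and source_language, and the fallback to "eng" are all assumptions for illustration and may differ from the final script.

```bash
# Hypothetical helper: validates the arguments described above.
#
# $1 => path to the source file or stream (required)
# $2 => source language (optional); falling back to "eng" is an assumption
parse_args() {
    if [ -z "${1:-}" ]; then
        echo "Usage: t.sh <file_path> [source_language]" >&2
        return 1
    fi

    # -r also accepts readable streams like /dev/stdin
    if [ ! -r "$1" ]; then
        echo "[ERROR] Cannot read: $1" >&2
        return 1
    fi

    source_file=$1
    source_language=${2:-eng}
}

# Example:
# parse_args "$@" || exit 1
```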

Functions

I have implemented the following functions for some recurring tasks:

write_line()

To separate the translated text from status messages on the console, we can write the latter to the STDERR stream:

# ...

# an "echo" that directly writes to STDERR
# so that the translated output can be piped
# from STDOUT
write_line() {
    local text=$1

    echo "$text" >&2
}

# ...
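
To make the effect visible, here is a small sketch that repeats write_line() and pairs it with a hypothetical stand-in function, translate_demo, which is not part of the real script:

```bash
# same helper as above: writes to STDERR instead of STDOUT
write_line() {
    local text=$1

    echo "$text" >&2
}

# hypothetical stand-in for the real translation step
translate_demo() {
    write_line "[INFO] Translating ..."   # status message => STDERR
    echo "Hello world"                    # result         => STDOUT
}

# only "Hello world" ends up in the file,
# the status line stays on the terminal:
# translate_demo > ./translated.txt
```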

get_text_from_file()

This function helps to decide what tool will be executed for which file:

# ...

# reads text from a file
#
# $1 => path to the file to scan
# $2 => optional information of the language that is used in the file, like "eng" or "deu"
get_text_from_file() {
    local file_path=$1
    local language=$2
    local filename extension

    filename=$(basename "$file_path")
    extension="${filename##*.}"

    case "$extension" in
        pdf)
            # PDF document: "-" writes the text to STDOUT
            pdftotext "$file_path" -
            ;;
        gif|jpeg|jpg|png|tif|tiff)
            # image file: only pass -l if a language was given,
            # otherwise Tesseract falls back to its default language
            if [ -n "$language" ]; then
                tesseract "$file_path" stdout -l "$language" 2>/dev/null
            else
                tesseract "$file_path" stdout 2>/dev/null
            fi
            ;;
        *)
            # handle anything else as text file
            cat "$file_path"
            ;;
    esac
}

# ...
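
Note that the case patterns are case-sensitive, so a file named SCAN.PNG would be handled as a text file. One way around this, sketched here with a hypothetical helper called get_extension, is to lower-case the extension before matching:

```bash
# Hypothetical helper: prints the lower-cased extension of a file name,
# or an empty line if the name contains no dot at all.
get_extension() {
    local filename
    filename=$(basename "$1")

    case "$filename" in
        *.*) printf '%s\n' "${filename##*.}" | tr '[:upper:]' '[:lower:]' ;;
        *)   printf '\n' ;;
    esac
}

# Example:
# extension=$(get_extension "$file_path")
```

Taking the basename first matters for paths like /tmp/a.b/README, where a dot only appears in a directory name.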

chat_completion()

We need an extra function for the Ollama API because the translation is done in two steps:

“Scanning” can produce typos, so we first let the LLM try to fix those errors
Then we do the translation with the fixed text from step 1

# ...

# calls Ollama API
#
# $1 => the prompt to send
chat_completion() {
    local prompt=$1

    # if you run the API on another address, you
    # can customize by using `TGF_API_URL` environment variable
    url="${TGF_API_URL:-http://localhost:11434/api/generate}"

    # if you prefer another model, you
    # can customize by using `TGF_LLM_MODEL` environment variable
    model="${TGF_LLM_MODEL:-phi3}"

    # if you prefer another temperature value, you
    # can customize by using `TGF_LLM_TEMPERATURE` environment variable
    temperature="${TGF_LLM_TEMPERATURE:-0}"

    # create a JSON for the Ollama API
    json_data=$(jq -n \
                    --arg model "$model" \
                    --arg prompt "$prompt" \
                    --argjson temperature "$temperature" \
                    '{model: $model, prompt: $prompt, options: {temperature: $temperature}, stream: false}')

    # execute Ollama API; "-q" silences wget's own messages and
    # "-O -" writes the response body to STDOUT
    response=$(wget --header "Content-Type: application/json" \
                    --post-data "$json_data" \
                    "$url" \
                    -q \
                    -O -)

    # if the execution was successful, there should
    # be a `response` property in the output;
    # "// empty" prints nothing instead of "null" if it is missing
    extracted_response=$(echo "$response" | jq -r '.response // empty' 2>/dev/null)

    if [ -n "$extracted_response" ]; then
        echo "$extracted_response"
    else
        write_line "[ERROR] No valid JSON found!"
        exit 1
    fi
}

# ...
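
The two steps could then be chained roughly as follows. This is only a sketch: the helper name translate_text, its second parameter and the wording of both prompts are assumptions, and the prompts in the final script may look different.

```bash
# Hypothetical helper: runs the two steps described above.
# It relies on chat_completion() from this post.
#
# $1 => the raw text, e.g. from get_text_from_file()
# $2 => target language (assumed parameter, defaults to "English")
translate_text() {
    local raw_text=$1
    local target_language=${2:-English}
    local fixed_text

    # step 1: let the LLM repair typos from "scanning"
    fixed_text=$(chat_completion "Fix typos and OCR errors in the following text. Return only the corrected text:

$raw_text") || return 1

    # step 2: translate the corrected text from step 1
    chat_completion "Translate the following text to $target_language. Return only the translation:

$fixed_text"
}

# Example:
# translate_text "$(get_text_from_file "$1" "$2")"
```

Because chat_completion() runs inside a command substitution in step 1, its exit 1 only ends that subshell, which is why the sketch checks the status and returns early.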

Execution

Let's say the bash script is called t.sh. After adding execution permissions with chmod +x t.sh, you can execute it in the terminal like this:

./t.sh ./path/to/a/german/source/file deu > ./path/to/output/file.txt
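
The TGF_* environment variables read by chat_completion() can be set for a single run. The sketch below uses a hypothetical helper, show_settings, that only prints the values the function would pick up. The model name llama3 in the example is an assumption and must already be available in your Ollama installation.

```bash
# Hypothetical helper: prints the settings chat_completion() would use,
# applying the same defaults as in the function above.
show_settings() {
    echo "url=${TGF_API_URL:-http://localhost:11434/api/generate}"
    echo "model=${TGF_LLM_MODEL:-phi3}"
    echo "temperature=${TGF_LLM_TEMPERATURE:-0}"
}

# Example: use another model for a single run
# TGF_LLM_MODEL=llama3 ./t.sh ./path/to/a/german/source/file deu > ./path/to/output/file.txt
```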

Conclusion

The final script can be found on GitHub as a Gist.

KEEP IN MIND: LLMs are not fully accurate and may produce different results, so expect to review and correct the output by hand at some point. Nevertheless, the more texts you have to translate, the more work the script should save you.

I have used Phi-3 by Microsoft, but you should experiment with several models to figure out which one produces the best results for you.