Translate text and image files with a bash script
I would like to follow up on my post from Wednesday and show how to translate text with a bash script using tools like Tesseract, pdftotext and Ollama.
Requirements
Ollama
Ollama is a generic wrapper for many LLMs that can be run locally, such as Meta’s Llama 3 or Phi-3.
Tesseract
Tesseract is a free OCR tool for the command line that can be installed on nearly all operating systems, such as Linux, Windows and macOS.
We need it to extract text from GIF, JPEG or PNG image files.
Get the latest binaries from here or use a package manager like apt or brew/Homebrew to install it.
pdftotext
pdftotext is also a command line tool; we need it to extract text from PDF files.
You can download or install it from here.
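Before running anything, it can help to verify that all required tools are installed. The following is a minimal sketch, not part of the original script: `missing_commands` and `check_requirements` are hypothetical helper names, and the tool list is simply taken from the commands the script calls (tesseract, pdftotext, jq and wget).

```bash
#!/usr/bin/env bash

# Prints the name of every given command that is NOT available in PATH,
# one per line. Prints nothing if all of them are installed.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
        fi
    done
}

# Hypothetical helper: reports missing tools on STDERR and returns 1
# if at least one of them could not be found.
check_requirements() {
    local missing
    missing=$(missing_commands tesseract pdftotext jq wget)
    if [ -n "$missing" ]; then
        echo "[ERROR] Missing tools:" >&2
        echo "$missing" >&2
        return 1
    fi
}
```

A call like `check_requirements || exit 1` at the top of a script would then stop early with a readable message instead of failing somewhere in the middle.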
Script
The script supports the following arguments:
- File path: This is required and defines the source file or stream from which we want to extract text
- Source language: Explicit (but optional) information, especially for Tesseract, about the language of the source, like deu or eng
The translated text will be written to STDOUT, so it can be redirected to a file or piped to another command.
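The argument handling itself is not shown in this post, so here is a rough sketch of how the two arguments could be read. The function name `parse_args`, the variable names `FILE_PATH` and `SOURCE_LANGUAGE`, and the fallback to eng are assumptions for illustration only.

```bash
#!/usr/bin/env bash

# Hypothetical helper: reads the positional arguments into two variables.
#
# $1 => required path to the source file
# $2 => optional source language, like "eng" or "deu"
parse_args() {
    if [ "$#" -lt 1 ]; then
        echo "Usage: t.sh <file-path> [source-language]" >&2
        return 1
    fi
    FILE_PATH=$1
    # assumption: fall back to English when no language is given
    SOURCE_LANGUAGE=${2:-eng}
}
```

Because the usage message goes to STDERR, it never ends up in a file that STDOUT is redirected to.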
Functions
I have implemented the following functions for some recurring tasks:
write_line()
To separate the translated text from the console output, we can use the STDERR stream:
# ...
# an "echo" that directly writes to STDERR
# so that the translated output can be piped
# from STDOUT
write_line() {
    local text=$1
    echo "$text" >&2
}
# ...
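To see why this separation matters, here is a small self-contained sketch. `fake_translate` is a hypothetical stand-in for the real translation step; only `write_line` is taken from the script.

```bash
#!/usr/bin/env bash

# same helper as in the script: an "echo" that writes to STDERR
write_line() {
    local text=$1
    echo "$text" >&2
}

# Hypothetical stand-in for the real translation step.
fake_translate() {
    write_line "Translating ..."   # status message => STDERR
    echo "Hello world"             # result => STDOUT
}
```

Running `fake_translate > out.txt` would still show the status message in the terminal, while out.txt contains only the result.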
get_text_from_file()
This function helps to decide what tool will be executed for which file:
# ...
# reads text from a file
#
# $1 => path to the file to scan
# $2 => optional information of the language that is used in the file, like "eng" or "deu"
get_text_from_file() {
    local file_path=$1
    local language=$2
    local filename extension

    filename=$(basename "$file_path")
    extension="${filename##*.}"

    case "$extension" in
        pdf)
            # PDF document ("-" lets pdftotext write to STDOUT)
            text_from_pdf=$(pdftotext "$file_path" -)
            echo "$text_from_pdf"
            ;;
        gif|jpeg|jpg|png|tif|tiff)
            # image file; if no language was given, fall back
            # to "eng", which is also Tesseract's default
            text_from_image=$(tesseract "$file_path" stdout -l "${language:-eng}" 2>/dev/null)
            echo "$text_from_image"
            ;;
        *)
            # handle anything else as text file
            cat "$file_path"
            ;;
    esac
}
# ...
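The case statement above matches the extension exactly as it is written, so a file like Letter.PDF would be handled as a plain text file. One possible refinement, shown here only as a sketch with the hypothetical helper name `file_extension`, is to lowercase the extension before matching:

```bash
#!/usr/bin/env bash

# Prints the lowercased extension of a file path, or an empty line
# if the file name contains no dot.
file_extension() {
    local filename
    filename=$(basename "$1")
    case "$filename" in
        *.*)
            # "##*." removes everything up to and including the last dot
            printf '%s\n' "${filename##*.}" | tr '[:upper:]' '[:lower:]'
            ;;
        *)
            echo ""
            ;;
    esac
}
```

Using `tr` instead of the `${var,,}` expansion keeps the sketch compatible with older bash versions, such as the one shipped with macOS.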
chat_completion()
We need an extra function for the Ollama API because the translation is done in two steps:
- “Scanning” can produce typos, so we first let the LLM try to fix those errors
- Then we do the translation with the fixed text from step 1
# ...
# calls Ollama API
#
# $1 => the prompt to send
chat_completion() {
    local prompt=$1
    local url model temperature json_data response extracted_response

    # if you run the API on another address, you
    # can customize by using `TGF_API_URL` environment variable
    url="${TGF_API_URL:-http://localhost:11434/api/generate}"

    # if you prefer another model, you
    # can customize by using `TGF_LLM_MODEL` environment variable
    model="${TGF_LLM_MODEL:-phi3}"

    # if you prefer another temperature value, you
    # can customize by using `TGF_LLM_TEMPERATURE` environment variable
    temperature="${TGF_LLM_TEMPERATURE:-0}"

    # create a JSON for the Ollama API
    json_data=$(jq -n \
        --arg model "$model" \
        --arg prompt "$prompt" \
        --argjson temperature "$temperature" \
        '{model: $model, prompt: $prompt, options: {temperature: $temperature}, stream: false}')

    # execute Ollama API
    # (-q suppresses wget's log output, so only the response body is captured)
    response=$(wget --header "Content-Type: application/json" \
        --post-data "$json_data" \
        "$url" \
        -q \
        -O -)

    # if the execution was successful, there should
    # be a `response` property in the output;
    # `// empty` keeps jq from printing the string "null" if it is missing
    extracted_response=$(echo "$response" | jq -r '.response // empty' 2>/dev/null)
    if [ -n "$extracted_response" ]; then
        echo "$extracted_response"
    else
        write_line "[ERROR] No valid JSON found!"
        exit 1
    fi
}
# ...
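The two steps described above could be chained roughly like this. This is only a sketch: the function name `translate_text`, both prompt texts and the target language parameter (with English as an assumed default) are illustrative and not taken from the final script. It expects a `chat_completion` function to be defined before it is called.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper around the two steps.
#
# $1 => the extracted text
# $2 => target language (assumption: defaults to "English")
translate_text() {
    local text=$1
    local target_language=${2:-English}
    local fixed_text

    # step 1: let the LLM repair scanning errors (illustrative prompt)
    fixed_text=$(chat_completion "Fix typos and scanning errors in the following text. Return only the corrected text:

$text") || return 1

    # step 2: translate the repaired text (illustrative prompt)
    chat_completion "Translate the following text to $target_language. Return only the translation:

$fixed_text"
}
```

Keeping the two prompts separate means each request has exactly one job, which also makes it easier to inspect the intermediate result when a translation looks wrong.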
Execution
Let's say the bash script is called t.sh. After adding execute permission with chmod +x t.sh, you can run it in the terminal like this:
./t.sh ./path/to/a/german/source/file deu > ./path/to/output/file.txt
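If you have a whole folder of files to translate, a small loop around the script can do the work. The following is a sketch under assumptions: `translate_all` is a hypothetical helper, and the output naming (source file name plus .txt) is just one possible convention.

```bash
#!/usr/bin/env bash

# Hypothetical batch helper: runs the translation script on every regular
# file in a directory and writes one .txt file per source file.
#
# $1 => path to the translation script (e.g. ./t.sh)
# $2 => directory with the source files
# $3 => directory for the translated files
# $4 => optional source language, like "deu"
translate_all() {
    local script=$1 source_dir=$2 output_dir=$3 language=$4
    local file name

    mkdir -p "$output_dir" || return 1
    for file in "$source_dir"/*; do
        # skip sub directories and an unmatched glob
        [ -f "$file" ] || continue
        name=$(basename "$file")
        if ! "$script" "$file" "$language" > "$output_dir/$name.txt"; then
            echo "[WARN] Could not translate $file" >&2
        fi
    done
}
```

A call could look like `translate_all ./t.sh ./scans ./translated deu`. Keeping the full source file name in the output name avoids collisions between files such as letter.pdf and letter.png.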
Conclusion
The final script can be found on GitHub as a Gist.
KEEP IN MIND: LLMs are not fully accurate and may produce different results from run to run, so expect to review and correct the output by hand at some point. Nevertheless, the more texts you have to translate, the more work this should save you.
I have used Phi-3 by Microsoft, but you should experiment with some models to figure out which one will produce the best results for you.