Automatic Docker Image Tagging with GitHub Actions
When our team first started using Docker for our application deployments, we thought image tagging was a trivial detail. How hard could it be to slap a version number on an image? But after several production incidents caused by confusion over which image version was deployed where, we learned that a thoughtful tagging strategy is essential for a robust CI/CD pipeline. This is the story of how we evolved our Docker image tagging approach using GitHub Actions.
The Problem: "latest" Wasn't So Great
Our initial approach was simple, perhaps too simple. We used the infamous latest tag for all our images, with a basic GitHub Actions workflow:

name: Docker Build and Push
on:
  push:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: ourcompany/app:latest

This simple approach quickly led to problems:
- We couldn't easily tell which code version was in a production container
- Rolling back meant rebuilding a previous version
- Concurrent deployments to different environments could overwrite each other's "latest" tag
- We had no audit trail of which images had been deployed when
The final straw came when a developer accidentally pushed a work-in-progress change to main, which automatically built and deployed to production with the "latest" tag, overwriting a stable version.
First Improvement: Git Commit Hash Tags
Our first improvement was to tag images with the Git commit hash:

name: Docker Build and Push
on:
  push:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set short commit hash
        id: vars
        # Step outputs are written to $GITHUB_OUTPUT (the older ::set-output command is deprecated)
        run: echo "sha_short=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: |
            ourcompany/app:latest
            ourcompany/app:${{ steps.vars.outputs.sha_short }}

This was better: we could now trace each image back to a specific commit. However, we still faced several challenges:
- Commit hashes aren't human-readable or easily ordered
- We were still using "latest" as a moving target
- It wasn't easy to tell which environment an image was intended for
Problem: Tag Conflicts and Confusion
A few months into using commit-based tags, we encountered a perplexing issue. A production deployment appeared to deploy the wrong version of our code. After investigation, we discovered that a developer had manually forced a tag to be reused:

# What happened (manually run)
docker build -t ourcompany/app:abc1234 .   # abc1234 was an existing tag
docker push ourcompany/app:abc1234         # Overwrote the existing image

Since Docker tags are simply mutable pointers to immutable, content-addressed images, nothing prevents the same tag from being reused for different image content. This realization prompted us to adopt a strategy to ensure tag uniqueness.
Solution: Time-Based Unique Tags
To ensure uniqueness, we added timestamps to our tags:

name: Docker Build and Push
on:
  push:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Generate tag variables
        id: vars
        run: |
          echo "sha_short=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
          echo "timestamp=$(date +%Y%m%d%H%M%S)" >> "$GITHUB_OUTPUT"
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: |
            ourcompany/app:latest
            ourcompany/app:${{ steps.vars.outputs.sha_short }}
            ourcompany/app:${{ steps.vars.outputs.timestamp }}_${{ steps.vars.outputs.sha_short }}

The timestamp+hash combination ensured that each tag produced by the pipeline was unique, providing better traceability.
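The tag format is easy to try outside a workflow. The following is a minimal sketch rather than the script our pipeline ran: the function name unique_tag is hypothetical, it takes the time as an explicit epoch value so the result is reproducible, and it assumes GNU date (as found on Ubuntu runners). Formatting the timestamp in UTC keeps tags comparable across machines, and because the timestamp comes first, a plain lexical sort puts the tags in build order.

```bash
#!/usr/bin/env bash
# Build a tag of the form <UTC timestamp>_<short sha>.
# $1 = Unix epoch seconds, $2 = full commit SHA.
# Assumption: GNU date, which accepts "-d @<epoch>".
unique_tag() (
  ts=$(date -u -d "@$1" +%Y%m%d%H%M%S)
  short=$(printf '%s' "$2" | cut -c1-7)
  printf '%s_%s\n' "$ts" "$short"
)

# Fixed inputs, so the output is reproducible.
unique_tag 1680698096 abc1234def5678900000000000000000000000ff   # prints 20230405123456_abc1234
```

In a workflow the first argument would be $(date +%s) and the second $GITHUB_SHA.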
Problem: Branching and Environments
As our deployment processes matured, we began using feature branches and deploying to multiple environments (development, staging, production). Our simple tagging strategy didn't convey which environment an image was intended for.
This led to confusion when developers would look at the registry and see dozens of similar tags without context.
Solution: Environment and Branch-Aware Tags
We enhanced our workflow to include branch names and target environments in our tags:

name: Docker Build and Push
on:
  push:
    branches:
      - main
      - 'feature/**'
      - 'release/**'
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Generate tag variables
        id: vars
        run: |
          # Generate shortened commit hash
          SHA_SHORT=${GITHUB_SHA::7}
          echo "sha_short=$SHA_SHORT" >> "$GITHUB_OUTPUT"
          # Generate timestamp
          TIMESTAMP=$(date +%Y%m%d%H%M%S)
          echo "timestamp=$TIMESTAMP" >> "$GITHUB_OUTPUT"
          # Extract branch name and sanitize it for Docker tag
          BRANCH=${GITHUB_REF#refs/heads/}
          BRANCH_SLUG=$(echo "$BRANCH" | sed -r 's/[/]+/-/g' | sed -r 's/[^a-zA-Z0-9-]+//g' | tr '[:upper:]' '[:lower:]')
          echo "branch_slug=$BRANCH_SLUG" >> "$GITHUB_OUTPUT"
          # Determine environment from branch name
          if [[ "$BRANCH" == "main" ]]; then
            ENV="prod"
          elif [[ "$BRANCH" == release/* ]]; then
            ENV="staging"
          else
            ENV="dev"
          fi
          echo "env=$ENV" >> "$GITHUB_OUTPUT"
          # Create unique tag combining all elements
          UNIQUE_TAG="${ENV}_${BRANCH_SLUG}_${TIMESTAMP}_${SHA_SHORT}"
          echo "unique_tag=$UNIQUE_TAG" >> "$GITHUB_OUTPUT"
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: |
            ourcompany/app:${{ steps.vars.outputs.env }}
            ourcompany/app:${{ steps.vars.outputs.sha_short }}
            ourcompany/app:${{ steps.vars.outputs.unique_tag }}

This approach created tags that were both unique and descriptive. For example:

ourcompany/app:prod_main_20230405123456_abc1234
ourcompany/app:staging_release-v2-1_20230406123456_def5678
ourcompany/app:dev_feature-new-login_20230407123456_ghi9012

Now, at a glance, we could tell which environment an image was built for, which branch it came from, when it was built, and which commit it contained.
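Tag-building logic like this is easier to trust when it can be exercised locally. The sketch below restates the same rules as small shell functions; the names slugify_branch, env_for_branch, and descriptive_tag are hypothetical, and the 90-character cap on the slug is our assumption, chosen so the finished tag stays within Docker's 128-character tag limit.

```bash
#!/usr/bin/env bash
# Turn a branch name into a tag-safe slug: slashes become hyphens, every other
# character outside [a-zA-Z0-9-] is dropped, and the result is lowercased.
# The 90-character cap is an assumption that leaves room for the other tag parts.
slugify_branch() (
  printf '%s' "$1" \
    | sed -E 's#/+#-#g; s/[^a-zA-Z0-9-]+//g' \
    | tr '[:upper:]' '[:lower:]' \
    | cut -c1-90
)

# Map a branch name to the environment label used in the tag.
env_for_branch() (
  case "$1" in
    main)      echo prod ;;
    release/*) echo staging ;;
    *)         echo dev ;;
  esac
)

# Assemble <env>_<slug>_<timestamp>_<short sha>.
# $1 = branch name, $2 = timestamp (YYYYmmddHHMMSS), $3 = commit SHA.
descriptive_tag() (
  short=$(printf '%s' "$3" | cut -c1-7)
  printf '%s_%s_%s_%s\n' "$(env_for_branch "$1")" "$(slugify_branch "$1")" "$2" "$short"
)

descriptive_tag feature/new-login 20230407123456 0123456789abcdef
# prints dev_feature-new-login_20230407123456_0123456
```

Note that characters such as dots and underscores are removed rather than replaced, so a branch named release/v2.1 yields the slug release-v21.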
Leveraging Docker Metadata Action
As our tagging strategy evolved, we discovered the docker/metadata-action, which was designed specifically for generating Docker tags based on Git context. This simplified our workflow:

name: Docker Build and Push
on:
  push:
    branches:
      - main
      - 'feature/**'
      - 'release/**'
    tags:
      - 'v*'
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v3
        with:
          images: ourcompany/app
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha,format=short
            type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' }}
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

This action automatically created various tags based on:
- Branch names (e.g., main, feature-new-login)
- Pull request numbers (generated only when the workflow also runs on pull_request events, which the trigger above does not include)
- Semantic version tags (when Git tags were pushed)
- Short commit SHA
- The "latest" tag (only for the main branch)
Problem: Image Promotion Across Environments
Our next challenge was promoting the same image across different environments. Initially, we were rebuilding the image for each environment, which could lead to inconsistencies.
Solution: Retagging for Promotion
Instead of rebuilding, we implemented a promotion workflow that pulled an existing image and retagged it for the new environment:

name: Promote to Production
on:
  workflow_dispatch:
    inputs:
      source_tag:
        description: 'Source image tag to promote'
        required: true
jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Generate promotion tag
        id: vars
        env:
          # Pass the user-supplied input through the environment rather than
          # interpolating it into the script, so it cannot inject shell commands
          SOURCE_TAG: ${{ github.event.inputs.source_tag }}
        run: |
          TIMESTAMP=$(date +%Y%m%d%H%M%S)
          echo "timestamp=$TIMESTAMP" >> "$GITHUB_OUTPUT"
          echo "promotion_tag=prod_promoted_${TIMESTAMP}_${SOURCE_TAG}" >> "$GITHUB_OUTPUT"
      - name: Pull and retag
        env:
          SOURCE_TAG: ${{ github.event.inputs.source_tag }}
          PROMOTION_TAG: ${{ steps.vars.outputs.promotion_tag }}
        run: |
          # Pull the source image
          docker pull "ourcompany/app:${SOURCE_TAG}"
          # Tag it for production
          docker tag "ourcompany/app:${SOURCE_TAG}" ourcompany/app:prod
          docker tag "ourcompany/app:${SOURCE_TAG}" "ourcompany/app:${PROMOTION_TAG}"
          # Push the new tags
          docker push ourcompany/app:prod
          docker push "ourcompany/app:${PROMOTION_TAG}"

This approach ensured that the exact same image binary was used across environments, eliminating the risk of inconsistencies from rebuilding.
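Because the source tag is typed in by a person, it is worth checking before it reaches any docker command. The following is a hedged sketch, not part of our original workflow: the function names is_valid_tag and promotion_tag are hypothetical. It applies Docker's tag rules (up to 128 characters from letters, digits, underscores, periods, and hyphens, not starting with a period or hyphen) to both the input and the longer promotion tag derived from it.

```bash
#!/usr/bin/env bash
# Succeed only if $1 is a syntactically valid Docker tag.
is_valid_tag() (
  LC_ALL=C
  case "$1" in
    '' | [.-]* | *[!A-Za-z0-9_.-]*) exit 1 ;;   # empty, bad first char, or illegal char
  esac
  [ "${#1}" -le 128 ]
)

# Print the promotion tag for a source tag, or fail if either tag is invalid.
# $1 = source tag, $2 = timestamp (YYYYmmddHHMMSS).
promotion_tag() (
  is_valid_tag "$1" || { echo "invalid source tag: $1" >&2; exit 1; }
  tag="prod_promoted_${2}_${1}"
  is_valid_tag "$tag" || { echo "promotion tag is not a valid tag: $tag" >&2; exit 1; }
  printf '%s\n' "$tag"
)

promotion_tag abc1234 20230501000000   # prints prod_promoted_20230501000000_abc1234
```

A check like this rejects inputs containing spaces, semicolons, or newlines, and it also catches the case where a long source tag pushes the promotion tag past the length limit.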
Problem: Tag Proliferation and Cleanup
With our detailed tagging strategy, we quickly accumulated thousands of image tags in our registry. This made browsing difficult and increased storage costs.
Solution: Automated Tag Cleanup
We implemented a scheduled workflow to clean up old tags while preserving important ones:

name: Cleanup Docker Tags
on:
  schedule:
    # Run weekly on Sunday at midnight
    - cron: '0 0 * * 0'
  workflow_dispatch: {} # Allow manual trigger
jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Install DockerHub CLI
        run: |
          # Install dockerhub-cli for easier tag management
          npm install -g dockerhub-cli
          dockerhub login -u ${{ secrets.DOCKERHUB_USERNAME }} -p ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Fetch tags
        id: fetch
        run: |
          # Get all tags for our image, one per line. The lists are passed to the
          # next step as files because a multi-line value does not fit a simple step output.
          dockerhub tags ourcompany/app --limit 5000 --output json | jq -r '.results[].name' > "$RUNNER_TEMP/all_tags.txt"
          # Identify tags to keep: environment tags and semantic versions (dots escaped)
          grep -E '^(prod|staging|dev|v[0-9]+\.[0-9]+\.[0-9]+)$' "$RUNNER_TEMP/all_tags.txt" > "$RUNNER_TEMP/keep_tags.txt" || true
          # Keep the 100 most recent timestamped tags, sorted by the timestamp embedded in the tag
          sed -nE 's/^(.*_([0-9]{14})_.*)$/\2 \1/p' "$RUNNER_TEMP/all_tags.txt" | sort -r | head -n 100 | cut -d' ' -f2- >> "$RUNNER_TEMP/keep_tags.txt"
      - name: Delete old tags
        run: |
          # Delete every tag that is not an exact, whole-line match for a tag in the keep list
          grep -vxFf "$RUNNER_TEMP/keep_tags.txt" "$RUNNER_TEMP/all_tags.txt" | while IFS= read -r tag; do
            echo "Deleting tag: $tag"
            dockerhub delete-tag "ourcompany/app:$tag" --yes
          done

This cleanup strategy:
- Kept all environment-specific tags (prod, staging, dev)
- Kept all semantic version tags (v1.2.3)
- Kept the 100 most recent tags (based on timestamp)
- Deleted all other tags to reduce clutter
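A retention policy that deletes things deserves a dry run. The sketch below is an illustration under stated assumptions, not the workflow's own script: the function name tags_to_delete is hypothetical, tags arrive one per line on standard input, and nothing is deleted, only listed. It orders timestamped tags by the 14-digit timestamp they contain instead of by the whole tag string, and it compares tags as exact whole lines so that a short tag such as abc1234 is not mistaken for a longer tag that happens to contain it.

```bash
#!/usr/bin/env bash
# Read tags (one per line) on stdin and print those the policy would delete.
# $1 = number of timestamped tags to keep (default 100).
tags_to_delete() (
  keep_recent=${1:-100}
  all=$(mktemp) keep=$(mktemp)
  trap 'rm -f "$all" "$keep"' EXIT
  cat > "$all"
  # Always keep environment tags and semantic versions (note the escaped dots).
  grep -E '^(prod|staging|dev|v[0-9]+\.[0-9]+\.[0-9]+)$' "$all" > "$keep" || true
  # Prefix each timestamped tag with its timestamp, sort newest first,
  # keep the top N, then strip the prefix again.
  sed -nE 's/^(.*_([0-9]{14})_.*)$/\2 \1/p' "$all" \
    | sort -r | head -n "$keep_recent" | cut -d' ' -f2- >> "$keep"
  # Everything not in the keep list (fixed-string, whole-line match) is a candidate.
  grep -vxFf "$keep" "$all" || true
)

printf '%s\n' prod v1.2.3 abc1234 prod_main_20230405123456_abc1234 latest | tags_to_delete 1
```

With the sample input above, the function lists abc1234 and latest: the environment tag, the version tag, and the single most recent timestamped tag survive.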
Advanced Tag Management with Docker Digest
In our most recent iteration, we implemented digest tracking for even better traceability. Docker image digests are content-addressable identifiers that uniquely and immutably identify an image, regardless of its tags.

name: Docker Build with Digest Tracking
on:
  push:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v2
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v3
        with:
          images: ourcompany/app
          tags: |
            type=raw,value=latest
            type=sha,format=short
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push
        id: build
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
      - name: Store image digest
        run: |
          DIGEST="${{ steps.build.outputs.digest }}"
          TIMESTAMP=$(date +%Y%m%d%H%M%S)
          SHORT_SHA=${GITHUB_SHA::7}
          # Store the mapping between git SHA, timestamp, and image digest.
          # A here-document keeps the inner double quotes, so the file is valid JSON.
          cat > image-metadata.json <<EOF
          {
            "git_sha": "$GITHUB_SHA",
            "short_sha": "$SHORT_SHA",
            "timestamp": "$TIMESTAMP",
            "image_digest": "$DIGEST"
          }
          EOF
          # Upload to a persistent storage (e.g., S3, GitHub artifact).
          # Assumes AWS credentials have already been configured for the job.
          aws s3 cp image-metadata.json "s3://ourcompany-deployments/image-metadata/$SHORT_SHA.json"

By tracking digests, we gained several advantages:
- We could verify image integrity even if tags were tampered with
- We could confidently identify the exact binary content deployed to an environment
- Our deployment systems could refer to images by digest rather than tag for improved security
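Referring to an image by digest means using the repository@sha256:... form instead of repository:tag. As a minimal sketch (the function name image_ref_by_digest is hypothetical, and the check covers only sha256 digests), a deployment script could refuse anything that does not look like a digest before building the reference:

```bash
#!/usr/bin/env bash
# Print "<repository>@<digest>" if $2 looks like a sha256 digest; fail otherwise.
# $1 = repository (e.g. ourcompany/app), $2 = digest (e.g. sha256:<64 hex chars>).
image_ref_by_digest() (
  LC_ALL=C
  hex=${2#sha256:}
  [ "$hex" != "$2" ] || exit 1             # must start with "sha256:"
  [ "${#hex}" -eq 64 ] || exit 1           # followed by exactly 64 characters
  case "$hex" in *[!0-9a-f]*) exit 1 ;; esac   # all of them lowercase hex
  printf '%s@%s\n' "$1" "$2"
)
```

The resulting reference can be passed to docker pull or written into a deployment manifest, and it keeps pointing at the same content even if a tag is later moved.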
Implementing a Semantic Versioning Workflow
For our stable releases, we incorporated semantic versioning into our tagging strategy. This allowed us to communicate compatibility and significance of changes through version numbers:

name: Release with Semantic Versioning
on:
  workflow_dispatch:
    inputs:
      release_type:
        description: 'Type of release'
        required: true
        default: 'patch'
        type: choice  # Required for the options list to take effect
        options:
          - patch
          - minor
          - major
jobs:
  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # Needed to push the new Git tag
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - name: Get latest version
        id: latest_version
        run: |
          # Get the latest version tag
          LATEST_TAG=$(git tag -l 'v*' | sort -V | tail -n1)
          if [ -z "$LATEST_TAG" ]; then
            # No tag exists yet; use v0.1.0 as the baseline that the next step increments
            LATEST_TAG="v0.1.0"
          fi
          echo "tag=$LATEST_TAG" >> "$GITHUB_OUTPUT"
      - name: Calculate new version
        id: new_version
        run: |
          LATEST_TAG="${{ steps.latest_version.outputs.tag }}"
          RELEASE_TYPE="${{ github.event.inputs.release_type }}"
          # Extract components
          MAJOR=$(echo "$LATEST_TAG" | sed 's/v\([0-9]*\).*/\1/')
          MINOR=$(echo "$LATEST_TAG" | sed 's/v[0-9]*\.\([0-9]*\).*/\1/')
          PATCH=$(echo "$LATEST_TAG" | sed 's/v[0-9]*\.[0-9]*\.\([0-9]*\).*/\1/')
          # Increment based on release type
          if [ "$RELEASE_TYPE" = "major" ]; then
            MAJOR=$((MAJOR + 1))
            MINOR=0
            PATCH=0
          elif [ "$RELEASE_TYPE" = "minor" ]; then
            MINOR=$((MINOR + 1))
            PATCH=0
          else
            PATCH=$((PATCH + 1))
          fi
          NEW_TAG="v${MAJOR}.${MINOR}.${PATCH}"
          echo "tag=$NEW_TAG" >> "$GITHUB_OUTPUT"
      - name: Create and push tag
        run: |
          git config user.name "GitHub Actions"
          git config user.email "[email protected]"
          NEW_TAG="${{ steps.new_version.outputs.tag }}"
          git tag -a "$NEW_TAG" -m "Release $NEW_TAG"
          git push origin "$NEW_TAG"
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v3
        with:
          images: ourcompany/app
          tags: |
            type=raw,value=${{ steps.new_version.outputs.tag }}
            type=semver,pattern={{major}}.{{minor}},value=${{ steps.new_version.outputs.tag }}
            type=semver,pattern={{major}},value=${{ steps.new_version.outputs.tag }}
            type=raw,value=latest
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

This workflow would generate tags like:
- ourcompany/app:v1.2.3 (exact version)
- ourcompany/app:1.2 (major.minor version)
- ourcompany/app:1 (major version)
- ourcompany/app:latest
This allowed users to choose their desired level of version stability, from pinning to an exact version for maximum stability to using a major version tag for convenience with some flexibility.
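The version arithmetic is the part of a release workflow most worth testing on its own. The following is a small sketch, assuming tags of the exact form vMAJOR.MINOR.PATCH; the function names bump_version and release_tags are hypothetical and it is not a full semantic-versioning parser (pre-release and build suffixes are rejected).

```bash
#!/usr/bin/env bash
# Print the next version for a tag like v1.2.3.
# $1 = current tag, $2 = release type (major | minor | patch).
bump_version() (
  printf '%s\n' "$1" \
    | grep -Eq '^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$' \
    || { echo "not a vMAJOR.MINOR.PATCH tag: $1" >&2; exit 1; }
  version=${1#v}
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$2" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown release type: $2" >&2; exit 1 ;;
  esac
  printf 'v%s.%s.%s\n' "$major" "$minor" "$patch"
)

# Print the image tags published for a release: exact, major.minor, major, latest.
release_tags() (
  version=${1#v}
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  printf '%s\n' "v$version" "$major.$minor" "$major" latest
)

bump_version v1.2.3 minor   # prints v1.3.0
```

Resetting the lower components on a major or minor bump is what keeps the floating tags meaningful: after v1.3.0 is released, the 1.3 tag starts a new patch series while 1.2 stays on its last patch.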
Our Final Tagging Strategy
After many iterations and lessons learned, our final Docker image tagging strategy included:
- Unique tags built from timestamps and commit hashes, which the pipeline never reuses
- Environment-specific tags (prod, staging, dev)
- Semantic version tags for stable releases
- Branch name tags for feature development
- Digest tracking for immutable content addressing
- Tag cleanup to manage registry size
This comprehensive strategy provided:
- Clear traceability from image to source code
- Easy identification of which environments an image was intended for
- Support for promoting the same image across environments
- Simple rollback to previous versions when needed
- Protection against tag confusion or reuse
- Proper versioning of stable releases
Conclusion
What started as a seemingly trivial aspect of our CI/CD pipeline - Docker image tagging - evolved into a crucial component of our software delivery process.
A well-thought-out tagging strategy provides clarity, reliability, and traceability that simplifies operations and troubleshooting. It may seem like overkill initially, but the benefits become clear the first time you need to track down which exact code version is running in production or perform an emergency rollback at 3 AM.
By leveraging GitHub Actions and developing a consistent tagging convention, we transformed our container deployment from a source of stress to a source of confidence.