Backend & DevOps Blog

Real-world experiences with MongoDB, Docker, Kubernetes and more

Automatic Docker Image Tagging with GitHub Actions

When our team first started using Docker for our application deployments, we thought image tagging was a trivial detail. How hard could it be to slap a version number on an image? But after several production incidents caused by confusion over which image version was deployed where, we learned that a thoughtful tagging strategy is essential for a robust CI/CD pipeline. This is the story of how we evolved our Docker image tagging approach using GitHub Actions.

The Problem: "latest" Wasn't So Great

Our initial approach was simple, perhaps too simple. We pushed every image under the infamous "latest" tag, using a basic GitHub Actions workflow:

name: Docker Build and Push

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: ourcompany/app:latest

This simple approach quickly led to problems:

  1. We couldn't easily tell which code version was in a production container
  2. Rolling back meant rebuilding a previous version
  3. Concurrent deployments to different environments could overwrite each other's "latest" tag
  4. We had no audit trail of which images had been deployed when

The final straw came when a developer accidentally pushed a work-in-progress change to main, which automatically built and deployed to production with the "latest" tag, overwriting a stable version.

First Improvement: Git Commit Hash Tags

Our first improvement was to tag images with the Git commit hash:

name: Docker Build and Push

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      
      - name: Set short commit hash
        id: vars
        run: echo "sha_short=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
      
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: |
            ourcompany/app:latest
            ourcompany/app:${{ steps.vars.outputs.sha_short }}

This was better - we could now trace each image back to a specific commit. However, we still faced several challenges:

  • Commit hashes aren't human-readable or easily ordered
  • We were still using "latest" as a moving target
  • It wasn't easy to tell which environment an image was intended for

Problem: Tag Conflicts and Confusion

A few months into using commit-based tags, we encountered a perplexing issue. A production deployment appeared to deploy the wrong version of our code. After investigation, we discovered that a developer had manually forced a tag to be reused:

# What happened (manually run)
docker build -t ourcompany/app:abc1234 .  # abc1234 was an existing tag
docker push ourcompany/app:abc1234        # Overwrote the existing image

Since Docker tags are simply mutable pointers to immutable content-addressed images, nothing prevents the same tag from being reused for different image content. This realization prompted us to adopt a strategy to ensure tag uniqueness.
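One way to guard against this kind of accidental reuse is to check a candidate tag against the tags that already exist before pushing. The sketch below is illustrative only: tag_is_free is a hypothetical helper, and the list of existing tags is hard-coded here, whereas in practice it would have to come from the registry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the candidate tag is not already in use.
# $2 is assumed to be a newline-separated list of existing tags.
tag_is_free() {
  local candidate="$1" existing="$2"
  # -F: treat the tag as a fixed string, -x: match whole lines only,
  # so "abc123" is not confused with "abc1234".
  ! grep -Fxq -- "$candidate" <<< "$existing"
}

# Hard-coded for illustration; a real check would query the registry.
existing_tags=$'latest\nabc1234\ndef5678'

if tag_is_free "abc1234" "$existing_tags"; then
  echo "abc1234 is free"
else
  echo "abc1234 already exists, refusing to overwrite"
fi
# prints: abc1234 already exists, refusing to overwrite
```

A check like this only helps people who run it. It does not stop a manual docker push from a laptop, which is why we also wanted tags that were unique by construction.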

Solution: Time-Based Unique Tags

To ensure uniqueness, we added timestamps to our tags:

name: Docker Build and Push

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      
      - name: Generate tag variables
        id: vars
        run: |
          echo "sha_short=${GITHUB_SHA::7}" >> "$GITHUB_OUTPUT"
          echo "timestamp=$(date +%Y%m%d%H%M%S)" >> "$GITHUB_OUTPUT"
      
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: |
            ourcompany/app:latest
            ourcompany/app:${{ steps.vars.outputs.sha_short }}
            ourcompany/app:${{ steps.vars.outputs.timestamp }}_${{ steps.vars.outputs.sha_short }}

The timestamp+hash combination gave every CI build its own tag, so even two builds of the same commit could be told apart, and the registry listing showed at a glance when each image was built.
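The tag format can be sketched as a small shell function. The names build_unique_tag and newest_tag are hypothetical, and the sketch uses UTC explicitly so that tags from different machines stay comparable. Because the timestamp is fixed-width and comes first, a plain lexicographic sort is also a chronological sort.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build "<timestamp>_<short sha>".
# The timestamp argument is optional and defaults to the current UTC time.
build_unique_tag() {
  local sha="$1"
  local timestamp="${2:-$(date -u +%Y%m%d%H%M%S)}"
  printf '%s_%s\n' "$timestamp" "${sha:0:7}"
}

# Hypothetical helper: return the most recent of the given timestamped tags.
newest_tag() {
  printf '%s\n' "$@" | LC_ALL=C sort | tail -n 1
}

build_unique_tag "abc1234def5678" "20230405123456"
# prints: 20230405123456_abc1234

newest_tag "20230405123456_abc1234" "20221231235959_def5678" "20230405123500_0a1b2c3"
# prints: 20230405123500_0a1b2c3
```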

Problem: Branching and Environments

As our deployment processes matured, we began using feature branches and deploying to multiple environments (development, staging, production). Our simple tagging strategy didn't convey which environment an image was intended for.

This led to confusion when developers would look at the registry and see dozens of similar tags without context.

Solution: Environment and Branch-Aware Tags

We enhanced our workflow to include branch names and target environments in our tags:

name: Docker Build and Push

on:
  push:
    branches:
      - main
      - 'feature/**'
      - 'release/**'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      
      - name: Generate tag variables
        id: vars
        run: |
          # Generate shortened commit hash
          SHA_SHORT=${GITHUB_SHA::7}
          echo "sha_short=$SHA_SHORT" >> "$GITHUB_OUTPUT"
          
          # Generate timestamp
          TIMESTAMP=$(date +%Y%m%d%H%M%S)
          echo "timestamp=$TIMESTAMP" >> "$GITHUB_OUTPUT"
          
          # Extract branch name and sanitize it for Docker tag:
          # lowercase it, then turn every run of other characters into a single hyphen
          BRANCH=${GITHUB_REF#refs/heads/}
          BRANCH_SLUG=$(echo "$BRANCH" | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
          echo "branch_slug=$BRANCH_SLUG" >> "$GITHUB_OUTPUT"
          
          # Determine environment from branch name
          if [[ "$BRANCH" == "main" ]]; then
            ENV="prod"
          elif [[ "$BRANCH" == release/* ]]; then
            ENV="staging"
          else
            ENV="dev"
          fi
          echo "env=$ENV" >> "$GITHUB_OUTPUT"
          
          # Create unique tag combining all elements
          UNIQUE_TAG="${ENV}_${BRANCH_SLUG}_${TIMESTAMP}_${SHA_SHORT}"
          echo "unique_tag=$UNIQUE_TAG" >> "$GITHUB_OUTPUT"
      
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: |
            ourcompany/app:${{ steps.vars.outputs.env }}
            ourcompany/app:${{ steps.vars.outputs.sha_short }}
            ourcompany/app:${{ steps.vars.outputs.unique_tag }}

This approach created tags that were both unique and descriptive. For example:

  • ourcompany/app:prod_main_20230405123456_abc1234
  • ourcompany/app:staging_release-v2-1_20230406123456_def5678
  • ourcompany/app:dev_feature-new-login_20230407123456_9a1b012

Now, at a glance, we could tell which environment an image was built for, which branch it came from, when it was built, and which commit it contained.
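The branch-to-tag mapping is easy to test outside of CI when it is pulled into functions. This is a minimal sketch under a few assumptions: the function names are hypothetical, the branch name release/v2.1 is an invented example, and the slug rule (lowercase, then replace each run of characters outside a-z and 0-9 with one hyphen) is the one that yields slugs like release-v2-1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a branch name into a Docker-tag-safe slug.
slugify_branch() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Hypothetical helper: map a branch name to its target environment.
env_for_branch() {
  case "$1" in
    main)      echo "prod" ;;
    release/*) echo "staging" ;;
    *)         echo "dev" ;;
  esac
}

# Hypothetical helper: combine environment, slug, timestamp and short SHA.
descriptive_tag() {
  local branch="$1" timestamp="$2" sha="$3"
  printf '%s_%s_%s_%s\n' \
    "$(env_for_branch "$branch")" "$(slugify_branch "$branch")" "$timestamp" "${sha:0:7}"
}

descriptive_tag "release/v2.1" "20230406123456" "def5678"
# prints: staging_release-v2-1_20230406123456_def5678
```

Keeping the slug to lowercase letters, digits and hyphens means the underscore can serve as an unambiguous field separator when the tag is parsed later.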

Leveraging Docker Metadata Action

As our tagging strategy evolved, we discovered the docker/metadata-action, which was designed specifically for generating Docker tags based on Git context. This simplified our workflow:

name: Docker Build and Push

on:
  push:
    branches:
      - main
      - 'feature/**'
      - 'release/**'
    tags:
      - 'v*'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v3
        with:
          images: ourcompany/app
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha,format=short
            type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' }}
      
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

This action automatically created various tags based on:

  • Branch names (e.g., main, feature-new-login)
  • Pull request numbers
  • Semantic version tags (when Git tags were pushed)
  • Short commit SHA
  • The "latest" tag (only for the main branch)

Problem: Image Promotion Across Environments

Our next challenge was promoting the same image across different environments. Initially, we were rebuilding the image for each environment, which could lead to inconsistencies.

Solution: Retagging for Promotion

Instead of rebuilding, we implemented a promotion workflow that pulled an existing image and retagged it for the new environment:

name: Promote to Production

on:
  workflow_dispatch:
    inputs:
      source_tag:
        description: 'Source image tag to promote'
        required: true

jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      
      - name: Generate promotion tag
        id: vars
        env:
          SOURCE_TAG: ${{ github.event.inputs.source_tag }}
        run: |
          TIMESTAMP=$(date +%Y%m%d%H%M%S)
          echo "timestamp=$TIMESTAMP" >> "$GITHUB_OUTPUT"
          echo "promotion_tag=prod_promoted_${TIMESTAMP}_${SOURCE_TAG}" >> "$GITHUB_OUTPUT"
      
      - name: Pull and retag
        env:
          # Pass user input through the environment instead of interpolating it into the script
          SOURCE_TAG: ${{ github.event.inputs.source_tag }}
          PROMOTION_TAG: ${{ steps.vars.outputs.promotion_tag }}
        run: |
          # Pull the source image
          docker pull "ourcompany/app:$SOURCE_TAG"
          
          # Tag it for production
          docker tag "ourcompany/app:$SOURCE_TAG" ourcompany/app:prod
          docker tag "ourcompany/app:$SOURCE_TAG" "ourcompany/app:$PROMOTION_TAG"
          
          # Push the new tags
          docker push ourcompany/app:prod
          docker push "ourcompany/app:$PROMOTION_TAG"

This approach ensured that the exact same image binary was used across environments, eliminating the risk of inconsistencies from rebuilding.
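Because the source tag is typed in by hand when the workflow is triggered, it is worth validating it before using it. The sketch below checks a tag against Docker's tag grammar (a letter, digit or underscore, followed by up to 127 letters, digits, underscores, dots or hyphens) and builds the promotion tag only if both the input and the result are valid. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the argument is a syntactically valid Docker tag.
is_valid_docker_tag() {
  local re='^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$'
  [[ "$1" =~ $re ]]
}

# Hypothetical helper: build "prod_promoted_<timestamp>_<source tag>",
# failing if either the source tag or the resulting tag is invalid.
promotion_tag_for() {
  local source_tag="$1" timestamp="$2"
  if ! is_valid_docker_tag "$source_tag"; then
    echo "invalid source tag: $source_tag" >&2
    return 1
  fi
  local promoted="prod_promoted_${timestamp}_${source_tag}"
  if ! is_valid_docker_tag "$promoted"; then
    echo "promotion tag would exceed 128 characters" >&2
    return 1
  fi
  printf '%s\n' "$promoted"
}

promotion_tag_for "abc1234" "20230405123456"
# prints: prod_promoted_20230405123456_abc1234

promotion_tag_for "abc1234; rm -rf /" "20230405123456" || echo "rejected"
# prints "rejected" (after an error message on stderr)
```

The prefix adds 29 characters, so a source tag longer than 99 characters produces a promotion tag that the registry would refuse.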

Problem: Tag Proliferation and Cleanup

With our detailed tagging strategy, we quickly accumulated thousands of image tags in our registry. This made browsing difficult and increased storage costs.

Solution: Automated Tag Cleanup

We implemented a scheduled workflow to clean up old tags while preserving important ones:

name: Cleanup Docker Tags

on:
  schedule:
    # Run weekly on Sunday at midnight
    - cron: '0 0 * * 0'
  workflow_dispatch: {}  # Allow manual trigger

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      
      - name: Install DockerHub CLI
        run: |
          # Install dockerhub-cli for easier tag management
          npm install -g dockerhub-cli
          dockerhub login -u ${{ secrets.DOCKERHUB_USERNAME }} -p ${{ secrets.DOCKERHUB_TOKEN }}
      
      - name: Fetch tags
        id: fetch
        run: |
          # Get all tags for our image, one per line
          dockerhub tags ourcompany/app --limit 5000 --output json | jq -r '.results[].name' > "$RUNNER_TEMP/all_tags.txt"
          
          # Tags to always keep: environment tags and semantic versions
          grep -E '^(prod|staging|dev|v[0-9]+\.[0-9]+\.[0-9]+)$' "$RUNNER_TEMP/all_tags.txt" > "$RUNNER_TEMP/keep_tags.txt" || true
          
          # Keep the 100 most recent timestamped tags, ordered by the timestamp embedded in the tag
          grep -E '_[0-9]{14}_' "$RUNNER_TEMP/all_tags.txt" \
            | sed -E 's/^(.*_([0-9]{14})_.*)$/\2 \1/' \
            | sort -r | head -100 | cut -d' ' -f2 >> "$RUNNER_TEMP/keep_tags.txt" || true
      
      - name: Delete old tags
        run: |
          # Delete every tag that is not an exact, whole-line match for an entry in the keep list
          grep -Fxv -f "$RUNNER_TEMP/keep_tags.txt" "$RUNNER_TEMP/all_tags.txt" | while read -r tag; do
            echo "Deleting tag: $tag"
            dockerhub delete-tag "ourcompany/app:$tag" --yes
          done

This cleanup strategy:

  • Kept all environment-specific tags (prod, staging, dev)
  • Kept all semantic version tags (v1.2.3)
  • Kept the 100 most recent tags (based on timestamp)
  • Deleted all other tags to reduce clutter
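The part of this logic most worth getting right is the set difference between "all tags" and "tags to keep", because a substring match can silently keep or delete the wrong tag. The following sketch isolates that step. The function name tags_to_delete and the sample tags are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every tag in $1 that does not appear in $2.
# Both arguments are newline-separated lists.
tags_to_delete() {
  local all="$1" keep="$2"
  # -F: fixed strings, -x: whole-line match, -v: invert, -f: read patterns from a file.
  # Whole-line matching keeps "abc1234" distinct from "prod_main_20230405123456_abc1234".
  grep -Fxv -f <(printf '%s\n' "$keep") <<< "$all" || true
}

all_tags=$'prod\nv1.2.3\nabc1234\nprod_main_20230405123456_abc1234\ndev_feature-old_20220101000000_1111111'
keep_tags=$'prod\nv1.2.3\nprod_main_20230405123456_abc1234'

tags_to_delete "$all_tags" "$keep_tags"
# prints:
# abc1234
# dev_feature-old_20220101000000_1111111
```

With a plain grep -q "$tag" against the keep list, abc1234 would have counted as kept, because it appears inside the longer timestamped tag.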

Advanced Tag Management with Docker Digest

In our most recent iteration, we implemented digest tracking for even better traceability. Docker image digests are content-addressable identifiers that uniquely and immutably identify an image, regardless of its tags.

name: Docker Build with Digest Tracking

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v2
      
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v3
        with:
          images: ourcompany/app
          tags: |
            type=raw,value=latest
            type=sha,format=short
      
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      
      - name: Build and push
        id: build
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
      
      - name: Store image digest
        run: |
          DIGEST="${{ steps.build.outputs.digest }}"
          TIMESTAMP=$(date +%Y%m%d%H%M%S)
          SHORT_SHA=${GITHUB_SHA::7}
          
          # Store the mapping between git SHA, timestamp, and image digest.
          # jq builds the JSON so that quoting and escaping are handled correctly.
          jq -n \
            --arg git_sha "$GITHUB_SHA" \
            --arg short_sha "$SHORT_SHA" \
            --arg timestamp "$TIMESTAMP" \
            --arg image_digest "$DIGEST" \
            '{git_sha: $git_sha, short_sha: $short_sha, timestamp: $timestamp, image_digest: $image_digest}' \
            > image-metadata.json
          
          # Upload to a persistent storage (e.g., S3, GitHub artifact)
          aws s3 cp image-metadata.json "s3://ourcompany-deployments/image-metadata/$SHORT_SHA.json"

By tracking digests, we gained several advantages:

  1. We could verify image integrity even if tags were tampered with
  2. We could confidently identify the exact binary content deployed to an environment
  3. Our deployment systems could refer to images by digest rather than tag for improved security
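Referring to an image by digest means using the repository@digest form instead of repository:tag. As a rough sketch, a deployment script could build that reference from the stored metadata and reject anything that does not look like a SHA-256 digest. The function name is hypothetical and the digest below is a generated placeholder, not a real image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an immutable "repo@sha256:..." reference,
# failing if the digest is not "sha256:" followed by 64 lowercase hex characters.
image_ref_by_digest() {
  local repo="$1" digest="$2"
  local re='^sha256:[a-f0-9]{64}$'
  if [[ ! "$digest" =~ $re ]]; then
    echo "not a sha256 digest: $digest" >&2
    return 1
  fi
  printf '%s@%s\n' "$repo" "$digest"
}

# Placeholder digest for illustration only: 62 zeros followed by "ff".
placeholder_digest="sha256:$(printf '%064x' 255)"

image_ref_by_digest "ourcompany/app" "$placeholder_digest"
```

The resulting reference can be passed to docker pull or used in a deployment manifest, and it keeps pointing at the same content even if every tag on the image is later moved or deleted.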

Implementing a Semantic Versioning Workflow

For our stable releases, we incorporated semantic versioning into our tagging strategy. This allowed us to communicate compatibility and significance of changes through version numbers:

name: Release with Semantic Versioning

on:
  workflow_dispatch:
    inputs:
      release_type:
        description: 'Type of release'
        required: true
        type: choice
        default: 'patch'
        options:
          - patch
          - minor
          - major

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      
      - name: Get latest version
        id: latest_version
        run: |
          # Get the latest version tag
          LATEST_TAG=$(git tag -l 'v*' | sort -V | tail -n1)
          if [ -z "$LATEST_TAG" ]; then
            # No tag exists yet, so use v0.1.0 as the baseline to increment from
            LATEST_TAG="v0.1.0"
          fi
          echo "tag=$LATEST_TAG" >> "$GITHUB_OUTPUT"
      
      - name: Calculate new version
        id: new_version
        run: |
          LATEST_TAG=${{ steps.latest_version.outputs.tag }}
          RELEASE_TYPE=${{ github.event.inputs.release_type }}
          
          # Extract components
          MAJOR=$(echo $LATEST_TAG | sed 's/v\([0-9]*\).*/\1/')
          MINOR=$(echo $LATEST_TAG | sed 's/v[0-9]*\.\([0-9]*\).*/\1/')
          PATCH=$(echo $LATEST_TAG | sed 's/v[0-9]*\.[0-9]*\.\([0-9]*\).*/\1/')
          
          # Increment based on release type
          if [ "$RELEASE_TYPE" = "major" ]; then
            MAJOR=$((MAJOR + 1))
            MINOR=0
            PATCH=0
          elif [ "$RELEASE_TYPE" = "minor" ]; then
            MINOR=$((MINOR + 1))
            PATCH=0
          else
            PATCH=$((PATCH + 1))
          fi
          
          NEW_TAG="v${MAJOR}.${MINOR}.${PATCH}"
          echo "tag=$NEW_TAG" >> "$GITHUB_OUTPUT"
      
      - name: Create and push tag
        run: |
          git config user.name "GitHub Actions"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          
          NEW_TAG=${{ steps.new_version.outputs.tag }}
          git tag -a $NEW_TAG -m "Release $NEW_TAG"
          git push origin $NEW_TAG
      
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v3
        with:
          images: ourcompany/app
          tags: |
            type=raw,value=${{ steps.new_version.outputs.tag }}
            type=semver,pattern={{major}}.{{minor}},value=${{ steps.new_version.outputs.tag }}
            type=semver,pattern={{major}},value=${{ steps.new_version.outputs.tag }}
            type=raw,value=latest
      
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      
      - name: Build and push
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

This workflow would generate tags like:

  • ourcompany/app:v1.2.3 (exact version)
  • ourcompany/app:1.2 (major.minor version)
  • ourcompany/app:1 (major version)
  • ourcompany/app:latest

This allowed users to choose their desired level of version stability, from pinning to an exact version for maximum stability to using a major version tag for convenience with some flexibility.
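The version bump itself is plain arithmetic on the three components, and it is easier to verify as a standalone function than inside a workflow step. This sketch assumes tags of the exact form vMAJOR.MINOR.PATCH with no pre-release or build suffix. The name bump_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bump a "vMAJOR.MINOR.PATCH" tag.
# The second argument is major, minor or patch (default: patch).
bump_version() {
  local current="${1#v}" release_type="${2:-patch}"
  local major minor patch
  # Split "1.2.3" on dots into its three components.
  IFS=. read -r major minor patch <<< "$current"
  case "$release_type" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown release type: $release_type" >&2; return 1 ;;
  esac
  echo "v${major}.${minor}.${patch}"
}

bump_version "v1.2.3" patch   # prints: v1.2.4
bump_version "v1.2.3" minor   # prints: v1.3.0
bump_version "v1.2.3" major   # prints: v2.0.0
```

Resetting the lower components on a minor or major bump is what keeps the floating 1.2 and 1 tags meaningful: each one always points at the newest release within its range.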

Our Final Tagging Strategy

After many iterations and lessons learned, our final Docker image tagging strategy included:

  1. Unique tags built from timestamps and commit hashes, treated as immutable by convention
  2. Environment-specific tags (prod, staging, dev)
  3. Semantic version tags for stable releases
  4. Branch name tags for feature development
  5. Digest tracking for immutable content addressing
  6. Tag cleanup to manage registry size

This comprehensive strategy provided:

  • Clear traceability from image to source code
  • Easy identification of which environments an image was intended for
  • Support for promoting the same image across environments
  • Simple rollback to previous versions when needed
  • Protection against tag confusion or reuse
  • Proper versioning of stable releases

Conclusion

What started as a seemingly trivial aspect of our CI/CD pipeline - Docker image tagging - evolved into a crucial component of our software delivery process.

A well-thought-out tagging strategy provides clarity, reliability, and traceability that simplifies operations and troubleshooting. It may seem like overkill initially, but the benefits become clear the first time you need to track down which exact code version is running in production or perform an emergency rollback at 3 AM.

By leveraging GitHub Actions and developing a consistent tagging convention, we transformed our container deployment from a source of stress to a source of confidence.