Automating MongoDB Backups with GitHub Actions
Database backups are like insurance policies—you hope you'll never need them, but when disaster strikes, you'll be grateful you invested the time to set them up properly. After our team experienced a near-catastrophic data loss event, we decided to implement a robust, automated MongoDB backup solution using GitHub Actions. The journey involved navigating permission issues, safely managing credentials, and ensuring backups could be reliably restored when needed.
The Incident That Started It All
It was a normal Tuesday afternoon when our MongoDB instance suddenly became unresponsive. After investigation, we discovered that a well-intentioned but ill-advised schema migration script had run in production without proper testing, corrupting critical collections. Our most recent backup was four days old, meaning we potentially faced substantial data loss.
We managed to recover most of the data through a combination of database journal files and application logs, but the experience highlighted a critical gap in our infrastructure: we lacked a reliable, automated backup system.
Initial Approach: A Simple Cron Job
Our first thought was to set up a simple cron job on a dedicated server:
# /etc/cron.d/mongodb-backup
0 2 * * * ec2-user /usr/local/bin/mongodb-backup.sh > /var/log/mongodb-backup.log 2>&1

With a basic shell script:
#!/bin/bash
# mongodb-backup.sh
DATE=$(date +%Y-%m-%d)
MONGO_URI="mongodb://username:password@localhost:27017/mydb"
# Create backup
mongodump --uri="$MONGO_URI" --out="/backups/$DATE"
# Compress backup
tar -czf "/backups/$DATE.tar.gz" -C "/backups" "$DATE"
# Remove uncompressed directory
rm -rf "/backups/$DATE"
# Upload to S3
aws s3 cp "/backups/$DATE.tar.gz" "s3://my-backups/mongodb/$DATE.tar.gz"
# Keep only last 7 local backups
find /backups -name "*.tar.gz" -type f -mtime +7 -delete

While this approach worked, it had several significant drawbacks:
- Credentials were stored in plaintext in the script
- The backup process lacked monitoring and alerting
- We had no easy way to test restore procedures
- The backup server became a single point of failure
- Managing the backup infrastructure was another operational burden
Moving to GitHub Actions
Since our team was already using GitHub Actions for CI/CD, we decided to leverage it for our backup strategy as well. This would give us several advantages:
- No dedicated backup server to maintain
- Built-in scheduling with the cron trigger
- Secret management for database credentials
- Detailed logs and notifications for failures
- Version-controlled backup configuration
Our initial GitHub Actions workflow looked like this:
# .github/workflows/mongodb-backup.yml
name: MongoDB Backup
on:
  schedule:
    - cron: '0 2 * * *' # Run at 2 AM UTC daily
  workflow_dispatch: # Allow manual trigger
jobs:
  backup:
    runs-on: ubuntu-latest
    steps:
      - name: Install MongoDB tools
        run: |
          wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -
          echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
          sudo apt-get update
          sudo apt-get install -y mongodb-database-tools
      - name: Run MongoDB backup
        env:
          MONGO_URI: ${{ secrets.MONGO_URI }}
        run: |
          DATE=$(date +%Y-%m-%d)
          mongodump --uri="$MONGO_URI" --out="./backup/$DATE"
          tar -czf "$DATE.tar.gz" -C "./backup" "$DATE"
      - name: Upload to S3
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: 'us-east-1'
        run: |
          DATE=$(date +%Y-%m-%d)
          aws s3 cp "$DATE.tar.gz" "s3://my-backups/mongodb/$DATE.tar.gz"
      - name: Clean up
        run: |
          DATE=$(date +%Y-%m-%d)
          rm -rf "./backup/$DATE" "$DATE.tar.gz"

We set up the necessary GitHub Secrets for MONGO_URI, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY, and then waited for the workflow to run.
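Secrets can be added through the repository settings page or, for reference, from the command line with the GitHub CLI. A quick sketch, assuming the gh CLI is installed and authenticated; the values shown are placeholders:

# Store the connection string and AWS credentials as repository secrets
gh secret set MONGO_URI --body "mongodb://backup_user:***@db-host:27017/mydb?authSource=admin"
gh secret set AWS_ACCESS_KEY_ID --body "AKIA..."
gh secret set AWS_SECRET_ACCESS_KEY --body "..."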
Problem #1: Scheduled Actions Not Running
To our surprise, our scheduled backup didn't run the next day. After investigating, we realized that GitHub Actions schedules are only triggered on the default branch. Our workflow was on a feature branch that hadn't been merged yet! Once we merged to the main branch, the scheduled workflow began running.
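The workflow_dispatch trigger is handy while waiting for the schedule: a run can be kicked off manually from the Actions tab or with the GitHub CLI. A sketch, assuming the workflow file name used above:

# Trigger the backup workflow manually
gh workflow run mongodb-backup.yml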
Problem #2: MongoDB Authentication Errors
Our first successful run failed with an authentication error:
Error: error connecting to db server: server returned error on SASL authentication step: Authentication failed.

After double-checking our credentials, we discovered two issues:
- Our MongoDB URI had special characters that needed proper escaping in the GitHub Secrets
- We needed to specify the authentication database explicitly
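Percent-encoding reserved characters in the password portion is easy to get right with a one-liner. A sketch; the password here is a stand-in:

# URL-encode a password containing reserved characters such as @ or :
python3 -c "import urllib.parse; print(urllib.parse.quote_plus('p@ssword'))"
# -> p%40ssword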
We updated our MongoDB URI in the GitHub Secrets to fix these issues:
# Original (problematic)
mongodb://username:p@ssword@db.example.com:27017/mydb

# Updated (working)
mongodb://username:p%40ssword@db.example.com:27017/mydb?authSource=admin

Problem #3: Network Access Restrictions
Our next attempt failed because our MongoDB server had IP restrictions, and the GitHub Actions runner's IP wasn't in the allowed list. We had two options:
- Add GitHub Actions IP ranges to our MongoDB allowed list (not ideal as they can change)
- Set up a self-hosted runner inside our VPC with access to the MongoDB server
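For what it's worth, the hosted-runner ranges are published through GitHub's meta API, so the first option is scriptable, but the list is long and changes regularly. A sketch, assuming jq is installed:

# Fetch the current IP ranges used by GitHub-hosted Actions runners
curl -s https://api.github.com/meta | jq -r '.actions[]'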
For security reasons, we chose the second option. We updated our workflow to use a self-hosted runner:
# .github/workflows/mongodb-backup.yml
name: MongoDB Backup
on:
  schedule:
    - cron: '0 2 * * *'
  workflow_dispatch:
jobs:
  backup:
    runs-on: self-hosted # Changed to self-hosted runner
    steps:
      # Rest of the workflow remains the same

Problem #4: Empty Backups
After fixing the network and authentication issues, our backup workflow ran successfully. However, when we examined the backup files in S3, we found some were empty or much smaller than expected. The issue? We didn't provide enough time for large collections to be dumped before compressing.
We modified our workflow to add more diagnostics and to ensure the mongodump completed successfully:
      - name: Run MongoDB backup
        env:
          MONGO_URI: ${{ secrets.MONGO_URI }}
        run: |
          DATE=$(date +%Y-%m-%d)
          echo "Starting MongoDB dump at $(date)"
          mkdir -p "./backup/$DATE"

          # Add verbosity for better diagnostics
          mongodump --uri="$MONGO_URI" --out="./backup/$DATE" --verbose

          # Check if dump completed successfully
          if [ $? -ne 0 ]; then
            echo "MongoDB dump failed"
            exit 1
          fi

          echo "MongoDB dump completed at $(date)"
          echo "Backup size: $(du -sh ./backup/$DATE)"

          echo "Compressing backup..."
          tar -czf "$DATE.tar.gz" -C "./backup" "$DATE"
          echo "Compression completed. Archive size: $(du -sh $DATE.tar.gz)"

This change gave us better visibility into the backup process and helped us identify that some larger collections needed more time.
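One knob worth knowing about for large collections is mongodump's --gzip flag, which compresses each collection as it is written and lightens both the dump output and the subsequent archiving step. A hedged variation on the dump command above, not what the workflow shown here uses:

# Compress collections as they are dumped (restore later with mongorestore --gzip)
mongodump --uri="$MONGO_URI" --out="./backup/$DATE" --gzip --verbose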
Problem #5: S3 Permissions
Our next challenge was with S3 permissions. The workflow could upload backups, but we encountered this error when trying to list existing backups:
An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied

The IAM policy for our AWS access key was too restrictive. We updated it to include the necessary permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-backups",
        "arn:aws:s3:::my-backups/*"
      ]
    }
  ]
}

Setting Up Backup Rotation
With the basic backup process working reliably, we needed to implement a backup rotation policy to manage storage costs and ensure we had a good retention strategy. We decided on:
- Daily backups for the past 7 days
- Weekly backups for the past 4 weeks
- Monthly backups for the past 12 months
- Yearly backups indefinitely
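Daily pruning is handled in the workflow itself (shown below). For the weekly and monthly tiers, expiry can just as well be pushed down to an S3 lifecycle rule so the workflow doesn't have to list objects and compare dates. A minimal sketch, assuming the bucket and prefixes used in the rotation step:

# Expire weekly backups after ~4 weeks and monthly backups after a year
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    { "ID": "expire-weekly",  "Status": "Enabled", "Filter": { "Prefix": "mongodb/weekly/" },  "Expiration": { "Days": 28 } },
    { "ID": "expire-monthly", "Status": "Enabled", "Filter": { "Prefix": "mongodb/monthly/" }, "Expiration": { "Days": 365 } }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket my-backups --lifecycle-configuration file://lifecycle.json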
We implemented this by extending our workflow:
      - name: Implement backup rotation
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: 'us-east-1'
        run: |
          DATE=$(date +%Y-%m-%d)
          DOW=$(date +%u) # Day of week (1-7)
          DOM=$(date +%d) # Day of month (01-31)

          # Create a weekly backup on Sundays
          if [ "$DOW" = "7" ]; then
            WEEK=$(date +%Y-W%V)
            aws s3 cp "$DATE.tar.gz" "s3://my-backups/mongodb/weekly/$WEEK.tar.gz"
            echo "Created weekly backup: $WEEK.tar.gz"
          fi

          # Create a monthly backup on the 1st of the month
          if [ "$DOM" = "01" ]; then
            MONTH=$(date +%Y-%m)
            aws s3 cp "$DATE.tar.gz" "s3://my-backups/mongodb/monthly/$MONTH.tar.gz"
            echo "Created monthly backup: $MONTH.tar.gz"
          fi

          # Create a yearly backup on January 1st
          if [ "$DATE" = "$(date +%Y)-01-01" ]; then
            YEAR=$(date +%Y)
            aws s3 cp "$DATE.tar.gz" "s3://my-backups/mongodb/yearly/$YEAR.tar.gz"
            echo "Created yearly backup: $YEAR.tar.gz"
          fi

          # Delete daily backups older than 7 days
          aws s3 ls "s3://my-backups/mongodb/" | grep -v "/" | awk '{print $4}' | while read backup; do
            backup_date=$(echo $backup | sed 's/.tar.gz//')
            days_old=$(( ( $(date +%s) - $(date -d "$backup_date" +%s) ) / 86400 ))
            if [ $days_old -gt 7 ]; then
              aws s3 rm "s3://my-backups/mongodb/$backup"
              echo "Deleted old daily backup: $backup"
            fi
          done

Implementing Backup Verification
A backup is only as good as its ability to be restored. We added a verification step to ensure our backups were usable:
      - name: Verify backup integrity
        run: |
          DATE=$(date +%Y-%m-%d)

          # Create a temporary directory for verification
          mkdir -p ./verify

          # Extract backup to verify its contents
          tar -xzf "$DATE.tar.gz" -C ./verify

          # Check if extraction was successful
          if [ $? -ne 0 ]; then
            echo "Backup verification failed: Could not extract archive"
            exit 1
          fi

          # Count collections to ensure we have data
          COLLECTIONS=$(find ./verify/$DATE -name "*.bson" | wc -l)
          echo "Backup contains $COLLECTIONS collections"
          if [ $COLLECTIONS -eq 0 ]; then
            echo "Backup verification failed: No collections found"
            exit 1
          fi

          # List the largest collections by file size
          find ./verify/$DATE -name "*.bson" -exec ls -lh {} \; | sort -k5 -rh | head -5
          echo "Backup verification completed successfully"

Adding Restore Functionality
With backups working reliably, we created a separate workflow for restoring data when needed. This workflow would be manually triggered with parameters to specify which backup to restore:
# .github/workflows/mongodb-restore.yml
name: MongoDB Restore
on:
  workflow_dispatch:
    inputs:
      backup_date:
        description: 'Backup date to restore (YYYY-MM-DD)'
        required: true
      target_database:
        description: 'Target database name'
        required: true
      collections:
        description: 'Specific collections to restore (comma-separated, leave empty for all)'
        required: false
jobs:
  restore:
    runs-on: self-hosted
    steps:
      - name: Install MongoDB tools
        run: |
          # MongoDB tools installation script
      - name: Download backup from S3
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: 'us-east-1'
        run: |
          aws s3 cp "s3://my-backups/mongodb/${{ github.event.inputs.backup_date }}.tar.gz" ./
          tar -xzf "${{ github.event.inputs.backup_date }}.tar.gz"
      - name: Restore database
        env:
          MONGO_URI: ${{ secrets.MONGO_URI }}
        run: |
          if [ -z "${{ github.event.inputs.collections }}" ]; then
            # Restore entire database
            mongorestore --uri="$MONGO_URI" --nsInclude="${{ github.event.inputs.target_database }}.*" --drop "./${{ github.event.inputs.backup_date }}"
          else
            # Restore specific collections
            IFS=',' read -ra COLLECTIONS <<< "${{ github.event.inputs.collections }}"
            for collection in "${COLLECTIONS[@]}"; do
              echo "Restoring collection: $collection"
              mongorestore --uri="$MONGO_URI" --nsInclude="${{ github.event.inputs.target_database }}.$collection" --drop "./${{ github.event.inputs.backup_date }}"
            done
          fi
      - name: Cleanup
        run: |
          rm -rf "${{ github.event.inputs.backup_date }}" "${{ github.event.inputs.backup_date }}.tar.gz"

This restore workflow allowed us to quickly recover from data issues by selecting a backup date and optionally specifying which collections to restore.
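A restore can be started from the Actions UI, or from the command line by passing the inputs directly. A sketch; the date, database, and collection names are placeholders:

# Restore two collections from a given backup into a staging database
gh workflow run mongodb-restore.yml \
  -f backup_date=2024-03-01 \
  -f target_database=mydb_staging \
  -f collections=users,orders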
Improved Security with OIDC
As a final security improvement, we replaced static AWS credentials with OpenID Connect (OIDC) to obtain temporary credentials. This eliminated the need to store long-lived AWS access keys in GitHub Secrets.
First, we set up the AWS IAM Identity Provider and role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:our-org/our-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}

Then we updated our workflow to use OIDC instead of static credentials:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-mongodb-backup
          aws-region: us-east-1

This change eliminated a significant security risk by removing long-lived credentials from our GitHub Secrets.
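One detail that is easy to miss: the job must be allowed to request the OIDC token, so the workflow (or job) also needs a permissions block. This is a GitHub Actions requirement; a sketch:

permissions:
  id-token: write # Required to request the OIDC token
  contents: read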
Adding Monitoring and Notifications
To complete our backup system, we added comprehensive monitoring and notifications:
      - name: Send notification
        if: always() # Run even if previous steps failed
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "MongoDB Backup ${{ job.status }}",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*MongoDB Backup ${{ job.status }}*\nRepository: ${{ github.repository }}\nWorkflow: MongoDB Backup\nDate: $(date +%Y-%m-%d)"
                  }
                },
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "Backup Size: ${{ env.BACKUP_SIZE }}\nBackup Location: s3://my-backups/mongodb/$(date +%Y-%m-%d).tar.gz"
                  }
                },
                {
                  "type": "actions",
                  "elements": [
                    {
                      "type": "button",
                      "text": {
                        "type": "plain_text",
                        "text": "View Workflow Run"
                      },
                      "url": "https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
                    }
                  ]
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
          BACKUP_SIZE: ${{ env.BACKUP_SIZE }}

This notification would be sent to Slack after each backup, whether successful or failed, allowing us to quickly identify and troubleshoot any issues.
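Note that ${{ env.BACKUP_SIZE }} has to be exported by an earlier step, and that shell substitutions such as $(date ...) are not expanded inside a with: block, so values like the date are better exported the same way and referenced as expressions (for example ${{ env.BACKUP_DATE }}). A sketch of the lines that can be appended to the "Run MongoDB backup" step to do this; the variable names are assumptions:

# Expose values from the backup step to later steps via the GitHub Actions environment file
echo "BACKUP_DATE=$DATE" >> "$GITHUB_ENV"
echo "BACKUP_SIZE=$(du -sh "$DATE.tar.gz" | cut -f1)" >> "$GITHUB_ENV"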
Testing the Disaster Recovery Process
With our backup and restore workflows in place, we scheduled regular disaster recovery tests to ensure our team could quickly respond to a real emergency. Every quarter, we:
- Spin up a test MongoDB instance
- Restore a recent backup to the test instance
- Verify data integrity and test application functionality
- Practice the entire recovery procedure with an engineer who hadn't performed it before
- Document any issues or improvements
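The first two steps boil down to something like the following. A sketch; the image tag, backup date, and connection details are placeholders:

# Throwaway MongoDB instance for the drill
docker run -d --name mongo-dr-test -p 27017:27017 mongo:5.0

# Pull down and unpack a recent backup, then restore it into the test instance
aws s3 cp "s3://my-backups/mongodb/2024-01-01.tar.gz" .
tar -xzf 2024-01-01.tar.gz
mongorestore --uri="mongodb://localhost:27017" --drop ./2024-01-01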
These tests proved invaluable when we did face a real data loss situation six months later. Our team was able to restore from backups with confidence and minimal downtime.
Lessons Learned
Implementing this MongoDB backup solution with GitHub Actions taught us several important lessons:
- Credential management is critical: Proper escaping of special characters in connection strings and secure handling of credentials can prevent numerous headaches.
- Network security requires planning: Consider network access restrictions early in your design process, not as an afterthought.
- Verify backups proactively: An unverified backup is potentially useless. Always include verification steps.
- Create backup hierarchies: Having different retention policies for daily, weekly, monthly, and yearly backups provides flexibility for recovery options.
- Test restore procedures regularly: The real test of a backup system is restoration. Practice it regularly.
- Document everything: When disaster strikes, clear documentation makes recovery faster and less stressful.
Conclusion
Using GitHub Actions for MongoDB backups has been a game-changer for our team. We went from a manual, error-prone process to a fully automated, monitored, and tested backup system that has already proven its value in real-world recovery scenarios.
The combination of scheduled workflows, self-hosted runners, secure credential management, and comprehensive verification provides peace of mind that our data is protected. The added benefits of version-controlled backup configurations and integration with our existing CI/CD system make GitHub Actions an excellent choice for database backups.
If your team is already using GitHub, I highly recommend leveraging GitHub Actions for critical operational tasks like database backups. The investment in setting up proper automation will pay dividends the first time you need to recover from a data issue, and you will eventually need to recover; it's just a matter of when.