# Advanced Configuration

## 1. Introduction

This document explores the additional optional inputs opened up by the control plane. See the Prerequisites and Quickstart for the initial setup guides.
**Note**

Hopefully this page can stand by itself, but feel free to read through the Architecture to see this page in a wider context.
## 2. Fine-Tuning Runner Lifecycle and Resource Management

### Understanding Runner Timeouts

Several inputs in `refresh` mode control how long runners live and how often their configurations are refreshed. Finding the right balance is crucial:
**Strategy**

Start with the defaults. If you notice frequent cold starts and your budget allows, consider increasing `idle-time-sec`. If cost is a primary concern, shorten `idle-time-sec` but monitor workflow times. Ensure `max-runtime-min` accommodates your longest jobs.
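As a minimal sketch of that strategy, a repository whose longest job runs about 30 minutes might tune its `refresh` step as follows (the values are illustrative, not prescriptive):

```yaml
- name: Refresh Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: refresh
    idle-time-sec: 600   # hold idle runners a bit longer to cut cold starts
    max-runtime-min: 45  # longest job (~30m) plus a 10-15 minute buffer
```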
### `idle-time-sec`: Lifetime in the Resource Pool

Defined at the `refresh` level with a default of 300 seconds (5 minutes). This defines how long an instance lives in the resource pool for subsequent workflows to pick up! If it sits in the pool for too long, the instance will be terminated by `refresh` or will undergo self-termination.
```yaml
- name: Refresh Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: refresh
    # idle-time-sec: 300 # <---- (Default: 300 seconds)
```
**`idle-time-sec`: Shorter or Longer?**

- Shorter: Reduces costs by terminating idle instances sooner. However, it might lead to more "cold starts" if subsequent jobs arrive just after an instance has terminated, increasing wait times.
- Longer: Increases runner availability and reduces cold starts, but can lead to higher costs due to instances sitting idle for longer.
- Align this with your typical job arrival patterns. If jobs are infrequent, a shorter idle time might be better. If jobs are frequent, a longer idle time can improve CI/CD pipeline speed (see the sketch below).
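For example, a busy repository where jobs land every few minutes could afford to hold instances longer (a hypothetical value, assuming the extra idle cost is acceptable):

```yaml
- name: Refresh Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: refresh
    idle-time-sec: 900 # 15 minutes: fewer cold starts when jobs are frequent
```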
### `max-runtime-min`: Expected Max Job Duration

Defined at the `refresh` level (default: 30 minutes) and overridable at the `provision` level, this parameter sets the maximum active duration for an instance between its assignment to a workflow and its release back to the pool.

It acts as a critical safeguard, enabling the control plane to safely terminate instances. This prevents them from being stranded due to misconfigured `release` jobs or other unforeseen issues that would otherwise leave resources unreleased.
```yaml
- name: Refresh Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: refresh
    # max-runtime-min: 30 # <---- (Default: 30m)
```
Overriding at the `provision` level: the operator can tell the control plane how long a specific workflow is expected to take from `provision` to `release`, controlling this timeout at the workflow level instead of at the repo level.
```yaml
- name: Provision Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: provision
    # max-runtime-min: 30 # <---- (overrides refresh input if provided)
```
**`max-runtime-min`: Shorter or Longer?**

- Recommendation: Be conservative. Take the longest-running job in the workflow and add 10-15 minutes to it to avoid premature termination.
- This can be set at either the `refresh` or `provision` level to set per-workflow expectations.
### Instance Purchasing Options (`provision` mode)

The `provision` mode offers ways to control the type and cost of EC2 instances provisioned for specific workflows.
#### `usage-class`: On-Demand vs Spot

You can specify whether to provision `on-demand` or `spot` instances at the workflow level using the `usage-class` input in `provision` mode. The default is `on-demand`. It's generally recommended to use `on-demand` instances for critical workflows and `spot` instances for non-critical CI tasks.
```yaml
- name: Provision Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: provision
    # usage-class: on-demand # <---- (Default: on-demand)
```
**The Right Match: On-Demand vs. Spot**

The control plane is able to discriminate between instances with `on-demand` and `spot` lifecycles. If none are available in the pool, the control plane creates new instances faithful to the `usage-class` constraint.
#### `allowed-instance-types`: Instance Types by Wildcards

This input accepts a space-separated list of EC2 instance types (e.g., `m5.large c6i* r*`) or family wildcards (e.g., `c*`, `m*`, `r*`) that the workflow should use. Instances matching these patterns will be selected from the existing resource pool, or provisioned if new ones are needed. This capability aligns with the AWS AllowedInstanceTypes specification.

The default is `c* m* r*`. This setting allows the control plane to choose from a wide array of instance types.
```yaml
- name: Provision Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: provision
    # allowed-instance-types: "c* m* r*" # <---- (Default: "c* m* r*")
```
**Be generous with `usage-class: spot`**

Telling the control plane to use spot instances for the workflow? Cast your net wide! Defer to the generous `c* m* r*` defaults to ensure that AWS always has capacity for your instances when your workflow asks for them.
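For instance, a spot-backed workflow can simply leave the instance-type defaults untouched:

```yaml
- name: Provision Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: provision
    usage-class: spot
    # allowed-instance-types deliberately left at its "c* m* r*" default,
    # giving the spot request many instance pools to draw from
```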
**The Right Match: `allowed-instance-types`**

Similar to `usage-class`, the control plane is able to discriminate instances from the shared resource pool given the patterns and instance types specified in `allowed-instance-types`.
## 3. Some AMI and `pre-runner-script` Strategies

Bake as much into the AMI as you can to ensure timely startup times. As per the Quickstart, feel free to use Runs-On's machine images so you don't have to worry about any of this. However, if you do decide to roll your own, there are a couple of hard requirements to ensure proper functionality:

- `git` & `docker`: Checking out code and running containers
- `aws cli`: Control plane communication via DynamoDB
- `libicu`: Configuring instances against GitHub Actions as a self-hosted runner
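A quick way to validate a candidate AMI against these requirements is to launch a test instance from it and check each dependency by hand, for example:

```bash
# Sanity-check the hard requirements on an instance launched from your AMI
git --version
docker --version
aws --version
ldconfig -p | grep -i icu # the GitHub Actions runner (.NET-based) needs libicu
```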
### `pre-runner-script`

Here are some recommended scripts when using various bare AMI images.

**Refining recommended scripts**

If any of these scripts do not work to initialize the runners, feel free to raise a pull request! It would be much appreciated ~ 🙏
#### Amazon Linux 2023

```yaml
- name: Provision Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: provision
    ami: ami-123 # <--- AL2023 image
    pre-runner-script: |
      #!/bin/bash
      sudo yum update -y && \
        sudo yum install docker -y && \
        sudo yum install git -y && \
        sudo yum install libicu -y && \
        sudo systemctl enable docker
```
#### Ubuntu 24

```yaml
- name: Provision Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: provision
    ami: ami-123 # <--- Ubuntu 24
    pre-runner-script: |
      #!/bin/bash
      sudo apt update && sudo apt upgrade -y
      sudo apt install -y docker.io git libicu-dev unzip curl
      # AWS CLI v2 --> x86_64
      curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
      unzip awscliv2.zip
      sudo ./aws/install
      rm -rf awscliv2.zip aws
      sudo systemctl enable docker
      sudo systemctl start docker
```
## 4. Resource Classes for Varied Workloads

The operator can control the size of the instances at the level of the workflow (i.e., `provision`), with the available sizes defined at the level of the repo (i.e., `refresh`). The former specifies the resource class of the instance for the workflow (via `resource-class`), and the latter specifies the valid resource classes (via `resource-class-config`).

### Pre-Defined Resource Classes & Usage

The control plane provides a comprehensive set of resource classes that can be used at a workflow level. At the time of writing, the defaults are the following:
| Resource Class | CPU (cores) | Minimum Memory (MB) |
| --- | --- | --- |
| large | 2 | 4,096 |
| xlarge | 4 | 8,192 |
| 2xlarge | 8 | 16,384 |
| 4xlarge | 16 | 32,768 |
| 8xlarge | 32 | 65,536 |
| 12xlarge | 48 | 98,304 |
| 16xlarge | 64 | 131,072 |
And to use these pre-defined resource classes, simply specify the `resource-class` attribute at the `provision` level:
```yaml
- name: Provision Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: provision
    resource-class: "xlarge" # <---- (Default: "large")
```
**Greater control with `allowed-instance-types`**

Say that for a specific workflow, you have specified the following:

```yaml
- name: Provision Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: provision
    resource-class: "xlarge"
    allowed-instance-types: "c6* m5*"
```

The control plane will only pick up pooled instances of class `xlarge` whose instance types match the `c6` and `m5` families. If none are available from the pool, then only instances which fulfill both requirements will be provisioned.
**The Right Match: `resource-class`**

Similar to `usage-class` and `allowed-instance-types`, the control plane is able to discriminate instances from the shared resource pool given the specified `resource-class`.
**Keeping consistent with AWS 📚**

- Naming Convention: Resource class names align with AWS EC2 instance type naming (e.g., `large` = 2 CPU cores, `xlarge` = 4 CPU cores, etc.)
- Memory Allocation: Memory values represent minimum requirements and are set at approximately 2GB per CPU core. This ensures compatibility across the mainstream instance families:
  - Compute-optimized instances (`c` family): ~2GB per core
  - General-purpose instances (`m` family): ~4GB per core
  - Memory-optimized instances (`r` family): ~8GB per core
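As a concrete example: an `xlarge` request (4 cores, 8,192 MB minimum) can be satisfied by a `c6i.xlarge` (4 vCPUs, 8 GiB), an `m5.xlarge` (4 vCPUs, 16 GiB), or an `r5.xlarge` (4 vCPUs, 32 GiB), since each meets or exceeds the minimum memory.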
### Custom Resource Classes

If you have custom requirements, here's an example. Note that this overrides the pre-defined resource classes.
```yaml
# in refresh.yml
- name: Refresh Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: refresh
    resource-class-config: '{ "custom-large": { "cpu": 4, "mmem": 8000 }, "custom-extra-large": { "cpu": 8, "mmem": 16000 } }'
```

```yaml
# in ci.yml
- name: Provision Mode
  uses: fleet-actions/ec2-runner-pool-manager@main
  with:
    mode: provision
    resource-class: "custom-large"
```
## 5. Permission & Network Hardening

Beyond the basic IAM policies in Prerequisites, feel free to further harden the IAM policies and the networking around the provided subnets.

### IAM Least Privilege

#### IAM Entity for the Control Plane

- When configuring credentials with `aws-actions/configure-aws-credentials`, it is recommended to use OIDC to prevent the usage of permanent credentials within CI.
- `iam:PassRole`: Once you know the exact role the control plane is passing on to the instances, I highly recommend adding its ARN to the policy's resource element.
- SQS & DynamoDB: These resources are created with a repo-specific prefix (e.g., `fleet-actions-ec2-runner-pool-manager-*` for this repository), which we can use to harden the IAM policy further, as sketched below.
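A minimal sketch of that resource scoping, assuming your queues and tables share the repo-derived prefix (substitute your own repo owner and name; the wildcard `Action` entries are placeholders, so keep the exact action lists from the Prerequisites):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ScopeDynamoDBToRepoPrefix",
      "Effect": "Allow",
      "Action": "dynamodb:*",
      "Resource": "arn:aws:dynamodb:*:*:table/{repo-owner}-{repo-name}-*"
    },
    {
      "Sid": "ScopeSQSToRepoPrefix",
      "Effect": "Allow",
      "Action": "sqs:*",
      "Resource": "arn:aws:sqs:*:*:{repo-owner}-{repo-name}-*"
    }
  ]
}
```

The point here is the `Resource` scoping; tighten the `Action` wildcards to the exact calls your setup already grants.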
#### IAM Entity for the EC2 Instance Profile

At a minimum, the role given to the instance needs to be able to self-terminate and to read and write a DynamoDB table. As such, we can explore two avenues of hardening the role handed to the EC2 instance beyond the minimum defined in the Prerequisites:

- Self-Termination: Use self-referential ARNs to ensure that `aws ec2 terminate-instances` can only really be delivered to the instance that calls it.
- Read & Write DynamoDB Table: As above, restrict the resource to an ARN that follows the repo owner and repo name: `"arn:aws:dynamodb:*:*:table/{repo-owner}-{repo-name}-*"`
**Interacting with other AWS Services ☁️**

If your self-hosted runner needs to communicate with other AWS services (e.g., `s3`), feel free to expand the EC2 instance profile, but always ensure that the minimum permissions specified here are included.
**Add `AmazonSSMManagedInstanceCore` to connect to your self-hosted runners ⭐**

Session Manager is my favourite way to connect to instances, as it requires neither bastion hosts nor managing SSH keys. I recommend expanding your EC2 instance role with the AWS managed policy AmazonSSMManagedInstanceCore. When used with machine images built on Amazon Linux 2023 and Ubuntu, Session Manager should work out of the box as the SSM Agent is pre-installed 🤩
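Once the policy is attached, opening a shell on a runner is a one-liner from your workstation (assuming the AWS CLI Session Manager plugin is installed; the instance ID below is a placeholder):

```bash
aws ssm start-session --target i-0123456789abcdef0
```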
### Network Security

#### Security Groups

Only allow the outbound traffic that is necessary. Outbound traffic can be further restricted as per GitHub's prescriptions for self-hosted runner communication.
#### Subnets and VPC Endpoints

We recommend adding VPC endpoints to your VPC's route tables if you prefer to keep AWS service calls off the public internet. The self-hosted runners already make calls to DynamoDB in the background, so a DynamoDB Gateway Endpoint will bring those network costs to zero. Furthermore, if you need to pull container images from ECR or access artifacts in S3, consider adding the S3 Gateway Endpoint as well.
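As a sketch, a DynamoDB Gateway Endpoint can be attached with a single CLI call (the VPC ID, route table ID, and region below are placeholders for your own values):

```bash
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.dynamodb \
  --route-table-ids rtb-0123456789abcdef0
```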