QC your whale

Thursday, November 8, 2018

Motivation

It's a services world: several replicas of the same code run on several hosts. Sometimes a production error makes it look like a deploy didn't go through on some of them. When this happens, someone needs to manually check the file versions on every host. That might be simple when there are 2 or 3 hosts, but there can be many more. This sounds like a task for a computer.
Now, with continuous integration and delivery, I see a way we can reduce the probability of these inconsistencies and detect them long before they become a problem, so we can raise all the necessary alarms.

TL;DR

  1. Using your CI tool, inject the build number into the Docker build and make it visible to the application via an environment variable.
  2. Build a service endpoint returning that environment variable.
  3. Deploy to a staging environment.
  4. Validate the build number on the staging environment.
  5. Deploy to prod.
  6. Validate the build number on prod.

My toys

Docker, Jenkins, .NET Core and xUnit.

Long story

I will show you how I solved this problem; these steps are now standard issue for all of my microservices.

The image…and likeness

Let's say you build your image like this:

FROM microsoft/dotnet

ARG BUILD_NUMBER

# Dotnet build steps

# Next line is the magic step
ENV Meta__BuildNumber ${BUILD_NUMBER}

ENTRYPOINT ["dotnet", "Service.dll"]

The environment variable naming convention comes from .NET Core configuration: a double underscore maps to the ":" section separator, so Meta__BuildNumber ends up as the Meta:BuildNumber configuration key. With this magic step we turn the build argument into an environment variable that will be available at execution time.
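
You can sanity-check this locally before wiring up the pipeline. A quick check, with 209 as a stand-in build number and service:check as a throwaway tag (both made up for the example):

docker build . \
	--file pipelines/Build.Dockerfile \
	--build-arg BUILD_NUMBER=209 \
	--tag service:check

# The value is baked into the image and visible at run time; prints 209
docker run --rm --entrypoint printenv service:check Meta__BuildNumber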

Jenkins: The build

stage('build') {
	sh(script: """
		docker build . \\
			--file pipelines/Build.Dockerfile \\
			--build-arg BUILD_NUMBER=$BUILD_NUMBER
	""")
}

Here we pass the build number from the build tool to Docker; in Jenkins, BUILD_NUMBER is a built-in environment variable.

The Service…and longer part

Make sure environment variables are added to your configuration.

	configurationBuilder
		.SetBasePath(env.ContentRootPath)
		.AddJsonFile("appsettings.json")
		.AddJsonFile($"appsettings.{env.EnvironmentName}.json")
		.AddEnvironmentVariables()
		.AddCommandLine(args);

Build a Meta (or any other name) class. An instance of this class will hold the actual value.

public class Meta 
{
	public int BuildNumber { get; set; }
}

Wire it up to dependency injection.

	services
		.Configure<Meta>(configuration.GetSection("Meta"))
		.AddTransient(r => r.GetRequiredService<IOptions<Meta>>().Value);

Create the service endpoint.

[Route("diagnostics")]
public class DiagnosticsController: Controller 
{
	private readonly Meta meta;
	
	public DiagnosticsController(Meta meta) 
	{
		this.meta = meta;
	}

	public IActionResult Get() 
	{
		return Ok(meta);
	}
}

I wanna see it dad!!

|--> curl staging.mycompany.com/diagnostics
{
	"buildNumber": 209
}

The Validator

Build an integration test that calls the endpoint.

using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.Configuration;
using Newtonsoft.Json;
using Xunit;

public class Integration
{
	private readonly IConfiguration configuration;
	private readonly string serviceUrl;
	private readonly int expectedBuildNumber;
	private readonly HttpClient http;

	public Integration()
	{
		var currentDirectory = Directory.GetCurrentDirectory();
		configuration = new ConfigurationBuilder()
			.SetBasePath(currentDirectory)
			.AddJsonFile("appsettings.json")
			.AddEnvironmentVariables()
			.Build();

		// Where the service lives and which build we expect to find there
		serviceUrl = configuration.GetValue<string>("Service:Url");
		expectedBuildNumber = configuration.GetValue<int>("Meta:BuildNumber");
		http = new HttpClient();
	}

	[Fact]
	[Trait("Category", "Integration")]
	public async Task Service_build_number_is_correct()
	{
		var response = await http.GetAsync($"{serviceUrl}/diagnostics");
		response.EnsureSuccessStatusCode();
		var json = await response.Content.ReadAsStringAsync();
		var meta = JsonConvert.DeserializeObject<Meta>(json);

		Assert.Equal(expectedBuildNumber, meta.BuildNumber);
	}
}

This test requires 2 configuration values, Service:Url and Meta:BuildNumber. Both must be available when running the tests.

Deploy to staging

At some point the pipeline will deploy to a staging environment. Just sit tight and wait for it to finish. Give it some room just in case application data loading or rolling updates are needed.
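
How that deploy happens depends entirely on your infrastructure. Purely as an illustration, here is a minimal sketch of such a stage, assuming the service image was tagged and pushed as registry.mycompany.com/service:$BUILD_NUMBER in an earlier step (not shown) and that staging runs it as a Docker Swarm service named service_staging; both names are hypothetical, and a kubectl or scripted deploy would slot in the same way.

// Hypothetical deploy stage: the registry path and the Swarm service name
// are placeholders for whatever your environment actually uses.
stage('deploy staging') {
	sh(script: """
		docker service update \\
			--image registry.mycompany.com/service:$BUILD_NUMBER \\
			service_staging
	""")
}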

Jenkins II: The validation

Let's suppose we have built an image for running the tests and tagged it Tests:$BUILD_NUMBER. With that image we can execute the validation. Time for the good stuff: run the funky tests, wild boy.

stage('validate staging') {
	sh(script: """
		docker run \\
			--env Service__Url=http://staging.mycompany.com \\
			--env Meta__BuildNumber=$BUILD_NUMBER \\
			Tests:$BUILD_NUMBER \\
			dotnet test --filter=Category~Integration
	""")
}

At this point, if this step comes out green, you are very likely to deploy to production without any issue. If it fails you might have broken staging, but not production: you are still safe. With the build logs, together with your environment logs, you are very likely to find the issue, fix it, and start the pipeline again straight from the top.

The only truth

If you have a good staging environment, one that is almost identical to prod (the only acceptable differences being sizes and identities), your chances of a successful deployment are extremely high.
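
Validating prod (step 6 of the TL;DR) is just the staging stage pointed at production. A minimal sketch, assuming the production service answers at mycompany.com (a made-up URL):

// Same test image as the staging validation, only the target URL changes.
stage('validate prod') {
	sh(script: """
		docker run \\
			--env Service__Url=http://mycompany.com \\
			--env Meta__BuildNumber=$BUILD_NUMBER \\
			Tests:$BUILD_NUMBER \\
			dotnet test --filter=Category~Integration
	""")
}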

In case the validation fails: man, you are in trouble. You might have broken prod, and fixing it must be the highest priority. At least, with this validation you will be able to tell immediately. Make sure you find the root cause of the problem. Don't just fix it: gather all the logs, reproduce the issue, automate it.

In the end, the only thing that matters is prod. It’s what pays for the party.
