Exceptions vs Error Codes

A question that frequently arises in software engineering is whether/when you should throw exceptions or return error codes.

Opinions differ on this subject. Many C#/Java developers throw exceptions to indicate all kinds of errors. Many others believe that exceptions should not be used for control flow, arguing that they act like non-local GOTO statements, only worse, because it’s difficult to know where control will end up (some people compare exceptions to invisible gotos, or to gotos that lead to invisible labels).

I tend to agree with the second group: I think exceptions are for unexpected and unhandleable situations - when there’s nothing I can do with the error code, and the only sensible action is to let the error bubble up the stack so that the upper levels handle it (display the error, maybe offer a retry mechanism). This means that all kinds of “expected errors” should be returned by my methods, and handled (or intentionally ignored) by the calling method.

Go Language: Panic vs Errors, and Explicit Error Checking

When talking about exceptions vs error codes I usually mention the Go language, which has clear guidelines about error handling.

One of its design principles is that “panic” is reserved for fatal, unexpected situations (much like exceptions in Java/C#), while “errors” (any value you return that implements the error interface) should be used for regular, expected situations. This matches the distinction I described above between when I like to use exceptions and when I like to use error codes.

This means that the language encourages you to explicitly check for errors where they occur, as opposed to the paradigm that expects you to throw (and catch) exceptions even for expected errors. Because the language makes this clear distinction between panics and returned errors, it has conventions and constructs that allow developers to get and test errors easily and concisely.

Basically, every function where errors are expected to happen should return an error value. If the function is also expected to return some other value on success, it should use multiple return values: the expected results come first and the possible error comes last, like this:

// os.ReadFile returns the file contents ([]byte) and an error
data, err := os.ReadFile("filename.txt")
if err != nil {
    // abort the program...
    log.Fatal(err)
}
fmt.Print(string(data))

This multiple return values feature makes the code more concise, and something similar is now available in C# 7 (tuples). Because the error arrives as a regular return value, error handling becomes less verbose, more explicit, and easier to read, since it uses regular if/else statements.

Another nice convention is that the returned values should be ONE OR THE OTHER: if there’s an error you can expect the other value to be nil, and vice-versa. A simple convention that makes error handling even easier.

Last, another major benefit of explicitly returning error codes is that exceptions can easily be ignored, while it’s much harder to ignore errors if your methods force you to receive the returned error. Returning error codes obviously doesn’t let errors bubble up automatically (as exceptions do), but that is exactly the idea: you want to check (and act on) your errors right where they occur.

Returning Errors as Enums instead of Classes

As I’ve explained above, errors are about regular expected situations where the caller code should know the possible results and should decide how to handle each possible error. If what you get is a class instance (either as a thrown Exception or as a regular returned object) you really don’t know what kind of errors you might receive.

The first problem with using exceptions is that the caller code (or the compiler) doesn’t know which exceptions each method may throw. Java has checked exceptions, but most other mainstream languages (C# included) decided against them, so that alternative isn’t even available. Obviously you could document it with something like “This CreateUser() method may throw an exception of CreateUserException”, and then the caller code would know what to expect, but what’s the point if you could just declare what your method returns in the return type?

The second problem with returning classes (either as exceptions or as return objects) is that developers usually design the different errors as subtypes (subtype polymorphism), so the caller code (or the compiler) can’t know or test for all possible errors.

try
{
    // CreateUserCommand xmldoc will warn me that it may throw a CreateUserCommandException
    User createdUser = CreateUserCommand(newUserInfo);
}
catch (CreateUserCommandException)
{
    // How many possibilities do we have here?
    // Which subtypes should we check?
}

One alternative for this second problem is adding an enum CreateUserCommandErrorEnum inside CreateUserCommandException. Then we could use if/switch statements over a closed set of values, and tooling can check whether we cover all possible errors. But that looks too complex (both an exception and an enum for each method) and brings no benefit: in the end it’s all about enums, since there is a predefined number of possible outcomes.

To sum up, I prefer returning enums directly (instead of throwing exceptions or returning error classes) because:

  • The caller code is forced to receive the error, so it is less likely to ignore the possibility of an error.
  • The caller code knows in advance all possible errors.
  • We can handle the possible errors with switch statements and easily ensure that all possible return values are covered.
  • If I add a new possible return value to my method, I can check whether all callers cover that new value, for example in a switch statement.
  • I believe that we should explicitly test for expected errors right where they occur.

Given the reasons above, for the rest of this post I’ll assume that your methods return errors as an enum, like this one:

public enum CreateUserCommandError
{
    USERNAME_NOT_AVAILABLE,
    WEAK_PASSWORD,
}
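Handling such an enum at the call site could look like the sketch below. The Describe helper and its messages are hypothetical (they are not part of the original code), and the default arm is a convention assumed here: it fails loudly if a new enum value is added later without being handled, since the compiler does not enforce exhaustiveness for switch statements.

```csharp
using System;

public enum CreateUserCommandError
{
    USERNAME_NOT_AVAILABLE,
    WEAK_PASSWORD,
}

public static class CreateUserErrorHandling
{
    // Hypothetical helper: maps each expected outcome to a user-facing message.
    public static string Describe(CreateUserCommandError? error)
    {
        switch (error)
        {
            case null:
                return "User created";
            case CreateUserCommandError.USERNAME_NOT_AVAILABLE:
                return "Please choose another username";
            case CreateUserCommandError.WEAK_PASSWORD:
                return "Please choose a stronger password";
            default:
                // Reached only if a new enum value was added but not handled here.
                throw new ArgumentOutOfRangeException(nameof(error), error, "Unhandled error code");
        }
    }
}
```

Keeping every case in one switch means that adding a new enum value leaves a single, easy-to-find place per caller to update.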

In the next sections I’ll show a few different ways of returning multiple parameters in C#.

Using OUT parameters

Using out parameters is as simple as this:

public User CreateUserCommand(UserDTO newUserInfo, out CreateUserCommandError? error)
{
    if (somethingBad)
    {
        error = CreateUserCommandError.USERNAME_NOT_AVAILABLE;
        return null;
    }
    // ...
    error = null; // out parameters need to be assigned even if null
    return user;
}

CreateUserCommandError? error = null;
User createdUser = CreateUserCommand(newUserInfo, out error);
if (error != null)
{
    // early abort..
}
LoginUser(createdUser);

In C# 7 we don’t even have to declare the out variables beforehand anymore; we can declare them inline, optionally as implicitly typed local variables (out var):

User createdUser = CreateUserCommand(newUserInfo, out var error);

Using regular Tuples

Using Tuples is a little more verbose, but returns all parameters in a single Tuple object:

public Tuple<User, CreateUserCommandError?> CreateUserCommand(UserDTO newUserInfo)
{
    if (somethingBad)
        return new Tuple<User, CreateUserCommandError?>(null, CreateUserCommandError.USERNAME_NOT_AVAILABLE);
    // ...
    return new Tuple<User, CreateUserCommandError?>(user, null);
}

var result = CreateUserCommand(newUserInfo);
if (result.Item2 != null) // Item2 is the Error
{
    // early abort..
}
LoginUser(result.Item1); // Item1 is the User returned

Using the new ValueTuple

In C# 7 there’s a new ValueTuple struct, which lets us give more meaningful names to the tuple members and offers a simplified syntax both for creating a ValueTuple and for deconstructing it:

public (User createdUser, CreateUserCommandError? error) CreateUserCommand(UserDTO newUserInfo)
{
    if (somethingBad)
        return (null, CreateUserCommandError.USERNAME_NOT_AVAILABLE);
    // ...
    return (user, null);
}

var result = CreateUserCommand(newUserInfo);
// The names "error" and "createdUser" come directly from the method signature above
if (result.error != null)
{
    // early abort..
}
LoginUser(result.createdUser);

We can deconstruct the ValueTuple in a single statement which can both declare the variables and assign values to them:

var (user, error) = CreateUserCommand(newUserInfo);
// or: (var user, var error) = CreateUserCommand(newUserInfo);
// or: (User user, CreateUserCommandError? error) = CreateUserCommand(newUserInfo);

if (error != null) // Error
{
    // early abort..
}
LoginUser(user);

Using Generics

Another popular method is to wrap your returns in a generic class that holds both your return object and the possible error (actually it should hold ONE OR THE OTHER, but not both).

public class CommandResult<TEntity, TError>
        where TEntity : class
        where TError : struct, Enum
{
    public TEntity Entity { get; set; }
    public TError? Error { get; set; }
    public bool IsSuccess => (Error == null);

    // Many developers also include a "Message" property
    // which usually can be both the success message or the error description
    // public string Message { get; set; }

    public static CommandResult<TEntity, TError> Success(TEntity entity)
    {
        return new CommandResult<TEntity, TError>() { Entity = entity };
    }

    public static CommandResult<TEntity, TError> Fail(TError errorCode)
    {
        return new CommandResult<TEntity, TError>() { Error = errorCode };
    }
}

public CommandResult<User, CreateUserCommandError> CreateUserCommand(UserDTO newUserInfo)
{
    if (somethingBad)
        return CommandResult<User, CreateUserCommandError>.Fail(CreateUserCommandError.USERNAME_NOT_AVAILABLE);
    // ...
    return CommandResult<User, CreateUserCommandError>.Success(user);
}

var result = CreateUserCommand(newUserInfo);
if (result.Error != null) // Error
{
    // early abort..
}
LoginUser(result.Entity);

Using generics is more verbose than the ValueTuple syntax but it’s much more powerful since we can enhance the class with extra information. Later I’ll show how to enhance this class with more information.

But this verbose syntax makes me a little annoyed, so let’s try to get the best of both solutions…

Combining Generics with the concise ValueTuple Syntax

There are a few tricks that we can use to make the Generics version more friendly and less verbose.

First, inside the CommandResult<> class we can create an implicit conversion operator that will convert (at the compiler level) a ValueTuple to a CommandResult<>:

public static implicit operator CommandResult<TEntity, TError>(ValueTuple<TEntity, TError?> tuple)
{
    // Exactly one of the two tuple members must be set
    if (tuple.Item1 != null && tuple.Item2 == null)
        return Success(tuple.Item1);
    if (tuple.Item1 == null && tuple.Item2 != null)
        return Fail(tuple.Item2.Value);
    throw new ArgumentException("Return either an entity or an error, but not both (and not neither)");
}

Then we can return our results as if they were ValueTuples:

//return CommandResult<User, CreateUserCommandError>.Fail(CreateUserCommandError.USERNAME_NOT_AVAILABLE);
return (null, CreateUserCommandError.USERNAME_NOT_AVAILABLE);

//return CommandResult<User, CreateUserCommandError>.Success(user);
return (user, null);

Inside the CommandResult<> class we can also create a deconstruct method so that CommandResult<> can be deconstructed into its two parts:

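// The error is declared as TError? so that it matches the Error property and can be compared to null
public void Deconstruct(out TEntity entity, out TError? error) => (entity, error) = (this.Entity, this.Error);

This means that we can deconstruct the result (declare the variables and assign the different parts to them) in a single call like this: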

var (user, error) = CreateUserCommand(newUserInfo);
if (error != null) // Error
{
    // early abort..
}
LoginUser(user);

So cool and so easy, isn’t it?

Enhancing the Error enum

In all previous examples the returned error was only an enum. In the last example above (Generics with concise syntax) the enum was part of the Generic class:

public class CommandResult<TEntity, TError>
        where TEntity : class
        where TError : struct, Enum
{
    public TEntity Entity { get; set; }
    public TError? Error { get; set; }
    public bool IsSuccess => (Error == null);
    // ...
}

But we can enhance the error object (TError?) with more information by wrapping it inside another class:

public class ErrorResult<TError>
    where TError : struct, Enum
{
    /// <summary>
    /// This MAY or may not be defined (even if an error happened!).
    /// If this is null, you should check the <see cref="ValidationErrors"/> to see why the command failed.
    /// </summary>
    public TError? ErrorCode { get; set; }
    public string ErrorMessage { get; set; }
    public IList<ValidationError> ValidationErrors { get; set; }
}
public class ValidationError
{
    public string PropertyName { get; set; }
    public string ErrorMessage { get; set; }
}

public class CommandResult<TEntity, TError>
        where TEntity : class
        where TError : struct, Enum
{
    public TEntity Entity { get; set; }
    public ErrorResult<TError> Error { get; set; } // THIS!
    public bool IsSuccess => (Error == null);
    // ...
}

This design gives us a few advantages over a simple enum:

  • We can add a descriptive string ErrorMessage
  • We can add a list of ValidationErrors which can be helpful to display in the UI
  • We don’t need to define enums for all possible situations - the caller code can just assume that if CommandResult.Error is non-null then some error happened, and it can use the ErrorMessage and/or ValidationErrors. It only needs to test the TError? ErrorResult.ErrorCode if it needs to handle specific cases - else it can just have this “general” error handling.
  • ValidationErrors could be filled automatically. In my next post, I’ll show how this can be done using MediatR pipeline (behaviors) to check for validation errors using FluentValidation and automatically fill CommandResult.ValidationErrors, without even hitting our Command Handlers.
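The “general” error handling mentioned in the third bullet could be sketched as follows. The ErrorFormatting class, the ToDisplayText helper, and its output format are hypothetical and only illustrate a caller that never inspects ErrorCode; ErrorResult and ValidationError are repeated here so that the sketch is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum CreateUserCommandError { USERNAME_NOT_AVAILABLE, WEAK_PASSWORD }

public class ValidationError
{
    public string PropertyName { get; set; }
    public string ErrorMessage { get; set; }
}

public class ErrorResult<TError> where TError : struct, Enum
{
    public TError? ErrorCode { get; set; }
    public string ErrorMessage { get; set; }
    public IList<ValidationError> ValidationErrors { get; set; }
}

public static class ErrorFormatting
{
    // Hypothetical helper: builds one display string without looking at ErrorCode.
    public static string ToDisplayText<TError>(ErrorResult<TError> error)
        where TError : struct, Enum
    {
        if (error == null)
            return "OK"; // a null error means the command succeeded

        var parts = new List<string>();
        if (!string.IsNullOrEmpty(error.ErrorMessage))
            parts.Add(error.ErrorMessage);
        if (error.ValidationErrors != null)
            parts.AddRange(error.ValidationErrors.Select(v => $"{v.PropertyName}: {v.ErrorMessage}"));

        return parts.Count > 0 ? string.Join("; ", parts) : "Unknown error";
    }
}
```

A caller that does need to react to one specific case can still check ErrorCode first and fall back to this generic text for everything else.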

Well, at the beginning of this post I wrote that I would return simple enums, and in the end I’m returning classes. But basically, this is just a thin wrapper to enhance the enum with additional features - but it doesn’t invalidate the downsides that I’ve mentioned earlier (about type polymorphism uncertainty, and about how throwing exceptions is bad when you need to handle the outcomes).

Last, this wrapper around ErrorCode enum shouldn’t stop us from doing direct comparisons. We can still compare the error with the enum values as long as we overload the equality operators like this:

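public static bool operator ==(ErrorResult<TError> left, TError right)
{
    // A null ErrorResult (success) never equals an error code
    return left?.ErrorCode != null && left.ErrorCode.Value.Equals(right);
}

// C# requires != to be overloaded together with ==
public static bool operator !=(ErrorResult<TError> left, TError right) => !(left == right);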

var (user, error) = CreateUserCommand(newUserInfo);
if (error == CreateUserCommandError.USERNAME_NOT_AVAILABLE) { ... }
else if (error == CreateUserCommandError.WEAK_PASSWORD) { ... }
else if (error == null)
{
   // ...
}

You can find full source code here.

My client had an issue where their web application (deployed across multiple servers) was randomly making the servers unresponsive with 100% CPU usage.

The first action we took was to configure IIS to automatically recycle the Application Pools when they use too much CPU for more than a few minutes. In the example below we kill AppPools after 3 minutes of using more than 80% CPU (the limit is expressed in 1/1000ths of a percent, so 80000 means 80%).

Import-Module WebAdministration  # provides the IIS:\ drive

dir IIS:\AppPools  | ForEach-Object {
	Write-Host  "Updating $($_.Name) ..."

	$appPoolName = $_.Name
	$appPool = Get-Item "IIS:\AppPools\$appPoolName"
	$appPool.cpu.limit = 80000
	$appPool.cpu.action = "KillW3wp"
	$appPool.cpu.resetInterval = "00:03:00"
	$appPool | Set-Item
}

That mitigated the problem: the servers stopped becoming unresponsive, but we still had to investigate what was eating all the CPU.

See below how I proceeded with the troubleshooting:

1. Create a Memory Dump

In Task Manager, right-click the IIS Worker Process and create a dump file.

2. Install Debug Diagnostic Tool

Download and install Debug Diagnostic Tool

3. Run Crash & Hang Analysis for ASP.NET / IIS

Add your dump (DMP) file, select “CrashHangAnalysis”, and click “Start Analysis”.

4. Review Analysis for Problems

The first page immediately suggests that there’s a Generic Dictionary which is being used by multiple threads and is blocking one thread.

A few pages later we can find the threads which are consuming most of the CPU:

If we check those top threads we can see that both are blocked in the same call which is invoking GetVersion() on an API client-wrapper. One thread is trying to Insert on the dictionary (cache the API version), while the other is trying to Find (FindEntry) on the dictionary.

5. What was the issue?

Long Explanation:
Dictionary<TKey, TValue> is a hash map implementation, and like most hash map implementations it internally uses linked lists (chains of entries) to store multiple elements when different keys land in the same bucket after being hashed and after taking the hash modulo the number of buckets. The problem is that since Dictionary<TKey, TValue> is not thread-safe, multiple threads trying to change the dictionary may put it into an invalid state (race condition).

Probably different threads were trying to add the same element to the dictionary at the same time (invoking the Insert method, which internally invokes the Resize method, which modifies the linked list), leaving the linked list (and therefore the whole hash map) in an inconsistent state. Since both Insert() and FindEntry() iterate through the linked list, an inconsistent list (for example one that loops back on itself) can trap the threads in an infinite loop.

Short Explanation:
Since Dictionary<TKey, TValue> is not thread-safe, multiple threads trying to change the dictionary may put it into an invalid state (race condition). So if you want to share a dictionary across multiple threads you should use a ConcurrentDictionary<TKey, TValue>, which as the name implies is a thread-safe class.

It’s a known issue that concurrent access to Dictionary can cause an infinite loop and high CPU in IIS: Link 1, Link 2, Link 3, Link 4.
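As a minimal sketch of the fix, a version cache shared across threads could be built on ConcurrentDictionary like this. The ApiClientWrapper class, its constructor, and the fetchVersion delegate are hypothetical stand-ins for the client’s real API wrapper, which is not shown in this post.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical client wrapper that caches the API version per base URL.
public class ApiClientWrapper
{
    // Shared by all threads: ConcurrentDictionary keeps its internal state
    // consistent under concurrent reads and writes.
    private static readonly ConcurrentDictionary<string, string> VersionCache =
        new ConcurrentDictionary<string, string>();

    private readonly Func<string, string> _fetchVersion;

    public ApiClientWrapper(Func<string, string> fetchVersion)
    {
        _fetchVersion = fetchVersion;
    }

    public string GetVersion(string baseUrl)
    {
        // GetOrAdd is thread-safe. Under contention the factory may run more
        // than once, but only one result is stored and returned to everyone.
        return VersionCache.GetOrAdd(baseUrl, _fetchVersion);
    }
}
```

If the version lookup is expensive and must run only once, storing a Lazy<string> as the dictionary value is a common variation.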

6. Advanced Troubleshooting using WinDbg

If the Debug Diagnostic Tool hadn’t given us any obvious clue about the root cause, we could have used WinDbg to inspect the memory dump (it also supports .NET/CLR). See example here.

Windows Subsystem for Linux (WSL)

If you’re a Windows developer who needs to compile or run Linux binaries in Windows, then the Windows Subsystem for Linux (WSL) is for you. WSL is a tool for developers and sysadmins who need Linux interoperability in Windows.

The most important thing to understand about WSL (in its original version, WSL 1, which this post describes) is that it isn’t based on virtual machines or containers: when you download a supported Linux distro from the Microsoft Store, no virtual machine is created. What WSL provides is just a layer that maps Linux kernel system calls to Windows kernel system calls, which allows Linux binaries to run in Windows unmodified. WSL also maps Windows services, like the filesystem and networking, as devices that Linux can access.

Instead of using a VM or container, WSL virtualizes a Linux kernel interface on top of the Windows kernel. This means that running WSL only requires a minimal amount of RAM. And when not in use, the WSL driver isn’t loaded into memory, making it much more efficient than a solution based on a VM or container.

Installing WSL and Ubuntu in a Windows Server

Run this in a PowerShell Administrator Prompt (you’ll have to reboot after this):

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

Run this in a PowerShell Administrator Prompt to install Ubuntu 18.04:

curl.exe -L -o ubuntu-1804.appx https://aka.ms/wsl-ubuntu-1804
Rename-Item ubuntu-1804.appx ubuntu-1804.zip
Expand-Archive ubuntu-1804.zip ubuntu1804

cd ubuntu1804
.\ubuntu1804.exe 

You’ll be asked to choose a Linux username and password.

Installing Redis on Ubuntu (under Windows Server WSL)

Invoke a bash shell as superuser (for the next commands which require root):

sudo bash

(you’ll be asked for the password of the Linux user created earlier)

Update the package lists and install Redis:

apt-get update && apt-get upgrade
apt-get install redis-server

Configuring Redis for external access (Optional)

If you’ll only use Redis in your internal protected network you don’t need this.

WSL only provides a translation layer between the Linux apps and the Windows kernel, so some core parts of the Ubuntu system (including networking) are just not there - WSL just translates the Linux system calls into Windows ones, so the Ubuntu network data flows through the exact same TCP/IP stack as the Windows data.

This means that to open the Redis server to other servers (or to the public internet) you just have to configure Redis to listen on the correct interfaces and open the appropriate ports (there’s no need to do “port forwarding” since this is not a VM with its own networking interfaces).

By default Redis binds only to loopback (localhost) interfaces. If you open /etc/redis/redis.conf (by running nano /etc/redis/redis.conf) you’ll find a line like bind 127.0.0.1 ::1, which means that Redis by default listens on the IPv4 loopback (127.0.0.1) and the IPv6 loopback (::1). Just change it to bind 0.0.0.0 ::1 to make Redis listen on all IPv4 interfaces (including public IPs), or, if it’s just for the internal network, list the internal IPs on which this server should listen.

And in case you’re exposing it through the internet you’ll also have to open your Windows Firewall ports:

netsh advfirewall firewall add rule name="Redis Port 6379" dir=in action=allow protocol=TCP localport=6379

Authentication

If you’re exposing your server to the public internet you’ll have to configure a password, because by default Redis does not accept external connections without a password.
Still in /etc/redis/redis.conf, just uncomment the requirepass line and set a very strong password next to it (since it’s very fast to run a brute-force attack against a Redis server).

Start and test server

Start service:

sudo service redis-server restart

To test the service, run redis-cli and use the following commands:

AUTH yourstrongpassword  (if you have enabled authentication, above)
set user:1 "Oscar"
get user:1

To stop your server you can use:

sudo service redis-server stop

If you want your Windows Server to start this Linux service automatically on every boot, you can configure this command to be executed at startup:

C:\windows\system32\wsl.exe -u root service redis-server start

References: