So. Up until last week, Way of Rhea did not have a crash reporter. This is kind of a problem!

We’re a small, self-funded team. Our initial reviews could have a big impact on sales. The last thing we need is for an easily fixable crash to mess up our day one reviews!

I’ve tested the various aspects of the game on a large variety of setups, but there’s always a configuration you’ve yet to test. I’ve participated in lots of events with test builds, but I doubt most players will bother to report when a free demo crashes unless they’re already very invested, or reporting it is very easy.

So. Let’s make it easy!

Table of Contents
Scope & Goals
2023 Update
Implementation
Conclusion

Scope & Goals

I put off writing a crash reporter this long because it seemed difficult. We need to detect when a program crashes, and then if the user gives consent, send ourselves data about the crash.

This writeup will cover a basic but effective way of writing a crash reporter that I employed in Way of Rhea. It will not cover dumping the process’s memory; I’m assuming all you want is the user’s optional self report and a log file that you’ve already created.

I’m working primarily in Rust, but everything I’m doing here should be easily adaptable to any low level language. The UI implementation (which is written in C++) is Windows specific, and can be skipped if you’re working on another platform. All mentions of Steam can be safely ignored if you’re not shipping your software on Steam.

It’s unfortunate that crash reports aren’t something Steam can handle for me given that Valve is taking a 30% cut of my sales, but I guess there’s not much any of us can do about that.

2023 Update

Subtale Games released a Rust library based on the approach described in this post. I haven’t looked at it in detail, but if you’re working in Rust it’s worth checking out!

I’ve also made improvements to Way of Rhea’s crash handler since writing this post: I now use sigaction on Linux, and “structured exception handling” + MiniDumpWriteDump on Windows to gather more information in the event of a crash due to a signal. Signal handling is a big topic deserving of its own post, but if you’re reading this it means I haven’t found the time to write that post yet and you should look into these topics on your own!

Implementation

Detecting Crashes

In an exception based language, it’s tempting to catch all unhandled exceptions in main and call it a day. (In Rust, the closest equivalent would be catching all unwinding panics.) This isn’t great, though: it doesn’t catch segfaults, for example, and like many game developers I turn off unwinding errors.

If catching unwinding errors isn’t enough, then what do we do? We have options (signal handlers, GetExitCodeProcess, etc), but the simplest approach, and the one we’ll take, is just to make the main program a subprocess of the crash reporter. When the main program exits, the crash reporter can check its exit status.

In fact, through clever use of command line arguments, we can even bundle the crash reporter and main program into the same executable (thanks @andy_kelley). This trades off some reliability for convenience–we won’t be able to handle e.g. missing DLL errors, since the crash reporter lives in the same executable that fails to load, but we also won’t have to deal with having two separate exes:

If --bypass-crash-handler is not specified, we run the crash handler
- This forwards all command line arguments to a subprocess at the current executable’s path with --bypass-crash-handler appended
- We wait for the subprocess to exit, and then check its exit status to see if it exited normally, or if it crashed
If --bypass-crash-handler is specified, we just run the program

Here’s how I call the subprocess from the crash handler:

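use std::process::Command;

// Forward our own arguments to the subprocess, minus the executable path.
let mut args = std::env::args().collect::<Vec<_>>();
args.remove(0); // Remove the first argument (usually the executable's path)
args.push("--bypass-crash-handler".to_string());
let result = Command::new(std::env::current_exe().unwrap())
    .args(args)
    .status();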

One point of subtlety–we have to remove the first argument, since it’s usually the path to the executable, and we don’t want to duplicate it. I actually can’t find any documentation guaranteeing there will always be a first argument, so this technically could fail, but clap depends on this behavior as well so it’s probably a pretty safe bet. In the worst case scenario, if a user finds a strange way to omit this argument, I doubt they’ll be too surprised when the executable immediately crashes.
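To tie the steps above together, here’s a minimal sketch of how the mode selection and the exit status check might look. The names (Mode, mode_for, child_args, run_crash_handler) are made up for illustration and aren’t taken from Way of Rhea’s code. Treating every unsuccessful exit status as a crash is also an assumption; you may want to be pickier.

```rust
use std::process::{Command, ExitStatus};

/// Flag that tells the executable to skip the crash handler and run the game.
const BYPASS_FLAG: &str = "--bypass-crash-handler";

/// What the executable should do on startup (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum Mode {
    CrashHandler,
    Game,
}

/// Pick the mode from the arguments, with the executable path already removed.
fn mode_for(args: &[String]) -> Mode {
    if args.iter().any(|arg| arg == BYPASS_FLAG) {
        Mode::Game
    } else {
        Mode::CrashHandler
    }
}

/// Arguments forwarded to the subprocess: ours, plus the bypass flag.
fn child_args(args: &[String]) -> Vec<String> {
    let mut forwarded = args.to_vec();
    forwarded.push(BYPASS_FLAG.to_string());
    forwarded
}

/// Run the game as a subprocess and report whether it appears to have crashed.
fn run_crash_handler(args: &[String]) -> std::io::Result<bool> {
    let exe = std::env::current_exe()?;
    let status: ExitStatus = Command::new(exe).args(child_args(args)).status()?;
    // Assumption: any unsuccessful status (a nonzero exit code, or being
    // killed by a signal on Unix) counts as a crash.
    Ok(!status.success())
}
```

In main, you’d collect the arguments, drop the first one, and then either start the game or call run_crash_handler depending on what mode_for returns.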

A note on the Steam API

Hold up. Will Steam even understand that the subprocess is really the game?

…

Thankfully, yes! The Steam overlay appears to automatically show up over the subprocess even if it doesn’t explicitly initialize the Steam API.

(Phew, we wouldn’t want the Steam overlay to mess things up!)

However, you should probably initialize the Steam API if you’re shipping on Steam, if only for SteamAPI_RestartAppIfNecessary, which will ensure that if the game is launched e.g. by double clicking it outside of Steam, it restarts itself from within Steam.

You want to do this before deciding whether to boot the crash handler or the game.

Why? Otherwise, this will happen:

1. User double clicks game outside of Steam
2. The executable launches
3. It goes into crash handler mode
4. It spawns a subprocess of itself with --bypass-crash-handler and waits for that subprocess to exit
5. The subprocess starts up, and checks if it needs to restart in Steam
6. It does need to restart, so it immediately exits, ending itself and the parent crash handler
7. The game now starts from within Steam with --bypass-crash-handler, and no parent crash handler process

I caught this because, for whatever reason, Steam warns if the game is started with command line arguments from outside of Steam. Implemented correctly we won’t actually be doing that, but if you need to disable this warning for some other reason you can do so: Edit Steamworks Settings > Installation > General Installation > Advanced Options > Enable using ISteam::GetLaunchCommandLine() [...].
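The ordering matters enough that it may be worth spelling out in code. This is only a sketch: Startup and decide_startup are hypothetical names, and steam_requested_relaunch stands in for whatever your Steam bindings return from the restart check (I’m assuming a bool that is true when the process should quit so Steam can relaunch it).

```rust
/// Flag that tells the executable to skip the crash handler and run the game.
const BYPASS_FLAG: &str = "--bypass-crash-handler";

/// The three things the executable can do on startup (hypothetical type).
#[derive(Debug, PartialEq)]
enum Startup {
    /// Steam is relaunching us, so quit before doing anything else.
    ExitForSteamRelaunch,
    CrashHandler,
    Game,
}

/// Decide what to do on startup. The Steam check comes first, so that it is
/// the crash handler process, and not its subprocess, that gets relaunched.
fn decide_startup(steam_requested_relaunch: bool, args: &[String]) -> Startup {
    if steam_requested_relaunch {
        return Startup::ExitForSteamRelaunch;
    }
    if args.iter().any(|arg| arg == BYPASS_FLAG) {
        Startup::Game
    } else {
        Startup::CrashHandler
    }
}
```

Because the relaunch check wins over everything else, the copy of the game that Steam starts goes through the crash handler path like any other launch.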

Reporting Crashes

Alright, we’re now able to detect crashes! But how do we make it easy for players to report them to me?

Until now, players had to navigate to %APPDATA%\Anthropic Studios\Way of Rhea\logs and then message me the most recent log, which is not exactly a user friendly experience.

We need to provide some more automation.

I don’t really want to set up my own server for this, but I also don’t really want to deal with complicated 3rd party logging frameworks, and I don’t really trust them to respect my players’ privacy. I was pretty stumped, until @merlyndmg, a Monster Train dev, mentioned that they have their feedback form automatically message feedback to a Slack channel. That’s a super clever solution IMO, so I copied it.

You don’t need your own server, and while Slack is a third party, they certainly are not equipped or interested in scraping their chat logs for crash reports.

I use Discord more often than I use Slack, so I set up a Discord webhook to forward the crash log to me if the user gives consent. I’m sure you could do the same with good old fashioned email if you wanted to.

Here’s my implementation using reqwest:

// Truncate the log if it's above Discord's file size limit. If we end up hitting
// this limit often we can be more clever and delete from the middle so as to
// preserve info logged at startup, or we could split it into multiple files,
// but I'm going to keep things simple for now.
const MAX_FILE_BYTES: usize = 8000000;
let data = if data.len() > MAX_FILE_BYTES {
    data.split_at(data.len() - MAX_FILE_BYTES).1.to_vec()
} else {
    data
};

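// Send the message to Discord! I'm not an HTTP expert, this multipart stuff is
// tricky but thankfully reqwest handles it for us. (In reqwest 0.10 and later,
// the synchronous versions of these types live under `reqwest::blocking`.)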
let part = reqwest::multipart::Part::bytes(data)
    .file_name(path.to_string_lossy().into_owned());
let form = reqwest::multipart::Form::new()
    .text("content", message) // Message must be <= 2000 bytes!
    .part("file", part);
let result = reqwest::Client::new()
    .post(&webhook_url)
    .multipart(form)
    .send();
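If the simple truncation ever stops being good enough, the “delete from the middle” idea from the comment above could look something like this sketch. truncate_middle, truncate_message, and the marker text are all hypothetical and not part of my actual implementation. The second helper trims the message to a byte budget without splitting a UTF-8 character.

```rust
/// Keep the first `head_bytes` and as much of the end of the log as fits in
/// `max_bytes`, with a marker in between showing where data was dropped.
fn truncate_middle(data: Vec<u8>, max_bytes: usize, head_bytes: usize) -> Vec<u8> {
    const MARKER: &[u8] = b"\n[... log truncated ...]\n";
    if data.len() <= max_bytes {
        return data;
    }
    // Not enough room for the head plus the marker: keep only the tail,
    // which is where the crash itself is most likely to be logged.
    if head_bytes + MARKER.len() >= max_bytes {
        return data[data.len() - max_bytes..].to_vec();
    }
    let tail_bytes = max_bytes - head_bytes - MARKER.len();
    let mut out = Vec::with_capacity(max_bytes);
    out.extend_from_slice(&data[..head_bytes]);
    out.extend_from_slice(MARKER);
    out.extend_from_slice(&data[data.len() - tail_bytes..]);
    out
}

/// Trim a message to at most `max_bytes`, backing up to a character boundary
/// so the result is still valid UTF-8.
fn truncate_message(message: &str, max_bytes: usize) -> &str {
    if message.len() <= max_bytes {
        return message;
    }
    let mut end = max_bytes;
    while !message.is_char_boundary(end) {
        end -= 1;
    }
    &message[..end]
}
```

The head usually holds the startup info (hardware, driver versions, and so on), so keeping a slice of it alongside the tail is probably the better trade once logs get long.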

For more info on programmatically sending Discord messages, see Discord’s webhook documentation.

The UI (Implementation)

I’m working on a game–we could build a fancy stylized UI to fit the game’s theming, but I didn’t. Ideally the crash handler’s code should be as independent from the game as possible–we don’t want the same issue that caused the game to crash to cause the crash handler to crash. We’ve all seen crash handlers that crash, that’s a bad look.

I opted to write the UI directly with the win32 API. win32’s GUI API is a mess, and the resulting UI is ugly, but I figure it’s probably the most robust way to get a UI on screen. I don’t have time to write up a full win32 UI tutorial, but I can roughly outline my approach here and give you some tips.

Microsoft wants you to create resource files to define your UIs, which are a huge pain if you don’t buy into their whole ecosystem. Instead, I used CreateWindowA and added the GUI elements programmatically (see the previously linked tutorial). DialogBoxIndirectParamA may be another option for avoiding resource files, but I haven’t tried it.

Once I got things working, there were a few oddities I had to tidy up. I’ll list them here for your convenience.

All the UI code shown here is in C++ despite my engine being written in Rust, since that’s what Windows wants. The Rust winapi crate looks really nice though, I bet you could use that if you want to do this part in Rust.

Tab focus

For whatever reason, tab does not change focus between controls on normal windows. Raymond Chen has a writeup on how to make this work.

Tabbing & ctrl+a in text fields

By default, tabs do nothing in Windows’ editable multi-line text fields–they don’t change focus, but they also don’t cause a tab to be typed. No idea why. ctrl+a is apparently also not built in, whereas ctrl+c & ctrl+v are.

I subclassed the text edit field to make it behave more reasonably.

// Like edit, but respects tab focus even if multiline, and supports ctrl+a
// to select all. Requires <windows.h> and <commctrl.h>, and linking against
// comctl32.
LRESULT CALLBACK _edit_subclass_proc(
    HWND hwnd,
    UINT msg,
    WPARAM wParam,
    LPARAM lParam,
    UINT_PTR uIdSubclass,
    DWORD_PTR dwRefData
) {
    (void)dwRefData;
    (void)uIdSubclass;

    switch (msg) {
        case WM_GETDLGCODE: {
            // Leave tab to the dialog manager so that it changes focus; the
            // edit control keeps every other key.
            if (wParam == VK_TAB) {
                return 0;
            }
            return DLGC_WANTALLKEYS;
        }
        case WM_KEYDOWN: {
            // GetKeyState returns a negative value while the key is held.
            if (wParam == 'A' && GetKeyState(VK_CONTROL) < 0) {
                SendMessage(hwnd, EM_SETSEL, 0, -1); // Select all text
                return 0;
            }
        } break;
    }
    return DefSubclassProc(hwnd, msg, wParam, lParam);
}

CHECK(SetWindowSubclass(some_text_field_hwnd, _edit_subclass_proc, 0, 0));

Fonts

Similarly, for some reason, the default font used is super ugly and not what you see in actual Windows UIs. You can enable the one you’re used to seeing by calling _set_font, as defined below, after creating all your controls. (CHECK is a macro in my codebase that asserts the value to be nonzero.)

BOOL CALLBACK _set_font_helper(HWND hwnd, LPARAM lParam) {
    (void)lParam;
    HFONT font = (HFONT)GetStockObject(DEFAULT_GUI_FONT);
    CHECK(font);
    SendMessage(hwnd, WM_SETFONT, (WPARAM)font, TRUE);
    return TRUE;
}

void _set_font(HWND hwnd) {
    _set_font_helper(hwnd, 0);
    EnumChildWindows(hwnd, _set_font_helper, 0);
}

Fullscreen exclusive

I’m using this same UI in game to report non-crash related issues. For whatever reason, if the feedback flow is triggered manually on my main development machine while the game is still running, the feedback window is created but not drawn if the game window is fullscreen exclusive.

To work around this, when feedback is triggered manually while the game is running, I hide the game window:

ShowWindow(parent, SW_HIDE);

And then when the feedback dialog has exited, I show it again:

ShowWindow(parent, SW_SHOW);

Also don’t forget to use EnableWindow to disable and re-enable the game while the dialog is open, or it will continue to receive some events which you may not want if you’re setting up the dialog to be modal! This is an easy way to get a crash, because you probably don’t expect your main window procedure to be called while execution is off somewhere else in the codebase.

The UI (Design)

From a design perspective, the UI needs to do a few things: notify the user of the error, allow the user to add any additional details, and offer to restart (when appropriate).

Notify the user of the error

There are a few things going on here.

First, if the feedback sequence was triggered due to an error, I tell the player plainly that an error occurred. Cutesy error messages annoy me: it feels like the people who write them think that they can make up for the fact that their software doesn’t work by spelling words funny. I’m all for spelling words funny, but if your software isn’t working, I’m already mildly annoyed–puns aren’t going to make it better.

Next, the UI offers to send the error message to me, personally, on Discord. As a user, when I see a crash reporter I just assume it’s going to /dev/null. And let’s be real, 99% of the time, it probably is. But Way of Rhea’s crash logs aren’t. I want to read them. I’m pointing out that they go directly to me in the hopes that people don’t do what I do, that they don’t click Don't Send out of spite. Don’t be like me, send me your crash reports. :)

This also gives people a way to contact me if they want to. If possible, I’d much rather hear about an error in my Discord where I can offer immediate support to the person who experienced it than find out later in a Steam review! If my games get so popular that this isn’t sustainable, I’ll adjust my strategy. That would be a great problem to have.

Next, I provide a View Report button. You’ll actually have another chance to view the report before it’s sent, but if I’m asking players to consent to sending me their data, I need them to trust me. Personally, when I see a very clear View Report button on a crash handler, I’m 200% more likely to click Send.

Lastly, there is a Don't Send option. This should go without saying in 2021, but apparently it doesn’t–the user has the right to not send me their data. There are no dark patterns here, I’m not gonna jiggle the window around and make it hard for you to close it, or pop open a huge frowny face when you don’t do what I want. If you click Don't Send, the report is not sent.

Allow the user to add additional details

If the user clicks send, I give them a chance to offer additional info, if they want to. It’s all optional, I’m not trying to make people jump through hoops when my software has already inconvenienced them. Focus starts out on Send, they can just press enter and skip all this. They also have one last chance to view the report.

Once they click Send here, a message box appears announcing either that the report was successfully sent, or that it failed to send (e.g. if there is no internet connection). If it fails, the things they typed are not erased. They might have taken time typing that up. Maybe they want to try again, or copy it into Discord themselves. Maybe not. But they at least have the option. Nothing feels worse than spending 15 minutes writing a detailed bug report, and then losing it when the bug reporter crashes.
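In code, the rule boils down to “only clear the text once the send has succeeded.” Here’s a rough sketch of that rule with the actual sending abstracted into a closure. FeedbackForm, SendOutcome, and submit are hypothetical names, and the real dialog lives in win32 code rather than anything this tidy.

```rust
/// What the message box reports after the user presses Send.
#[derive(Debug, PartialEq)]
enum SendOutcome {
    Sent,
    Failed(String),
}

/// The optional details the user typed into the dialog.
struct FeedbackForm {
    details: String,
}

impl FeedbackForm {
    /// Try to send the details using `send`. The text is cleared only when
    /// sending succeeds, so a failure never costs the user what they typed.
    fn submit<F>(&mut self, send: F) -> SendOutcome
    where
        F: FnOnce(&str) -> Result<(), String>,
    {
        match send(&self.details) {
            Ok(()) => {
                self.details.clear();
                SendOutcome::Sent
            }
            Err(reason) => SendOutcome::Failed(reason),
        }
    }
}
```

After a failure the form still holds the text, so the user can retry or copy it somewhere else.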

…okay, plenty of things feel worse, but it’s still not a great feeling.

Offer to restart the program

Lastly, if this dialog was shown because the program crashed, I offer to restart it.

Conclusion

Starting your main program as a subprocess of the crash reporter is a fairly straightforward way to get a crash reporter working. Sending yourself the data via Slack or Discord is super convenient. We could definitely take things further:

- We could try to detect infinite loops, maybe by checking whether or not the main program responds to window events in a timely manner, and kill the subprocess if stuck in an infinite loop
- We could set up a ring buffer that we write screenshots to every few seconds, and forward its contents alongside the crash logs
- And more…
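I haven’t built either of these, but the core of both ideas is small. The sketch below is hypothetical throughout: Watchdog assumes the game somehow reports a heartbeat to the crash handler (how that signal gets across the process boundary is left open), and RingBuffer is a generic stand-in that would hold screenshots.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Tracks the last time the game reported in. If too much time passes
/// without a heartbeat, we assume it is stuck.
struct Watchdog {
    last_heartbeat: Instant,
    timeout: Duration,
}

impl Watchdog {
    fn new(now: Instant, timeout: Duration) -> Self {
        Self { last_heartbeat: now, timeout }
    }

    /// Record that the game was responsive at `now`.
    fn heartbeat(&mut self, now: Instant) {
        self.last_heartbeat = now;
    }

    /// True if no heartbeat has arrived within the timeout.
    fn is_hung(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_heartbeat) > self.timeout
    }
}

/// Fixed-capacity buffer that keeps only the most recent items.
struct RingBuffer<T> {
    items: VecDeque<T>,
    capacity: usize,
}

impl<T> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be nonzero");
        Self { items: VecDeque::with_capacity(capacity), capacity }
    }

    /// Add an item, dropping the oldest one if the buffer is full.
    fn push(&mut self, item: T) {
        if self.items.len() == self.capacity {
            self.items.pop_front();
        }
        self.items.push_back(item);
    }

    /// The current contents, oldest first.
    fn snapshot(&self) -> Vec<&T> {
        self.items.iter().collect()
    }
}
```

The crash handler would poll is_hung while waiting on the subprocess, and on a crash it would attach whatever snapshot returns alongside the log.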

But for now, this works. And it’s a world better than what I used to have. I’m excited for my next demo–I suppose in the best case scenario nobody has reason to use it, but I will feel a lot better knowing that if I don’t get any reports, it probably means that nobody hit any crashes! And if anyone does, I’ll likely learn more about what info should be included in the report.
