29 January 2023

Port it to the web with WebAssembly and Emscripten

by Baduit

Article::Article

In my previous article I showed you how I rewrote my parser for my esoteric programming language PainPerdu. Now that all the cleanup on this project has been done, it is time to finally port PainPerdu to the web!

What is WebAssembly?

WebAssembly, often abbreviated Wasm, is a binary instruction format that complements JavaScript on the web. It is designed to load and run fast, and to execute safely in a sandboxed environment, on a web page but not only there.
Concretely, instead of compiling your C or Rust code to native code, you can target Wasm and run the result in any modern web browser.

Why use WebAssembly in a project?

As I just said, Wasm is perfect if you need performance. Even if JavaScript is fast, Wasm can be even faster! If the comparison between the two interests you, I suggest reading this article about it.
Porting an existing application. This is exactly my use case: I already have a program that works well in C++ and I don’t want to rewrite everything in JavaScript. With Wasm, I can just compile my existing code.
You are not forced to use JavaScript (or TypeScript) anymore, so you have more variety. Also, if your team already has great expertise in one language, it will probably be more efficient to stick with that language instead of forcing everyone to learn JavaScript. This is one of the reasons behind Blazor, where you can use C# everywhere.

Goal

As I stated in my first article about PainPerdu, the goal is to not have a backend anymore to run the online interpreter.
In the original implementation, the front end sends the code to the backend, the backend runs it and returns the result to the front end.

Ideally, with Emscripten I can remove the backend and create a function callable from JavaScript that is a straight-up replacement for the HTTP request I was making to the backend.

Emscripten

Description

Emscripten is the most popular toolchain for compiling C/C++ to Wasm. Because it is based on LLVM, you can technically use it for a lot of other languages, like Rust or D. It also maps some POSIX APIs to the corresponding web APIs, and it even converts OpenGL calls to WebGL!

Installation

You can find the installation guide here. I won’t go into the details because it is straightforward:

# Get the emsdk repo
git clone https://github.com/emscripten-core/emsdk.git

# Enter that directory
cd emsdk
# Fetch the latest version of the emsdk (not needed the first time you clone)
git pull

# Download and install the latest SDK tools.
./emsdk install latest

# Make the "latest" SDK "active" for the current user. (writes .emscripten file)
./emsdk activate latest

# Activate PATH and other environment variables in the current terminal
source ./emsdk_env.sh

Basic usage

Emscripten provides emcc and em++, which are drop-in replacements for gcc and g++.

If you want to use Emscripten with CMake, you can use the CMake toolchain file it provides:

# $EMSCRIPTEN_ROOT is the directory where Emscripten has been cloned
cmake -DCMAKE_TOOLCHAIN_FILE="$EMSCRIPTEN_ROOT/emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake" .

Note that Emscripten will create 2 files:

a .wasm containing your compiled C++ code
a .js file with all the boilerplate code needed to load and use the .wasm file without any issue

Integration with Vcpkg

If you remember my article about Vcpkg, you know that Vcpkg also uses the CMake variable CMAKE_TOOLCHAIN_FILE. We have a problem here, because both Emscripten and Vcpkg want this variable.
But if you also remember the commands from my CMake cheat sheet, you know that there is another CMake variable that lets Vcpkg load a second toolchain: VCPKG_CHAINLOAD_TOOLCHAIN_FILE.

cmake -DCMAKE_TOOLCHAIN_FILE=vcpkg_toolchain -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=emscripten_toolchain .

Adapt my project build

Now that we know how to use Emscripten with CMake and Vcpkg, we need to make a few small adjustments to the way we build the project.

My first step was to create a new target in the bin/EmPainPerdu directory, with a cpp file named EmPainPerdu.cpp (empty for now) and a CMakeLists.txt containing this:

set(FORCE_EXCEPTION_FLAG "-fwasm-exceptions")

add_executable(EmPainPerdu EmPainPerdu.cpp)
target_compile_options(EmPainPerdu PRIVATE -Wextra -Wall -Wsign-conversion -Wfloat-equal -pedantic -Wredundant-decls -Wshadow -Wpointer-arith -O3 ${FORCE_EXCEPTION_FLAG})
target_link_options(EmPainPerdu PRIVATE ${FORCE_EXCEPTION_FLAG})
target_include_directories(EmPainPerdu PUBLIC ${CMAKE_CURRENT_LIST_DIR})
target_link_libraries(EmPainPerdu PRIVATE PainPerdu)
set_target_properties(EmPainPerdu PROPERTIES RUNTIME_OUTPUT_DIRECTORY "${CMAKE_CURRENT_LIST_DIR}/../../website/generated")

This is a classic definition of a target except for 2 things:

In the compilation and link flags there is now -fwasm-exceptions, because the code uses exceptions and exception support in Wasm is still experimental.
The last line puts my Wasm code and the corresponding JavaScript wrapper directly in the front-end directory, so I don’t have to copy them manually.
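
To make the first point more concrete: as far as I know, an Emscripten build does not catch C++ exceptions unless you ask for it, so a throw that would be caught natively ends the program instead. Here is a small probe, a sketch that is not part of PainPerdu (the name probe_exception_support is made up for the illustration). Built with -fwasm-exceptions on both the compile and the link lines, it should reach the catch block:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical probe, not part of PainPerdu: throws and catches right away.
// If the build has exception support enabled, the catch block runs and the
// function returns a message. Without it, the throw is expected to end the
// program instead of being caught.
std::string probe_exception_support()
{
    try
    {
        throw std::runtime_error("thrown on purpose");
    }
    catch (const std::exception& e)
    {
        return std::string("caught: ") + e.what();
    }
}
```

Calling this function from a tiny main and printing the result is a cheap way to check that the flag reached both the compiler and the linker.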

Then I added an option in my CMakeLists.txt at the root of the project to know if we are using Emscripten or not:

option(EMSCRIPTEN "EMSCRIPTEN" OFF)

And I used this variable to choose which target I want to build:

if (EMSCRIPTEN)
    add_subdirectory(bin/EmPainPerdu)
else()
    add_subdirectory(bin/interpreter)                    
endif()

The last thing I did was to also add the -fwasm-exceptions flag when needed in my unit tests:

# This little addition
if (EMSCRIPTEN)
	set(FORCE_EXCEPTION_FLAG "-fwasm-exceptions")
else()
	set(FORCE_EXCEPTION_FLAG "")
endif()

function(addTest test_name test_files)
	set(test_target ${test_name}_test)

	add_executable(${test_target} ${test_files})
	# Update the compilation flags
	target_compile_options(${test_target}
						PRIVATE
						$<$<CXX_COMPILER_ID:MSVC>:/W3 /permissive- /TP>
						$<$<OR:$<CXX_COMPILER_ID:GNU>,$<CXX_COMPILER_ID:Clang>>:-Wextra -Wall -Wsign-conversion -Wfloat-equal -pedantic -Wredundant-decls -g ${FORCE_EXCEPTION_FLAG}>)
	target_link_libraries(${test_target} PRIVATE PainPerdu doctest::doctest)
	# And the link flags              
	target_link_options(${test_target} PRIVATE ${FORCE_EXCEPTION_FLAG})
	target_include_directories(${test_target} PRIVATE ${CMAKE_CURRENT_LIST_DIR})
	add_test(${test_name} ${test_target})
endfunction()

You can now build the Emscripten target like this:

cmake -DCMAKE_TOOLCHAIN_FILE=vcpkg_toolchain_path -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=emscripten_toolchain_path -DEMSCRIPTEN=ON .

Run the unit tests with Node.js

Do you remember when I said that Wasm was not only designed to work on a web page? Here’s an example of that: it can also run in Node.js! And we will use it to run our unit tests!

We just need to run these commands to start the 2 unit test suites:

node --experimental-wasm-eh  ./test/parser/parser_test_test.js
node --experimental-wasm-eh  ./test/functional/functional_test_test.js

It is that simple!

Change my GitHub workflows to also run the unit tests with Wasm

The unit tests were already built automatically for Windows and Ubuntu in the GitHub workflows. Now that I have a new target, Emscripten, let’s add it too.

In the file .github\workflows\unit_tests_cpp.yml let’s add this job with the bare minimum:

  build_emscripten:
    name: Build Emscripten
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

Now let’s install Emscripten:

      - name: Install emsdk
        run: |
          git clone https://github.com/emscripten-core/emsdk.git
          cd emsdk
          ./emsdk install latest
          ./emsdk activate latest
          source ./emsdk_env.sh
          cd ..

Then install Vcpkg:

      - name: vcpkg init
        run: git clone https://github.com/microsoft/vcpkg && ./vcpkg/bootstrap-vcpkg.sh

Build the project:

      - name: Build project
        run: |
          source ./emsdk/emsdk_env.sh
          ls ./emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake
          cmake -DCMAKE_TOOLCHAIN_FILE=./vcpkg/scripts/buildsystems/vcpkg.cmake -DVCPKG_CHAINLOAD_TOOLCHAIN_FILE=./emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake -DEMSCRIPTEN=ON .
          cmake --build .

And finally run the tests:

      - name: Run tests
        run: |
          node --experimental-wasm-eh ./test/parser/parser_test_test.js
          node --experimental-wasm-eh ./test/functional/functional_test_test.js

And voilà

Create and export the bindings

Emscripten ships with a library, Embind, that makes it easy to create bindings between C++ and JavaScript.
To use it, we need to add --bind to the link options in bin/EmPainPerdu/CMakeLists.txt:

target_link_options(EmPainPerdu PRIVATE --bind ${FORCE_EXCEPTION_FLAG})

And in the file EmPainPerdu.cpp:

#include <emscripten/bind.h>
// Because I'm lazy and I know I won't have any name collision in this file
using namespace emscripten;

Let’s think about what we need to expose to JavaScript: a function that takes a string as input and returns the console output and the state of the memory as output.

This means we need a structure to contain the result:

struct PainPerduResult
{
    std::string console_output;
    std::vector<uint8_t> stack;
};

But because we will later export this structure, and the bindings we are going to write expose member functions, let’s add a getter for each attribute.

struct PainPerduResult
{
    const auto& get_console_output() const { return console_output; }
    const auto& get_stack() const { return stack; }

    std::string console_output;
    std::vector<uint8_t> stack;
};
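
As an aside, Embind also documents value_object, which maps a plain struct to a JavaScript object field by field, without getters. The sketch below is an alternative to the approach taken in this article, not what PainPerdu uses. The binding names PainPerduValueResult and VectorUint8 and the helper make_result are made up for the illustration. The __EMSCRIPTEN__ guard is only there so the same file also compiles natively:

```cpp
#include <cstdint>
#include <string>
#include <vector>

#ifdef __EMSCRIPTEN__
#include <emscripten/bind.h>
#endif

// Same data as in the article, without the getters.
struct PainPerduResult
{
    std::string console_output;
    std::vector<std::uint8_t> stack;
};

// Hypothetical helper, only here to have something to export.
PainPerduResult make_result(const std::string& text)
{
    return PainPerduResult{text, {1, 2, 3}};
}

#ifdef __EMSCRIPTEN__
// Only compiled by Emscripten; a native build skips the bindings.
EMSCRIPTEN_BINDINGS(value_object_sketch)
{
    // The vector type still has to be registered before it is used as a field.
    emscripten::register_vector<std::uint8_t>("VectorUint8");
    emscripten::value_object<PainPerduResult>("PainPerduValueResult")
        .field("console_output", &PainPerduResult::console_output)
        .field("stack", &PainPerduResult::stack);
    emscripten::function("make_result", &make_result);
}
#endif
```

With this variant, the JavaScript side should receive an object with console_output and stack properties rather than an instance with methods. The rest of this article keeps the getter-based version.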

Now we can create the function:

#include <PainPerdu/PainPerdu.hpp>
#include <sstream>

PainPerduResult run_pain_perdu_code(const std::string& input)
{
    PainPerduResult result;
    return result;
}

This does nothing yet, but we can reuse the code from the backend: instead of filling a std::map with the outputs, it stores them in the PainPerduResult:

PainPerduResult run_pain_perdu_code(const std::string& input)
{
    PainPerduResult result;
    try
    {
        std::stringstream out;
        std::stringstream in;
        PainPerdu::Interpreter interpreter(in, out);
        interpreter.disable_input();

        interpreter.compile_and_run(input);

        result.console_output = out.str();
        result.stack = interpreter.get_stack();
    }
    catch (std::exception& e)
    {
        result.console_output = std::string("Error : ") + e.what();
    }
    return result;
}
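
If you want to exercise this wrapper without Emscripten or the real library, the same shape can be tried natively. This is only a sketch under assumptions: FakeInterpreter is a hypothetical stand-in that mimics the few member functions used above and interprets nothing, and run_code is a made-up templated copy of run_pain_perdu_code:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct PainPerduResult
{
    std::string console_output;
    std::vector<std::uint8_t> stack;
};

// Hypothetical stand-in for PainPerdu::Interpreter: it only mimics the
// member functions used by the wrapper and does not interpret anything.
class FakeInterpreter
{
public:
    FakeInterpreter(std::istream&, std::ostream& out) : out_(out) {}

    void disable_input() {}

    void compile_and_run(const std::string& code)
    {
        if (code.empty())
            throw std::runtime_error("empty program");
        out_ << code.size();            // pretend output: the program length
        stack_.assign(code.size(), 0);  // pretend memory: one zeroed cell per character
    }

    const std::vector<std::uint8_t>& get_stack() const { return stack_; }

private:
    std::ostream& out_;
    std::vector<std::uint8_t> stack_;
};

// Same shape as run_pain_perdu_code, with the interpreter type as a parameter.
template <typename Interpreter>
PainPerduResult run_code(const std::string& input)
{
    PainPerduResult result;
    try
    {
        std::stringstream out;
        std::stringstream in;
        Interpreter interpreter(in, out);
        interpreter.disable_input();

        interpreter.compile_and_run(input);

        result.console_output = out.str();
        result.stack = interpreter.get_stack();
    }
    catch (const std::exception& e)
    {
        // Errors end up in the console output instead of crossing the boundary.
        result.console_output = std::string("Error : ") + e.what();
    }
    return result;
}
```

Calling run_code<FakeInterpreter>("") takes the catch branch, which shows the design choice: the caller always gets a PainPerduResult back, and a failure is reported as text in console_output.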

We haven’t used Embind yet, but now it is time. The first step is to create a module:

EMSCRIPTEN_BINDINGS(my_module) {}

The second step is to add the function:

// The js function will be named run_pain_perdu_code
// And it will call the C++ function run_pain_perdu_code
function("run_pain_perdu_code", &run_pain_perdu_code);

The third step is to explain to JavaScript what a std::vector<uint8_t> is, because by default it doesn’t know:

register_vector<uint8_t>("vector<uint8_t>");

The fourth and final step is to register the structure:

class_<PainPerduResult>("PainPerduResult")
        .function("console_output", &PainPerduResult::get_console_output)
        .function("stack", &PainPerduResult::get_stack);

Here’s the final code:

#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

#include <emscripten/bind.h>

#include <PainPerdu/PainPerdu.hpp>

using namespace emscripten;

struct PainPerduResult
{
    const auto& get_console_output() const { return console_output; }
    const auto& get_stack() const { return stack; }

    std::string console_output;
    std::vector<uint8_t> stack;
};

PainPerduResult run_pain_perdu_code(const std::string& input)
{
    PainPerduResult result;
    try
    {
        std::stringstream out;
        std::stringstream in;
        PainPerdu::Interpreter interpreter(in, out);
        interpreter.disable_input();

        interpreter.compile_and_run(input);

        result.console_output = out.str();
        result.stack = interpreter.get_stack();
    }
    catch (std::exception& e)
    {
        result.console_output = std::string("Error : ") + e.what();
    }
    return result;
}


EMSCRIPTEN_BINDINGS(my_module) {
    function("run_pain_perdu_code", &run_pain_perdu_code);

    register_vector<uint8_t>("vector<uint8_t>");
    class_<PainPerduResult>("PainPerduResult")
        .function("console_output", &PainPerduResult::get_console_output)
        .function("stack", &PainPerduResult::get_stack);
}

Integrate it in the front-end

We created the bindings, that’s cool, now let’s use them!
For that we need to actually include the generated code, which we can do by adding this line to index.html:

<script src="generated/EmPainPerdu.js" type="text/javascript"></script>

Now in main.js we can completely remove the HTTP request made with the fetch function and just call our exported function instead:

var answer = Module.run_pain_perdu_code(document.getElementById("yololInput").value);

That’s all folks. The integration was that easy.

Article::~Article

The backend of PainPerdu is now officially useless! The online interpreter runs entirely in the browser thanks to WebAssembly and Emscripten; you can see the online editor here. And we have our unit tests running with Node.js too, without any extra effort.


tags: cpp - vcpkg - WebAssembly - emscripten