Compiler ยท Feb 7, 2020

Union types in BuckleScript

In our our 7.1.0 release we introduced the new [@unboxed] feature for better zero-cost interop with GADTs, Variants and single field records. Let's find out how this will help us expressing Union types with seamless interop!
Hongbo Zhang
Compiler Team

Introduction to Union Types

Union types describe a value that can be one of several types. In JS, it is common to use the vertical bar (|) to separate each type, so number | string | boolean is the type of a value that can be a number, a string, or a boolean.

Following the last post since the introduction of unboxed attributes in 7.1.0, we can create such types as follows:

ML
type t = | Any : 'a -> t [@@unboxed] let a (v : a) = Any v let b (v : b) = Any v let c (v : c) = Any v
RE
[@unboxed] type t = | Any('a): t; let a = (v: a) => Any(v); let b = (v: b) => Any(v); let c = (v: c) => Any(v);

Note: due to the unboxed attribute, Any a shares the same runtime representation as a; however, we need to make sure that user can only construct values of type a, b , or c into type t. By making use of the module system, we can achieve this:

ML
module A_b_c : sig type t val a : a -> t val b : b -> t val c : c -> t end= struct type t = | Any : 'a -> t [@@unboxed] let a (v : a) = Any v let b (v : b) = Any v let c (v : c) = Any v end
RE
module A_b_c: { type t; let a: a => t; let b: b => t; let c: c => t; } = { [@unboxed] type t = | Any('a): t; let a = (v: a) => Any(v); let b = (v: b) => Any(v); let c = (v: c) => Any(v); };

What happens when we need to know specifically whether we have a value of type a? This is a case by case issue; it depends on whether there are some intersections in the runtime encoding of a, b or c. For some primitive types, it is easy enough to use Js.typeof to tell the difference between, e.g, number and string.

Like type guards in typescript, we have to trust the user knowledge to differentiate between union types. However, such user level knowledge is isolated in a single module so that we can reason about its correctness locally.

Let's have a simple example, number_or_string first:

ML
module Number_or_string : sig type t type case = | Number of float | String of string val number : float -> t val string : string -> t val classify : t -> case end = struct type t = | Any : 'a -> t [@@unboxed] type case = | Number of float | String of string let number (v : float) = Any v let string (v : string) = Any v let classify (Any v : t) : case = if Js.typeof v = "number" then Number (Obj.magic v : float) else String (Obj.magic v : string) end
RE
module Number_or_string: { type t; type case = | Number(float) | String(string); let number: float => t; let string: string => t; let classify: t => case; } = { [@unboxed] type t = | Any('a): t; type case = | Number(float) | String(string); let number = (v: float) => Any(v); let string = (v: string) => Any(v); let classify = (Any(v): t): case => if (Js.typeof(v) == "number") { Number(Obj.magic(v): float); } else { String(Obj.magic(v): string); }; };

Note that here we use Obj.magic to do an unsafe type cast which relies on Js.typeof. In practice, people may use instanceof; the following is an imaginary example:

ML
module A_or_b : sig type t val a : a -> t val b : b -> t type case = | A of a | B of b val classify : t -> case end = struct type t = | Any : 'a -> t [@@unboxed] type case = | A of a | B of b let a (v : a) = Any v let b = (v : b) = Any v let classify ( Any v : t) = if [%raw{|function (a) { return a instanceof globalThis.A}|}] v then A (Obj.magic v : a) else B (Obj.magic b) end
RE
module A_or_b: { type t; let a: a => t; let b: b => t; type case = | A(a) | B(b); let classify: t => case; } = { [@unboxed] type t = | Any('a): t; type case = | A(a) | B(b); let a = (v: a) => Any(v); let b = (v: b) => Any(v); let classify = (Any (v): t) => if ([%raw {|function (a) { return a instanceof globalThis.A}|}](v)) { A(Obj.magic(v): a); } else { B(Obj.magic(b)); }; };

Here we suppose a is of JS class type A, and we use instanceof to test it. Note we use some unsafe code locally, but as long as such code is carefully reviewed, it has a safe boundary at the module level.

To conclude: thanks to unboxed attributes and the module language, we introduce a systematic way to convert values from union types (untagged union types) to algebraic data types (tagged union types). This sort of conversion relies on user level knowledge and has to be reviewed carefully. For some cases where classify is not needed, it can be done in a completely type safe way.

Want to read more?
Back to Overview