Functional and Type-Safe Heterogeneous Data Handling in Rust
Safely store and handle incoming heterogeneous data in a functional, type-safe, rusty way.
You’ve got to make a web API that accepts JSON objects (or anything else that serde supports, actually). You need a code-first way of declaring the structure of those incoming entities, so that they can be deserialized into easy-to-use structs and enums and processed accordingly. Consider the following example:
- An endpoint receives an ID and a value that needs to be stored in a database and retrieved afterwards
- To start, we only need to handle a few cases: strings, integers, floats and booleans
While this seems pretty straightforward, let’s investigate what happens when we need to incrementally add new types in our app, for example:
- We need to handle captioned images
- We need to handle specific JSON structures as payloads, each in its own way
It is fair to assume that the backing database can only efficiently store small values, and that larger artifacts (such as media) need to be redirected to a separate blob storage service. We’re also going to assume the use of a document database, but you can really use any storage system with proper modifications (e.g. flattening your schemas).
There are a lot of ways to approach this problem, with different advantages and disadvantages, either on the code side or on the database/infrastructure side. We’re going to take a completely type-safe and functional approach:
Everything will be typed, and when we add a new handled type to our application, every case will need to be reasoned about in code, or else the program will not even compile. This way, a forgotten case shows up as a compile error instead of a runtime bug, and every variant is covered wherever data is processed or stored.
Let’s start by defining the schema that our service will consume. We are looking for 2 kinds of HTTP requests: to save entities and to retrieve entities, by id.
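First, a simple primitive-valued payload: a JSON object with an id and a value, where the value is a string, an integer, a float or a boolean.
Then, other, heterogeneous payloads, such as a captioned image: here the value is itself a JSON object (assumed in this article to hold a caption and the image data).
You can easily map those out to a few simple structs. But we need a functional way of handling this heterogeneity. Serde’s (untagged) JSON enum parsing comes to the rescue.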
We can use the following declaration in order to wrap the value field of the JSON objects as an enum, containing our different payload types:
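As a minimal sketch of what such a declaration could look like: the variant names, the CaptionedImage fields and the classify_primitive helper below are assumptions for illustration, not the original code. To stay dependency-free, the serde attributes are only mentioned in comments, and the helper imitates by hand what an untagged enum does with a primitive JSON token: try each variant in declaration order and keep the first one that fits.

```rust
// Payload carried in the `value` field. With serde, this enum would be
// annotated with #[derive(Deserialize)] and #[serde(untagged)], so that
// variants are tried in declaration order until one matches.
#[derive(Debug, Clone, PartialEq)]
pub enum EventValue {
    Bool(bool),
    Integer(i64), // must come before Float: "42" is also a valid float
    Float(f64),
    Text(String),
    CaptionedImage(CaptionedImage),
}

// Hypothetical shape of a captioned image payload.
#[derive(Debug, Clone, PartialEq)]
pub struct CaptionedImage {
    pub caption: String,
    pub image_base64: String,
}

// The contract object: an id plus a heterogeneous value.
#[derive(Debug, Clone, PartialEq)]
pub struct EventEntity {
    pub id: String,
    pub value: EventValue,
}

// Hand-written stand-in for untagged matching on a primitive JSON token.
// This is NOT serde; it only illustrates the "first variant that fits" rule.
pub fn classify_primitive(token: &str) -> EventValue {
    if let Ok(b) = token.parse::<bool>() {
        EventValue::Bool(b)
    } else if let Ok(i) = token.parse::<i64>() {
        EventValue::Integer(i)
    } else if let Ok(f) = token.parse::<f64>() {
        EventValue::Float(f)
    } else {
        // Anything else is treated as a string; strip the JSON quotes.
        EventValue::Text(token.trim_matches('"').to_string())
    }
}
```

The variant order matters: if Float came before Integer, every integer would be captured as a float, because an integer token also parses as a floating point number.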
I called this entity EventEntity, but you can call it anything you like. This arrangement of structs and enums is able to parse both required inputs automatically, and you can still use Option<T> if something is optional.
Those contract objects define our endpoint’s entity parsing signature. It’s good practice to map them to different data model entities that you are actually going to persist in the database (and in many cases it is also advisable to incorporate a domain collection of entities, mapping included); see for example Clean Architecture or other popular methodologies.
We’re going to use the functional aspect of the language and the structures we just created in order to map our EventEntity to some DataResource, which we are ultimately going to save to disk. This may seem like nonsense at first, but as you proceed to add more heterogeneous data that can be treated uniformly by your system (with great power comes great responsibility: don’t overdo it), it will make sense, and you are also going to appreciate the safety benefits. Nobody will forget to implement an edge case, because they can’t; the program won’t compile.
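One possible shape for this mapping is sketched below. The DataValue enum, the blob_key field, the images/ key scheme and the to_resource function are all hypothetical names chosen for illustration. The point is the exhaustive match: adding a new EventValue variant makes this function fail to compile until the new case is handled.

```rust
// Contract side: what the endpoint parses (illustrative names).
#[derive(Debug, Clone, PartialEq)]
pub enum EventValue {
    Bool(bool),
    Integer(i64),
    Float(f64),
    Text(String),
    CaptionedImage { caption: String, image_base64: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct EventEntity {
    pub id: String,
    pub value: EventValue,
}

// Persistence side: what ends up in the document database.
#[derive(Debug, Clone, PartialEq)]
pub enum DataValue {
    Bool(bool),
    Integer(i64),
    Float(f64),
    Text(String),
    // The image itself lives in blob storage; the document keeps a key.
    CaptionedImage { caption: String, blob_key: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct DataResource {
    pub id: String,
    pub value: DataValue,
}

// Maps a contract entity to a storable resource, plus the blob (if any)
// that the caller has to upload under the resource's blob_key.
pub fn to_resource(entity: EventEntity) -> (DataResource, Option<String>) {
    let EventEntity { id, value } = entity;
    // No wildcard arm on purpose: a new variant must be handled here.
    let (value, blob) = match value {
        EventValue::Bool(b) => (DataValue::Bool(b), None),
        EventValue::Integer(i) => (DataValue::Integer(i), None),
        EventValue::Float(f) => (DataValue::Float(f), None),
        EventValue::Text(s) => (DataValue::Text(s), None),
        EventValue::CaptionedImage { caption, image_base64 } => {
            let blob_key = format!("images/{}", id); // assumed key scheme
            (DataValue::CaptionedImage { caption, blob_key }, Some(image_base64))
        }
    };
    (DataResource { id, value }, blob)
}
```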
The two types might look terribly similar at first glance, and they are, except when a value that is not a primitive needs to be treated differently (e.g. saving an image to blob storage).
Before stitching everything together in our controller, we’re going to define a very simple database trait (an interface, for the OOP friends out there) that supports 2 operations:
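A minimal sketch of such a trait, assuming the two operations are saving a resource and fetching it by id. The names Database, save, get and InMemoryDatabase are made up for this example and are not taken from the sample repository; the in-memory implementation is just a hash map.

```rust
use std::collections::HashMap;

// Simplified persisted types, repeated here so the sketch stands alone.
#[derive(Debug, Clone, PartialEq)]
pub enum DataValue {
    Bool(bool),
    Integer(i64),
    Float(f64),
    Text(String),
    CaptionedImage { caption: String, blob_key: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct DataResource {
    pub id: String,
    pub value: DataValue,
}

// The two operations the endpoints rely on (hypothetical names).
pub trait Database {
    fn save(&mut self, resource: DataResource);
    fn get(&self, id: &str) -> Option<DataResource>;
}

// Hash-map-backed implementation, useful for tests and local runs.
#[derive(Default)]
pub struct InMemoryDatabase {
    documents: HashMap<String, DataResource>,
}

impl Database for InMemoryDatabase {
    fn save(&mut self, resource: DataResource) {
        // Saving under an existing id overwrites the previous document.
        self.documents.insert(resource.id.clone(), resource);
    }

    fn get(&self, id: &str) -> Option<DataResource> {
        self.documents.get(id).cloned()
    }
}
```

Keeping the endpoints dependent on the trait rather than on a concrete store means a real document database can be swapped in later without touching the handlers.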
There’s a very simple implementation of this under gh/ntakouris/hopplex, but all the Mutex/Cell/Box details are out of scope for this article, so you can just think of it as a simple hash map.
Using the popular rocket.rs framework, we’re going to scaffold a new project and add 2 endpoints: /publish_event for saving entities, and a second one for retrieving them by id. We’re also going to initialize an instance of our database trait implementation to use in our endpoint functions.
And finally, the missing parts are very, very trivial to fill in (you can’t do much but handle all the cases anyway!).
First, for the insertion part, we’re simply going to map any field that does not need special care. For this example, the CaptionedImage variant needs extra care, and that’s really easy to do:
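The insertion logic could look roughly like the sketch below. It is written as a plain function rather than a rocket handler, and Storage, publish_event and the images/ key scheme are hypothetical; both the document database and the blob storage are stood in for by hash maps.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum EventValue {
    Bool(bool),
    Integer(i64),
    Float(f64),
    Text(String),
    CaptionedImage { caption: String, image_base64: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct EventEntity {
    pub id: String,
    pub value: EventValue,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataValue {
    Bool(bool),
    Integer(i64),
    Float(f64),
    Text(String),
    CaptionedImage { caption: String, blob_key: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct DataResource {
    pub id: String,
    pub value: DataValue,
}

// Stand-ins for the document database and the blob storage service.
#[derive(Default)]
pub struct Storage {
    pub documents: HashMap<String, DataResource>,
    pub blobs: HashMap<String, String>,
}

pub fn publish_event(storage: &mut Storage, entity: EventEntity) {
    let EventEntity { id, value } = entity;
    let value = match value {
        // Primitives are copied over as they are.
        EventValue::Bool(b) => DataValue::Bool(b),
        EventValue::Integer(i) => DataValue::Integer(i),
        EventValue::Float(f) => DataValue::Float(f),
        EventValue::Text(s) => DataValue::Text(s),
        // The image goes to blob storage; the document keeps caption + key.
        EventValue::CaptionedImage { caption, image_base64 } => {
            let blob_key = format!("images/{}", id); // assumed key scheme
            storage.blobs.insert(blob_key.clone(), image_base64);
            DataValue::CaptionedImage { caption, blob_key }
        }
    };
    storage.documents.insert(id.clone(), DataResource { id, value });
}
```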
Retrieving the data follows the same pattern with inverse logic: retrieve the image from the blob storage instead of saving it. The caption, like other small structured JSON objects, is saved in the document itself:
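The inverse direction can be sketched in the same spirit. As before, this is a plain function under assumed names (Storage, get_event) with hash maps in place of the real database and blob storage; returning None for a missing document or a missing blob is a simplification that a real service would replace with proper error types.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum EventValue {
    Bool(bool),
    Integer(i64),
    Float(f64),
    Text(String),
    CaptionedImage { caption: String, image_base64: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct EventEntity {
    pub id: String,
    pub value: EventValue,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataValue {
    Bool(bool),
    Integer(i64),
    Float(f64),
    Text(String),
    CaptionedImage { caption: String, blob_key: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct DataResource {
    pub id: String,
    pub value: DataValue,
}

// Stand-ins for the document database and the blob storage service.
#[derive(Default)]
pub struct Storage {
    pub documents: HashMap<String, DataResource>,
    pub blobs: HashMap<String, String>,
}

// Rebuilds the contract entity from the stored document, pulling the image
// back out of blob storage when the document only holds a key to it.
pub fn get_event(storage: &Storage, id: &str) -> Option<EventEntity> {
    let resource = storage.documents.get(id)?;
    let value = match &resource.value {
        DataValue::Bool(b) => EventValue::Bool(*b),
        DataValue::Integer(i) => EventValue::Integer(*i),
        DataValue::Float(f) => EventValue::Float(*f),
        DataValue::Text(s) => EventValue::Text(s.clone()),
        DataValue::CaptionedImage { caption, blob_key } => EventValue::CaptionedImage {
            caption: caption.clone(),
            // A dangling blob key yields None instead of a partial entity.
            image_base64: storage.blobs.get(blob_key)?.clone(),
        },
    };
    Some(EventEntity { id: resource.id.clone(), value })
}
```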
Add your own error and/or response types and you’ve got a very elegant project to work with. Test cases are very easy to reason about as well: one for each action (endpoint), for each declared (supported) entity type.
If you also need to process a stream of these entities in the same structured manner, this way of handling heterogeneous data is very useful as well.
The sample code is open source, ready to be downloaded and run with cargo run, at github.com/ntakouris/hopplex. Please, download it and mess around!