Python & Maintainability: Keeping track of constants
Hard-coded string values that linger around your project impact your ability to change negatively, but there’s an easy way around that.
But first, what’s wrong with magic strings?
We’re going to demonstrate the usage of using static classes to keep track of constants, in 2 scenarios:
- Declaring routes in a FastAPI app
- Keeping track of metrics and artifacts in a machine learning pipeline
You simply need to pass data around. It’s so tempting to just instantiate a new dict and assign a few items to it; Python makes this so easy, just
The superbly optimized Python dict, is both a blessing, and a curse, for code readability and maintainability. It’s not proper to have hard coded constants flying around. A string rename in one file can lead to problems in another portion of the program. If the project contains good test code, this mistake is probably going to be caught during testing. If not, your api/data pipeline/you name it, will blow up in prod. We don’t want that.
Proposed technique: Static class definitions
We’re going to keep all the constant definitions in one place: under a python module, inside classes. Hierarchies like that, can help the reader/reviewer understand the scope of your changes, and there’s a pretty easy way to rename everything by editing a few lines of code.
Let’s consider a FastAPI app and showcase how we are going to declare routes. There are 2 ways to go around this. Pick your poison:
- A relative structure where FastAPI itself keeps track of prefixes
- And an absolute structure where the classes keep track of their hierarchy
Personally, I’d pick the second one. It’s easier to construct URIs by using those classes/lambda functions.
Let’s consider a simple ML pipeline. You train, evaluate the model and post metrics to some experiment tracking server. Almost all of those tracking solutions accept a very specific structure for metrics, which makes it so tempting to use dicts:
str -> value pairs. Those can also contain some iteration index or timestamp.
One could argue that for each experiment, you can declare some sort of data class or typed dict to keep track of those. You can have lot’s of optional parts there, and an implementation should look like this:
You see that things quickly get cluttered: optional fields with default
None values, and a lack of over-time representation — you’d have to either have a list of
MetricsInstagram or create a new dataclass that contains the history of all metrics for the given experiment. Data Science and Machine Learning projects are bound to change very fast (especially during early stages). This looks like a good match for completely dynamic dictionaries.
Adding a new experiment variant or a new optional metric in specific use cases can slow the whole project down, just because you have to maintain the metrics package! Not to mention that this needs to pull all the metrics from the tracking server, map those to the dataclass and then pick the one you need!
We can tackle those problems by using (you guessed it) static class structures and scopes! Here’s what I mean:
That’s just a proof of concept example. I expect the interested reader to refactor and rearrange parts according to his own needs.
This is a really flexible approach that supports arbitrary combinations of scopes and metric names, and that also considers some sort of maintainability. It’s my personal choice for machine learning projects, as of now.
Thanks for reading!