TensorFlow Best Practices: Named Inputs and Outputs

Quit depending on positional indices and input value ordering. Start relying on named inputs and outputs, and avoid data wiring errors.

Theodoros Ntakouris
Towards Data Science

--

Named inputs and outputs are essentially dictionaries with string keys and tensor values.

Benefits

  1. Defence Against Feature Reordering
  2. Self-Sufficient Model Serving Signatures and Metadata
  3. Renaming and Absent Feature Protection

Most machine learning pipelines read data from a structured source (a database, CSV files/Pandas DataFrames, TFRecords), perform feature selection, cleaning, and possibly preprocessing, and then pass a raw multidimensional array (tensor) to a model, along with another tensor representing the correct prediction for each input sample.

Reorder or rename input features in production? Useless results, or the client side breaks in production.

Absent features? Missing data? Bad output value interpretation? Mixing up integer indices by mistake? Useless results, or the client side breaks in production.

Want to know which feature columns were used for training, so you can provide the same ones for inference? You can’t. Misinterpretation errors.

Want to know what the output values represent? You can’t. Misinterpretation errors.

Don’t drop column names on the model input layers.

The tf.data.Dataset already allows you to do that by default, by treating the input as a dictionary.
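As a minimal sketch of that dictionary behaviour, the snippet below builds a dataset from dictionaries of columns. The feature names (`open`, `volume`, `close`) and their values are made up for illustration and do not come from any real dataset.

```python
import tensorflow as tf

# Hypothetical feature names and values, for illustration only.
features = {
    "open": [1.0, 2.0, 3.0, 4.0],
    "volume": [10.0, 20.0, 30.0, 40.0],
}
labels = {"close": [1.5, 2.5, 3.5, 4.5]}

# Each element is a (features_dict, labels_dict) pair, so the column names
# travel with the data instead of being reduced to positions.
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

for batch_features, batch_labels in dataset.take(1):
    # Features are looked up by name, never by integer index.
    print(sorted(batch_features.keys()))
    print(batch_features["open"].numpy())
```

Because every batch is keyed by name, reordering the columns in the source dictionary does not change what `batch_features["open"]` refers to.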

Over the years, these problems have become easier to deal with. Here’s a short overview of the solutions available in the TensorFlow 2.x ecosystem.

  • TFRecords and tf.Example are hands down the best data format for deep learning projects of any scale. Every feature is named by default.
  • TensorFlow Transform uses named inputs and produces named outputs, encouraging you to do the same for your model.
  • Keras supports dictionaries of layers as inputs *and* outputs. Adding features of different sizes is trivial: just add another parameter for window_size, or pass feature shapes along with the feature names.
  • TensorSpec and Serving Signature definitions support named IOs by default.
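The Keras point can be sketched roughly as follows. This is not the author's original model: the feature names, shapes, layer sizes and the `price_next` output name are all assumptions chosen for illustration. Shapes are passed along with the feature names so that differently sized features can sit side by side.

```python
import tensorflow as tf

# Hypothetical feature names and shapes (assumptions, not from the article).
feature_shapes = {"price_window": (8,), "volume_window": (4,)}

# One named Input layer per feature.
inputs = {
    name: tf.keras.Input(shape=shape, name=name, dtype=tf.float32)
    for name, shape in feature_shapes.items()
}

# Concatenate in a fixed (sorted) order so the model does not depend on
# the ordering of the dictionary the caller happens to pass.
x = tf.keras.layers.Concatenate()([inputs[name] for name in sorted(inputs)])
x = tf.keras.layers.Dense(16, activation="relu")(x)

# Named outputs: callers read predictions by key, not by position.
outputs = {"price_next": tf.keras.layers.Dense(1, name="price_next")(x)}

model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Keys are deliberately given in a different order than feature_shapes.
batch = {
    "volume_window": tf.zeros((2, 4)),
    "price_window": tf.zeros((2, 8)),
}
predictions = model(batch)
print(predictions["price_next"].shape)
```

Adding a new feature in this sketch means adding one entry to `feature_shapes`; nothing downstream has to be renumbered.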

With a serving_raw signature definition, you can call a TensorFlow Serving endpoint directly with a JSON payload, without serialising to tf.Example.
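A signature of this kind might look like the sketch below. Only the name `serving_raw` comes from the text; the `PriceModel` module, its arbitrary weights, the input names `price` and `volume`, and the output name `price_next` are hypothetical stand-ins for a trained model. The JSON at the end follows the row ("instances") format of the TensorFlow Serving REST API.

```python
import json
import tempfile

import tensorflow as tf


class PriceModel(tf.Module):
    """Toy stand-in for a trained model; the weights are arbitrary."""

    def __init__(self):
        super().__init__()
        self.kernel = tf.Variable([[0.5], [0.25]], dtype=tf.float32)

    # Named TensorSpecs become the named inputs of the exported signature.
    @tf.function(input_signature=[
        tf.TensorSpec(shape=[None, 1], dtype=tf.float32, name="price"),
        tf.TensorSpec(shape=[None, 1], dtype=tf.float32, name="volume"),
    ])
    def serving_raw(self, price, volume):
        stacked = tf.concat([price, volume], axis=1)  # shape [batch, 2]
        # Returning a dict gives the signature named outputs.
        return {"price_next": tf.matmul(stacked, self.kernel)}


export_dir = tempfile.mkdtemp()
module = PriceModel()
tf.saved_model.save(
    module, export_dir, signatures={"serving_raw": module.serving_raw}
)

# Reload and call the signature with keyword (named) inputs.
loaded = tf.saved_model.load(export_dir)
serving_fn = loaded.signatures["serving_raw"]
result = serving_fn(price=tf.constant([[2.0]]), volume=tf.constant([[4.0]]))
print(result["price_next"].numpy())

# A request body a client could POST to the model's :predict endpoint.
payload = json.dumps({
    "signature_name": "serving_raw",
    "instances": [{"price": [2.0], "volume": [4.0]}],
})
print(payload)
```

The client only needs the feature names and plain JSON values; no protocol buffer tooling is involved on its side.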

Check out the metadata signature on TF Serving, with a sample bitcoin prediction model I am currently working on:

Lastly, if you are using TFX or have a protocol buffer schema for the inputs, you should use it to send data for inference: it is much more efficient, and errors surface sooner, on the client side instead of the server side. Even in this case, keep using named inputs and outputs for your model.
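For the serialised route, a rough sketch of how names survive the round trip through tf.Example is shown below. The feature names and values are invented for illustration, and the feature spec is written by hand here rather than derived from a TFX schema.

```python
import tensorflow as tf

# Hypothetical feature names and values, for illustration only.
example = tf.train.Example(features=tf.train.Features(feature={
    "price": tf.train.Feature(float_list=tf.train.FloatList(value=[2.0])),
    "volume": tf.train.Feature(float_list=tf.train.FloatList(value=[4.0])),
}))
serialized = example.SerializeToString()

# The parsing spec names every expected feature with its shape and dtype.
feature_spec = {
    "price": tf.io.FixedLenFeature([1], tf.float32),
    "volume": tf.io.FixedLenFeature([1], tf.float32),
}

# Parsing yields a dictionary keyed by feature name.
parsed = tf.io.parse_single_example(serialized, feature_spec)
print(sorted(parsed.keys()))
print(parsed["price"].numpy())
```

The parsed result is again a dictionary of named tensors, so it can be fed to a model with named inputs without any positional bookkeeping.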

Thanks for reading all the way to the end!

Want to also learn how to structure your next machine learning project properly?
