All you need to know about the state-of-the-art Transformer Neural Network Architecture, adapted to Time Series Tasks. Keras code included.

Image by romnyyepez from Pixabay

Table of Contents

  • Introduction
  • Preprocessing
  • Learnable Time Representation (Time 2 Vec)
  • Architecture
  • Bag Of Tricks (things to consider when training Transformers)


Attention Is All You Need they said. Is it a more robust convolution? Is it just a hack to squeeze more learning capacity out of fewer parameters? Is it supposed to be sparse? How did the original authors come up with this architecture?

An organised codebase enables you to implement changes faster and make fewer mistakes, ultimately leading to higher code and model quality. Read more to learn how to structure your ML projects with Tensorflow Extended (TFX), the easy and straightforward way.

Image by Francis Ray from Pixabay

Project Structure: Requirements

  • Enable experimentation with multiple pipelines
  • Support both a local execution mode and a deployment execution mode. This ensures the creation of 2 separate running configurations, with the first being used for local development and end-to-end testing and the second one used for running in the cloud.
  • Reuse code across pipeline variants if it makes sense to do so
  • Provide an easy-to-use CLI interface for executing pipelines with different configurations and data

A correct implementation also ensures that tests are easy to incorporate in your workflow.

Project Structure: Design Decisions

  • Use Python.
  • Use Tensorflow Extended (TFX) as the pipeline framework.

In this article we will…

Quit depending on positional indices and input value ordering. Start relying on named inputs and outputs. Avoid data wiring errors.

Image by Daniel Dino-Slofer from Pixabay

Named inputs and outputs are essentially dictionaries with string keys and tensor values.


  1. Defence Against Feature Reordering
  2. Self-Sufficient Model Serving Signatures and Metadata
  3. Renaming and Absent Feature Protection
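As a quick illustration of the idea, here is a minimal sketch of a Keras model keyed by named inputs rather than positional ones. The feature names age and income are made up for this example:

```python
import tensorflow as tf

# Hypothetical feature names, purely for illustration.
inputs = {
    "age": tf.keras.Input(shape=(1,), name="age"),
    "income": tf.keras.Input(shape=(1,), name="income"),
}
x = tf.keras.layers.Concatenate()(list(inputs.values()))
x = tf.keras.layers.Dense(8, activation="relu")(x)
output = tf.keras.layers.Dense(1, name="prediction")(x)
model = tf.keras.Model(inputs=inputs, outputs=output)

# Features are matched by key, not by position, so reordering
# the dictionary cannot silently wire the wrong tensor.
batch = {"income": tf.ones((4, 1)), "age": tf.ones((4, 1))}
preds = model(batch)
```

Because the model's signature is a dictionary, shuffling the client's key order is harmless, and a missing key fails loudly instead of producing garbage predictions.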

Most machine learning pipelines read data from a structured source (databases, CSV files/Pandas DataFrames, TF Records), perform feature selection, cleaning, and possibly preprocessing, then pass a raw multidimensional array (tensor) to a model, along with another tensor representing the correct prediction for each input sample.

Reorder or rename input features in production? You get useless results, or the client side breaks in production.

Absent Features? Missing Data? Bad…

Namely: Hyperparameter Search, Convolution Variants, Network-in-Network, Weight Sharing, Pruning, Quantization

The following is a very, very brief and non-technical summary of a chapter in the author’s dissertation.

Picture by stevepb on Pixabay


A very deep neural network that is designed to solve hard problems might take a few seconds to run on modern computers with hardware accelerators for linear algebra operations. Similarly, smaller neural networks that do not take that much time to run still do not meet real-time constraints. Hardware resource and execution time constraints are what drive the research community to investigate different methods of neural network compression. In the next few sections, common compression methods are presented.
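To make one of these methods concrete, here is a minimal sketch of post-training quantization with the TFLite converter. The tiny model below is a stand-in for illustration, not a model from the dissertation:

```python
import tensorflow as tf

# A tiny stand-in model; any trained Keras model works the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Post-training dynamic-range quantization via the TFLite converter.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The quantized flatbuffer is typically several times smaller
# than the float32 model it was converted from.
print(len(tflite_model), "bytes")
```

This trades a small amount of accuracy for a smaller, faster model, which is often what tips a network under a real-time budget.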

Hyperparameter Search

In this specific problem, we…

Extracting labels, windowing multivariate series, multiple TF Record file shards and other useful tips for dealing with sequential data

Image by stux from Pixabay

The tf.data Dataset API is a very efficient pipeline builder. Time Series Tasks can be a bit tricky to implement properly. In this article, we are going to dive deep into common tasks:

  • Windowing Labelled Data
  • Windowing Unlabelled Data by Looking Ahead
  • Sharding TF Record Files: Tips for Efficiency and No Data Loss

Let’s begin!

Windowing Labelled Data

With the Dataset API this is simple to do. Assume the following configuration: the input feature is a and the label is b.

a, b
1, 0
2, 0
3, 1
4, 0
5, 0
6, 1

Each row can be described by a tensor shaped (2,).
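The windowing itself can be sketched with the tf.data API like this; the window size and shift below are illustrative choices, not prescribed values:

```python
import tensorflow as tf

# The toy series from the table above: feature a, label b.
a = [1, 2, 3, 4, 5, 6]
b = [0, 0, 1, 0, 0, 1]

window_size = 3
ds = tf.data.Dataset.from_tensor_slices((a, b))
# window() yields nested datasets; flat_map + batch turns each
# window back into a single (features, labels) tensor pair.
ds = ds.window(window_size, shift=1, drop_remainder=True)
ds = ds.flat_map(
    lambda fa, fb: tf.data.Dataset.zip((fa, fb)).batch(window_size))
# Keep the whole feature window, but only the last step's label.
ds = ds.map(lambda fa, fb: (fa, fb[-1]))

windows = [(f.numpy().tolist(), int(l.numpy())) for f, l in ds]
# First window: features [1, 2, 3], label 1
```

Taking the last step's label turns the series into a "predict the current step from a trailing window" task; keeping fb whole instead would give you sequence-to-sequence targets.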

A creative PoseNet application that runs on your browser and tries to predict if you’re jumping, crouching, or staying still


You all know what this game is about. This is the best service-offline-sorry page in the world. People have built everything from simple bots that time the dino's jump to beat the game, to reinforcement learning agents with CNN state encoders.

It’s a game and we’re supposed to have fun. Today, I’ll walk you through how to write some JavaScript code to play the game by jumping around in your room.

This thing is hard to play.

You can try the game here and view the full source code here.

Overcoming Tech Barriers

Setting up a small webpage with basic JavaScript support to get…

A quick api overview and a self-contained example of fluent-tfx

If this production e2e ML pipelines thing seems new to you, please read the TFX guide first.

On the other hand, if you've used TFX before, or are planning to deploy a machine learning model, you're in the right place.

Image by Michal Jarmoluk from Pixabay

But Tensorflow Extended is already fully capable of constructing e2e pipelines by itself, so why bother with another API?

  • Verbose and long code definitions. Actual preprocessing and training code can be as lengthy as an actual pipeline component definition.
  • Lack of sensible defaults. You have to manually specify inputs and outputs to everything. This allows maximum flexibility on one hand, but in 99% of cases most of the IO can be wired automatically.

Why it exists and how it’s used in Beam Pipeline Components


ML Metadata (MLMD) is a library for recording and retrieving metadata associated with ML developer and data scientist workflows.

TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines

At the time of writing, the current version of ML Metadata is v0.22 (TFX is also at v0.22). The API is mature enough to allow for mainstream usage and deployment on the public cloud. Tensorflow Extended uses this extensively for component-to-component communication, lineage tracking, and other tasks.

We are going to run a very simple pipeline that is just going to generate statistics and the…

A practical and self-contained example using GCP Dataflow

The full end-to-end example that Tensorflow Extended provides by running tfx template copy taxi $target-dir produces 17 files scattered across 5 directories. If you are looking for a smaller, simpler, self-contained example that actually runs on the cloud and not locally, this is what you are looking for. Cloud services setup is also covered here.


What’s going to be covered

We are going to generate statistics and a schema for the Chicago taxi trips CSV dataset, which you can find under the data directory after running the tfx template copy taxi command.

Generated artifacts such as data statistics or the schema…

Motivation, intuition and the process behind this series of articles

Hi there. I’m Theodoros, a Computer Engineering Student here in Greece and I love deep learning.

Welcome to Understanding Machine Learning in Production. In this article we are going to go over the main objective of this series and a rough outline of what is going to be covered.

I’m creating these articles because I feel that, although the Tensorflow ecosystem and high-level APIs like Keras, along with all the free (and non-free) tools and services that big companies provide online, like the famous Google Colab, lower the entry barriers to machine learning, the…

Theodoros Ntakouris
