The authors state that data augmentation is the usual way to cope with few training samples. They extend the data-modelling method from \cite{10.1016/j.media.2017.02.003} and use it to train a neural network. The figure below gives an overview:
https://i.imgur.com/joLNyfc.png
At the core of the deformation model, they determine a set of $m$ landmarks $s_i$, deform them, and then apply an affine transformation that warps the image to align with the moved points. The points are moved in a constrained way; the authors describe the constraint as a "multi-level B-spline scattered data approximation".
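To make the landmark idea concrete, here is a minimal sketch of one augmentation step. It is not the authors' implementation. In particular, the multi-level B-spline constraint is replaced by a simple bounded random shift, and the affine map is fitted by ordinary least squares. The function names (`perturb_landmarks`, `fit_affine`, `warp_affine`) and the `max_shift` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def perturb_landmarks(landmarks, max_shift, rng):
    """Shift each (row, col) landmark by at most max_shift pixels per axis.

    Assumption: a bounded uniform shift stands in for the B-spline
    constraint described in the paper.
    """
    offsets = rng.uniform(-max_shift, max_shift, size=landmarks.shape)
    return landmarks + offsets


def fit_affine(src, dst):
    """Least-squares affine map (A, t) with dst ~= src @ A.T + t.

    Needs at least three non-collinear landmarks.
    """
    ones = np.ones((src.shape[0], 1))
    design = np.hstack([src, ones])                    # shape (m, 3)
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)  # shape (3, 2)
    return params[:2].T, params[2]


def warp_affine(image, A, t):
    """Warp a 2D image by p -> A p + t using nearest-neighbour sampling.

    Each output pixel looks up its source location through the inverse map;
    pixels whose source falls outside the image are set to zero.
    """
    h, w = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    out_pts = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    in_pts = (out_pts - t) @ np.linalg.inv(A).T
    r = np.rint(in_pts[:, 0]).astype(int)
    c = np.rint(in_pts[:, 1]).astype(int)
    valid = (r >= 0) & (r < h) & (c >= 0) & (c < w)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[r[valid], c[valid]]
    return out.reshape(h, w)


# Usage: create one augmented copy of a toy image.
rng = np.random.default_rng(0)
image = np.arange(1, 37, dtype=float).reshape(6, 6)
landmarks = np.array([[0.0, 0.0], [0.0, 5.0], [5.0, 0.0], [5.0, 5.0]])
moved = perturb_landmarks(landmarks, max_shift=1.0, rng=rng)
A, t = fit_affine(landmarks, moved)
augmented = warp_affine(image, A, t)
```

Nearest-neighbour sampling keeps the sketch short; a real pipeline would more likely use an interpolating warp and apply the same transform to the segmentation labels.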
Here is the poster: https://i.imgur.com/enQQqxC.png