Optimal mass transport between two probability distributions originated in a problem posed by Gaspard Monge in 1781. A major development in the theory came in the 1940s, thanks to the Nobel laureate Leonid Kantorovich. However, it was only in the past 25 years that the subject reached its full potential, with a huge number of applications to PDEs, geometric inequalities, and mathematical finance, among many other areas. I will briefly recall the main results of this hot area of research. I will then describe how several other analytical and statistical procedures that correlate two probability distributions share many of the useful properties of optimal mass transportation. I will introduce the class of "linear transfers" between probability measures, which contains all cost-minimizing mass transports, but also balayage and martingale transports, the Schrödinger bridge associated to a reversible Markov process, and the weak mass transports of Talagrand, Marton, Gozlan and others. The class also includes various stochastic aspects of mass transports, including Brownian Skorokhod embeddings to which Monge-Kantorovich theory does not apply. I will also introduce the cone of "convex transfers", which includes any p-power of a linear transfer, but also the logarithmic entropy, the Donsker-Varadhan information and certain free energy functionals. I will mostly exhibit examples that point to the pervasiveness of the concept in the important work on correlating probability distributions. I will point to connections to the Kolmogorov-Arnold-Moser (KAM) theory, to its weak version as studied by Mather and Fathi, as well as to its stochastic counterpart.