It's an interesting idea. FM synthesis has strong mathematical underpinnings (see Bessel functions), so it would seem possible to transform the results of traditional Fourier-style spectral analysis into the parameters that drive FM synthesis. However, it is a complex problem. I suggest some basic Internet searching on the terms “FM” and “Bessel”. Here are two example links, picked somewhat at random:
http://www.johndcook.com/blog/2016/02/17/analyzing-an-fm-signal/
https://ccrma.stanford.edu/software/snd/snd/fm.html
One of the challenges is that FM synthesis is by its nature a "parameterized" synthesis technique. What I mean by "parameterized" is that a small number of parameters has great influence over the detail of the resulting sound. In the simplest configuration, a pair of sinusoidal oscillators, just two parameters define the resulting spectrum: the carrier-to-modulator frequency ratio and the amplitude of the modulation signal (which determines the “modulation index”). So two parameters, and how they evolve over time, define a very rich space of harmonic possibilities. This is at the heart of why FM synthesis is so powerful and compelling (and part of John Chowning’s original motivation to explore it).
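To make the "two parameters" point concrete, here is a minimal Python sketch, assuming the phase-modulation form of a simple FM pair, sin(2*pi*fc*t + I*sin(2*pi*fm*t)). For that form the components sit at fc + n*fm with amplitudes given by the Bessel functions J_n(I). The function names and the values of fc, fm, and I are my own illustrative choices, not anything specific to Kyma or the DX7.

```python
import math

def bessel_j(n, x, steps=2000):
    """J_n(x) for integer n via Bessel's integral, using the trapezoid rule."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps + 1):
        tau = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi

def fm_pair(fc, fm, index, sr=8000, n_samples=8000):
    """Simple carrier/modulator pair (phase-modulation form) with a fixed index."""
    return [
        math.sin(2 * math.pi * fc * i / sr + index * math.sin(2 * math.pi * fm * i / sr))
        for i in range(n_samples)
    ]

def amplitude_at(signal, freq, sr=8000):
    """Amplitude of one frequency component, measured by direct correlation."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

# Illustrative values only: 1000 Hz carrier, 100 Hz modulator, index 2.
fc, fm, index = 1000.0, 100.0, 2.0
signal = fm_pair(fc, fm, index)

predicted = {n: abs(bessel_j(n, index)) for n in range(-3, 4)}
measured = {n: amplitude_at(signal, fc + n * fm) for n in range(-3, 4)}

for n in range(-3, 4):
    print(f"{fc + n * fm:7.1f} Hz  predicted {predicted[n]:.4f}  measured {measured[n]:.4f}")
```

Changing only the index or the frequency ratio reshapes the whole set of sidebands at once, which is exactly why a couple of parameters cover so much spectral ground.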
Spectral analysis, on the other hand, deals with the details of a sound's spectral evolution. It is more fine-grained, providing control of the spectral details down to individual frequency components.
So the mapping challenge is how to transform the numerous spectral components into the minimal FM parameters.
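As a toy illustration of that mapping, here is a hedged Python sketch that fits a single modulation index to a set of measured sideband magnitudes by brute-force search. It assumes a lot: one simple FM pair, a known carrier-to-modulator ratio, a static spectrum, and magnitudes only (no phase). The names fit_index and sideband_magnitudes are hypothetical helpers of my own, and a real sound will rarely match a single FM pair this neatly.

```python
import math

def bessel_j(n, x, steps=400):
    """J_n(x) for integer n via Bessel's integral, using the trapezoid rule."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cos(n * k * h - x * math.sin(k * h))
    return total * h / math.pi

def sideband_magnitudes(index, n_max):
    """Magnitudes of the carrier (n = 0) and the first n_max sideband orders."""
    return [abs(bessel_j(n, index)) for n in range(n_max + 1)]

def fit_index(target, candidates):
    """Return the candidate index whose sideband magnitudes best match target
    in the least-squares sense, together with that squared error."""
    n_max = len(target) - 1
    best_index, best_err = None, float("inf")
    for index in candidates:
        model = sideband_magnitudes(index, n_max)
        err = sum((m - t) ** 2 for m, t in zip(model, target))
        if err < best_err:
            best_index, best_err = index, err
    return best_index, best_err

# Search indices 0.0 to 10.0 in steps of 0.05 (an arbitrary grid).
candidates = [i / 20 for i in range(201)]

# Stand-in "analysis" data: magnitudes generated from a known index of 2.5,
# so the search has something it can actually recover.
target = sideband_magnitudes(2.5, 6)
best_index, best_err = fit_index(target, candidates)
print(best_index, best_err)
```

Even in this idealized case, seven measured magnitudes collapse to one number. With real analysis data the fit error would show how much spectral detail a single FM pair simply cannot represent.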
Another challenge is that there is no single configuration of FM synthesis. Carriers and modulators can be connected in a wide range of configurations, each of which is capable of producing very different spectra. It's kind of like regional dialects of a language that each have their own common vocabularies. The original DX7 offered 32 possible configurations of 6 “operators” (each basically a sinusoidal oscillator with multi-segment pitch and amplitude envelopes). So one recommendation is to select one FM “patch” and work within its potential spectral space.
This is a problem that has fascinated me for over three decades. My belief is that the most general solution is best pursued with some type of machine-learning approach. “Train” the neural network (or equivalent) to perform the mapping of time-varying spectral components to FM synthesis parameters. I think this is the most promising avenue if the goal is to take the recording of a sound and from that re-synthesize it using FM synthesis.
But it's unclear that this approach provides a path to what I expect is the true goal: being able to analyze a sound and then create variations, transformations, and other new sounds from that analysis using FM synthesis. The mappings produced by the machine learning would be somewhat opaque and may well defy easy additional direct manipulation.
So perhaps a “just try and see what happens” experimental approach is more fruitful if the goal is creating new, interesting sounds from existing source materials. For example, combine the time-varying spectral amplitude information in some way and use it to control a simple FM pair’s time-varying modulation index. Or do the same thing with multiple FM pairs, each controlled by a specific set of the original spectral components; for example, three FM pairs controlled by the low-, mid-, and high-frequency components respectively. None of this is likely to give you a true re-synthesis of the original sound, but that is OK. It should produce some new sounds, and likely sounds you would not have encountered otherwise. I think Kyma would be a great playground for such exploration!
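Outside of Kyma, the three-band version of that experiment could be sketched in plain Python roughly as follows. Everything specific here is an assumption of mine for illustration: the band edges, the carrier and modulator frequencies, the maximum index, the frame sizes, and the synthetic test source standing in for a real recording.

```python
import cmath
import math

def band_envelopes(signal, sr, edges, frame=128, hop=64):
    """Per-frame summed DFT magnitude in each band [edges[i], edges[i+1])."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / frame) for k in range(frame)]
    n_bands = len(edges) - 1
    envs = [[] for _ in range(n_bands)]
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = [signal[start + k] * window[k] for k in range(frame)]
        sums = [0.0] * n_bands
        for b in range(frame // 2):
            freq = b * sr / frame
            band = next((i for i in range(n_bands) if edges[i] <= freq < edges[i + 1]), None)
            if band is None:
                continue
            coeff = sum(x * cmath.exp(-2j * math.pi * b * k / frame) for k, x in enumerate(chunk))
            sums[band] += abs(coeff)
        for i in range(n_bands):
            envs[i].append(sums[i])
    return envs

def fm_pair(fc, fm, index_frames, hop, sr):
    """One FM pair (phase-modulation form) whose modulation index is linearly
    interpolated between per-frame values."""
    out = []
    for i in range((len(index_frames) - 1) * hop):
        pos = i / hop
        j = int(pos)
        frac = pos - j
        index = (1 - frac) * index_frames[j] + frac * index_frames[j + 1]
        t = i / sr
        out.append(math.sin(2 * math.pi * fc * t + index * math.sin(2 * math.pi * fm * t)))
    return out

sr, hop = 8000, 64
n = 2000
# Stand-in source: a 300 Hz tone fading out while a 2500 Hz tone fades in.
source = [
    (1 - i / n) * math.sin(2 * math.pi * 300 * i / sr)
    + (i / n) * math.sin(2 * math.pi * 2500 * i / sr)
    for i in range(n)
]

edges = [0, 500, 2000, 4000]          # low / mid / high bands in Hz (arbitrary)
envs = band_envelopes(source, sr, edges, hop=hop)

# Scale all envelopes by one global peak so quiet bands stay quiet.
max_index = 2.0
peak = max(max(env) for env in envs) or 1.0
index_tracks = [[max_index * v / peak for v in env] for env in envs]

# One FM pair per band; (carrier Hz, modulator Hz) values are arbitrary.
pair_freqs = [(150, 150), (600, 300), (1800, 450)]
pairs = [fm_pair(fc, fm, track, hop, sr) for (fc, fm), track in zip(pair_freqs, index_tracks)]
mix = [sum(samples) / len(pairs) for samples in zip(*pairs)]
```

The result is not a re-synthesis of the source; the source only shapes how bright each FM pair gets over time. Swapping the band-to-pair assignments or the frequency ratios is the kind of cheap variation this setup invites.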