I have the input/output buffer in the Kyma Preferences set to 10 ms. Is that fixed at 10 ms, or is it up to a maximum of 10 ms?
The input-to-output time that you set in Kyma Preferences is constant (fixed). For example, having a fixed input-to-output time allows you to quickly learn and adapt to that fixed delay while performing live with Kyma (just as you can adapt to the propagation delay of sound through the air when you are performing with other musicians on stage).
Am I right in thinking that if a Sound can't compute within that 10 ms, it will run 'out of realtime' and not play?
If the entire signal flow graph cannot be computed in a single cycle (1 / SampleRate), that is considered 'out of realtime'. Some algorithms may have 'bursts' of heavy processing followed by intervals of less intensive processing; these algorithms can benefit from the output buffer because they can start filling the buffer during the less intensive intervals so when you hit a burst of intense computation, you can fall a little bit behind realtime while still being able to output samples from the buffer at a regular, periodic sample rate.
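As a rough way to see why the buffer helps, here is a minimal Python sketch of an output buffer absorbing a burst of expensive samples. It is an illustrative model, not Kyma's actual scheduler, and the sample rate, buffer size, and per-sample compute times are all assumed values:

```python
# Illustrative model of an output buffer absorbing a compute burst.
# Not Kyma's actual scheduler; all numbers are assumed.

SAMPLE_RATE = 48_000
PERIOD_US = 1_000_000 / SAMPLE_RATE              # ~20.8 us per sample
BUFFER_SAMPLES = int(SAMPLE_RATE * 10 / 1000)    # 10 ms buffer = 480 samples

def stays_realtime(compute_times_us):
    """compute_times_us[i] = time taken to compute sample i.
    Returns True if the buffer never underruns."""
    fill = BUFFER_SAMPLES                        # start with a full buffer
    for t in compute_times_us:
        # While computing one sample, t / PERIOD_US samples are consumed
        # from the buffer, and one sample is produced into it:
        fill += 1 - t / PERIOD_US
        fill = min(fill, BUFFER_SAMPLES)         # the buffer can't overfill
        if fill <= 0:
            return False                         # underrun: out of realtime
    return True

# 1000 cheap samples (5 us each), then a 200-sample burst at 60 us each.
# The burst runs behind realtime, but the 10 ms buffer rides it out:
print(stays_realtime([5.0] * 1000 + [60.0] * 200))   # True
# A longer burst drains the buffer and falls out of realtime:
print(stays_realtime([5.0] * 1000 + [60.0] * 400))   # False
```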
I understand that Kyma computes on a sample-by-sample basis rather than on a vector of samples.
One of the benefits of this is in the construction of long chains of processing in the signal flow editor: adding a module to the chain does not increase the delay time through that chain. You can add as many modules as you like; the entire signal flow graph is always computed on each cycle (the duration of a cycle is the inverse of the sample rate, so for example, if your sample rate is 48 kHz, a single cycle lasts 20.8 microseconds).
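As a back-of-the-envelope illustration, here is a Python sketch of a per-sample processing chain (the stage functions are hypothetical, not Kyma modules): every stage runs inside the same cycle, so lengthening the chain adds no buffering delay:

```python
# Per-sample processing chain: each stage runs within the same cycle,
# so adding stages adds no buffering delay. Hypothetical stage functions,
# not Kyma modules.

SAMPLE_RATE = 48_000
print(f"one cycle = {1_000_000 / SAMPLE_RATE:.1f} us")   # 20.8 us

def gain(x):      return 0.5 * x
def soft_clip(x): return max(-1.0, min(1.0, x))

chain = [gain, soft_clip, gain]   # lengthen freely: still 0 samples of delay

def process_sample(x):
    # The whole graph is evaluated once per cycle; the output for sample n
    # depends only on the input for sample n, so the latency through the
    # chain is zero samples no matter how many stages it contains.
    for stage in chain:
        x = stage(x)
    return x

print(process_sample(0.8))   # 0.2
```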
The FFT imparts a delay though, right (the window size)?
You're right, the concept of window length is built into the very definition of spectral analysis. In order to know the frequency of a signal, you have to wait until you've seen at least one full cycle of the lowest frequency in the time-domain waveform. For spectral analysis, the length of the window is inversely related to the spacing between the frequency bands of the analysis you produce. In other words, the longer the window (sometimes called a spectral frame), the lower the fundamental you can detect, the more frequency bands you can analyze, and the closer together those frequency bands can be. This principle is fundamental (no pun intended) to spectral analysis, no matter what algorithm or which software implementation you use to perform the analysis.
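To make the trade-off concrete, this small Python sketch tabulates the band spacing and window delay for a few window lengths at an assumed 48 kHz sample rate. The numbers follow directly from SR / N and N / SR; they are not specific to Kyma:

```python
# Window length vs. frequency resolution at an assumed 48 kHz sample rate.
# Band spacing = SR / N; window delay = N / SR. Not specific to Kyma.

SR = 48_000
for N in (256, 1024, 4096):
    bin_spacing_hz = SR / N           # also roughly the lowest detectable frequency
    window_delay_ms = 1000 * N / SR   # must observe N samples before analyzing
    print(f"N = {N:4d}: bands every {bin_spacing_hz:6.2f} Hz, "
          f"window delay {window_delay_ms:5.2f} ms")

# N =  256: bands every 187.50 Hz, window delay  5.33 ms
# N = 1024: bands every  46.88 Hz, window delay 21.33 ms
# N = 4096: bands every  11.72 Hz, window delay 85.33 ms
```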
Or is it that, once the delay is taken into account, the processing is sample by sample?
In other words, before you can use spectral features in the other Sounds, you have to wait long enough for the time signal to be "observed" and for those features to be extracted from the observed signal. Once the content is known, it can be utilized by other Sounds on a sample-by-sample basis.
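Here is a minimal Python sketch of those two phases (the frame-based feature extractor and the 440 Hz test signal are hypothetical, not a Kyma API): a feature only becomes available after a full window has been observed, and from then on it can be read on every sample:

```python
import numpy as np

# Sketch of the two phases: a feature becomes available only after a full
# window has been observed; after that, it can be read on every sample.
# The frame-based extractor and test signal are hypothetical, not a Kyma API.

SR = 48_000
N = 1024                              # analysis window length in samples
signal = np.sin(2 * np.pi * 440 * np.arange(SR) / SR)   # 1 s of 440 Hz

def loudest_bin_hz(frame):
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(N)))
    return np.argmax(spectrum) * SR / N

feature = None                        # unknown until one window is observed
for n in range(len(signal)):
    if n >= N and n % N == 0:
        # A frame is complete: extract the feature (always one window behind).
        feature = loudest_bin_hz(signal[n - N:n])
    if feature is not None:
        # Any downstream processing can now read `feature` on every sample.
        pass

print(feature)   # ~421.88 Hz: the analysis band nearest 440 Hz at this resolution
```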