Hyrien tricks for model fitting

In a recent meeting with Ollivier, we discussed the proper approach for fitting a dynamical system model to time series data and then selecting among candidate models. These notes are mostly for me, but this seemed like a good place to store them. My approach had traditionally been to fit a model $m(t_i,\boldsymbol{\theta})$ to data points $d(t_i)$. That means finding the optimal parameter vector $\boldsymbol{\theta}$ by maximizing the likelihood $\mathcal{L}$ (or equivalently the log-likelihood), in practice by gradient descent on the negative log-likelihood. But here I always had a question, which matters later: “what should I choose to be the likelihood?” In the past, I chose a normal distribution with some constant variance, i.e. $\mathcal{L} = \prod_i\mathcal{N}(m(t_i,\boldsymbol{\theta})-d(t_i),\sigma)$, each factor being a zero-mean normal density evaluated at the residual.
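For reference, here is a minimal sketch of that procedure in Python. The exponential-decay `model`, the synthetic data, and the helper names are hypothetical stand-ins, not the model from the meeting. The optimizer is `scipy.optimize.minimize` with BFGS, a gradient-based method swapped in for plain gradient descent. The sketch also assumes $\sigma$ is a standard deviation and fits it together with $\boldsymbol{\theta}$.

```python
import numpy as np
from scipy.optimize import minimize


def model(t, theta):
    """Hypothetical stand-in for m(t, theta): exponential decay a * exp(-k * t)."""
    a, k = theta
    return a * np.exp(-k * t)


def neg_log_likelihood(params, t, d):
    """Gaussian negative log-likelihood with one constant sigma.

    params = (a, k, log_sigma); sigma is optimized on the log scale
    so that it stays positive (an assumption of this sketch).
    """
    theta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    resid = model(t, theta) - d
    n = d.size
    return (0.5 * n * np.log(2.0 * np.pi)
            + n * log_sigma
            + 0.5 * np.sum(resid**2) / sigma**2)


def fit(t, d, theta0, sigma0=1.0):
    """Minimize the negative log-likelihood; return (theta_mle, sigma_mle, loglik)."""
    x0 = np.append(np.asarray(theta0, dtype=float), np.log(sigma0))
    res = minimize(neg_log_likelihood, x0, args=(t, d), method="BFGS")
    theta_mle = res.x[:-1]
    sigma_mle = float(np.exp(res.x[-1]))
    return theta_mle, sigma_mle, float(-res.fun)


# Synthetic data, generated only to exercise the sketch.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 200)
d = model(t, (2.0, 0.7)) + rng.normal(0.0, 0.1, size=t.size)

theta_mle, sigma_mle, loglik = fit(t, d, theta0=(1.0, 1.0))
```

The residuals for the later checks are then `model(t, theta_mle) - d`, and `loglik` is the maximized log-likelihood that a criterion such as AIC needs.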

His recommendation was to first do my procedure and then run a few checks:

Examine the residuals from the best-fit model (denoted mle, for maximum likelihood estimate), $\epsilon(t_i) = m(t_i,\boldsymbol{\theta}_{\mathrm{mle}})-d(t_i)$, by plotting them against time. Make sure the residuals are evenly distributed above and below zero, that is, check that $\sum_i\epsilon(t_i)\approx 0$.
The residual plot may already make it clear that the variance depends on time. If it does not, plotting $|\epsilon(t_i)|$ against time can help and may quickly show such a trend. If the variance is time dependent, I need to go back and refit with $\sigma(t)$.
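Once the fit looks good, the normality assumption can be checked with a Q-Q plot (the ordered residuals plotted against the corresponding quantiles of a normal distribution; points close to a straight line support normality). I need to learn more about this… If the data are not normal, the normal likelihood may still be OK, but the parameter covariances have to be interpreted differently… again, I need to learn more about this.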
With some “good” fits in hand, it is fair to use AIC or a similar criterion to compare models. If normality is broken, apparently TIC (the Takeuchi information criterion) is the one to use, but I have not found good sources on this either.
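The checks above can also be put into numbers, as a rough companion to the plots. The sketch below is only an illustration under my own assumptions: the helper names `residual_checks` and `aic` are hypothetical, the Spearman rank correlation between $|\epsilon(t_i)|$ and $t_i$ is one possible way to flag a time-dependent variance (not something prescribed in the meeting), and the residuals are synthetic. The Q-Q data come from `scipy.stats.probplot`, and AIC uses the standard definition $2k - 2\ln\mathcal{L}$ with $k$ the number of fitted parameters.

```python
import numpy as np
from scipy import stats


def residual_checks(t, resid):
    """Numerical companions to the residual plots (hypothetical helper)."""
    # Check 1: residuals should be centred on zero.
    mean_resid = float(np.mean(resid))
    # Check 2: a monotone trend of |residual| with time hints at sigma(t).
    rho, p_trend = stats.spearmanr(t, np.abs(resid))
    # Check 3: Q-Q data against a normal distribution; r is the correlation
    # of the Q-Q points, so r near 1 means they lie close to a straight line.
    (theo_q, ordered_resid), (slope, intercept, r) = stats.probplot(resid, dist="norm")
    return {
        "mean": mean_resid,
        "abs_trend_rho": float(rho),
        "abs_trend_p": float(p_trend),
        "qq_r": float(r),
    }


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2.0 * n_params - 2.0 * loglik


# Synthetic residuals, generated only to exercise the sketch.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 500)
resid_const = rng.normal(0.0, 0.1, size=t.size)                   # constant sigma
resid_growing = rng.normal(0.0, 1.0, size=t.size) * (0.02 + 0.03 * t)  # sigma grows with t

checks_const = residual_checks(t, resid_const)
checks_growing = residual_checks(t, resid_growing)
```

In this toy case `checks_growing["abs_trend_rho"]` is clearly positive, which is the signal to go back and refit with $\sigma(t)$. When comparing models with `aic`, `n_params` should count $\sigma$ as well if it was fitted.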