PARK: Distributed Global Simultaneous Fit Optimizer

Quick links: Reference Manual

PARK provides a distributed global simultaneous fit optimizer for determining the best model parameters for the measured data.

op-ti-mize [v.] make the best or most effective use of

The interface to the optimizer is a set of parameters that can be adjusted and a function to determine the goodness of a particular choice of parameters. PARK has a variety of optimizers that it can use, from robust derivative-free optimizers suitable for problems without an easy analytic description, to efficient Newton-style optimizers which use information from the first derivative to converge rapidly on the best solution.
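
As a rough sketch of that interface (using scipy rather than PARK's own classes, and with invented names such as goodness chosen purely for illustration), a fit is just a parameter vector plus a function that scores it, and the same pair can be handed to a derivative-free optimizer or to a quasi-Newton optimizer:

    # Illustrative sketch only -- not the PARK API.  It shows the shape of the
    # optimizer interface described above: a vector of adjustable parameters
    # and a goodness function that scores a particular choice of values.
    import numpy as np
    from scipy.optimize import minimize

    x = np.linspace(0, 1, 50)
    data = 2.5 * x + 0.3 + np.random.normal(scale=0.05, size=x.size)
    sigma = 0.05

    def goodness(params):
        """Weighted sum-of-squares misfit between model and data."""
        slope, intercept = params
        theory = slope * x + intercept
        return np.sum(((data - theory) / sigma) ** 2)

    # A derivative-free optimizer (Nelder-Mead) and a quasi-Newton optimizer
    # (BFGS) can both be driven through the same parameters-plus-goodness pair.
    result_nm = minimize(goodness, x0=[1.0, 0.0], method="Nelder-Mead")
    result_bfgs = minimize(goodness, x0=[1.0, 0.0], method="BFGS")
    print(result_nm.x, result_bfgs.x)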

fit [v.] be of the right shape or size

Comparing a measured value with the value expected from a given physical model for that measurement yields a goodness of fit. This value is affected by the choice of parameter values in the physical model and by the uncertain parameters of the measurement process. Using a Bayesian approach to the fitting problem, we can incorporate prior information into the fitting process; for example, we can limit individual model parameters to a range of values based on knowledge of how the system was constructed. The goodness measure that we maximize is the log probability of observing the measured data within its measurement uncertainty, weighted by the prior probability that the input parameters take on their particular values. In practice this leads to the traditional sum-of-squares goodness-of-fit measure, but individual measurements can use alternative probability distributions if the measurement uncertainty differs significantly from the normal distribution.
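
In standard notation (not specific to PARK), with measured values y_i, uncertainties sigma_i, theory function f(x_i; theta), and prior P(theta), this goodness measure is

    \log P(\theta \mid D) \;\propto\; \log P(D \mid \theta) + \log P(\theta)
        \;=\; -\frac{1}{2} \sum_i \left( \frac{y_i - f(x_i;\theta)}{\sigma_i} \right)^2
              + \log P(\theta) + \text{const}

With a flat prior and normally distributed measurement uncertainty, maximizing this expression is equivalent to minimizing the familiar sum of squared, weighted residuals.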

si-mul-ta-ne-ous [adj.] occurring, operating or done at the same time

Simultaneous fitting adjusts parameters in the underlying model so that the expected values of the measurements of different datasets best match the actual measured values. The parameters for the individual measurement estimators (theory functions) will be systematically related, with the user able to specify relationships and limits from other sources of information. Using measurements from different techniques is sound if there is no systematic bias and if the uncertainty in the measurement values is correctly estimated. In practice this is not always the case, and the user has considerable flexibility in choosing the set of measurements to use and the weighting of the individual data points when performing simultaneous refinement.
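
A minimal sketch of the idea, again using scipy and invented dataset names rather than PARK's model definitions: two datasets share a slope parameter while each keeps its own offset, and every residual is weighted by its measurement uncertainty.

    # Illustrative sketch only -- not the PARK model syntax.  Two datasets are
    # fitted simultaneously; they share one parameter (the slope) while each
    # keeps its own offset, and each point is weighted by its uncertainty.
    import numpy as np
    from scipy.optimize import least_squares

    x1, x2 = np.linspace(0, 1, 40), np.linspace(0, 2, 60)
    y1 = 1.7 * x1 + 0.2 + np.random.normal(scale=0.03, size=x1.size)
    y2 = 1.7 * x2 - 0.5 + np.random.normal(scale=0.10, size=x2.size)
    s1, s2 = 0.03, 0.10

    def residuals(p):
        slope, offset1, offset2 = p          # slope is common to both datasets
        r1 = (y1 - (slope * x1 + offset1)) / s1
        r2 = (y2 - (slope * x2 + offset2)) / s2
        return np.concatenate([r1, r2])      # joint weighted residual vector

    fit = least_squares(residuals, x0=[1.0, 0.0, 0.0])
    print(fit.x)   # shared slope plus the two dataset-specific offsets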

glo-bal [adj.] embracing the whole of something

Optimizers either operate locally, choosing better and better values of the parameters until small variations in the parameter values do not improve the goodness of fit, or they operate globally, sometimes accepting worse values so that the search can escape local minima. There are a variety of global algorithms available (multi-start local optimizers, genetic algorithms, differential evolution, parallel tempering, branch and bound), and PARK will have several to choose from. For now only the basic multi-start local optimizer is available.
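
The multi-start strategy itself is simple; the following sketch (a toy objective with scipy, not the PARK code) launches a local optimizer from several random starting points and keeps the best result found:

    # Illustrative sketch of a multi-start strategy: run a local optimizer
    # from random starting points within the parameter bounds and keep the
    # best of the resulting local minima.
    import numpy as np
    from scipy.optimize import minimize

    def goodness(p):
        # A bumpy toy objective with many local minima.
        return np.sum(p**2) + 3 * np.sum(np.cos(5 * p))

    rng = np.random.default_rng(0)
    lo, hi = -4.0, 4.0
    best = None
    for _ in range(20):
        start = rng.uniform(lo, hi, size=2)
        result = minimize(goodness, start, method="Nelder-Mead")
        if best is None or result.fun < best.fun:
            best = result
    print(best.x, best.fun)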

con-strain [v.] restrict the extent of

For many problems, the range allowed for the individual parameters is not unlimited. The simplest form of constraint, the bounds constraint, restricts a parameter to a simple range of values. Bounds constraints are supported in PARK through the definition of the model parameters, each of which can have a minimum and maximum value. Parameters can also have hard limits beyond which the model is not valid. These, too, are defined by the model so that the user cannot inadvertently choose an invalid range.
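
In generic terms (hypothetical parameter names, scipy in place of PARK's parameter objects), a bounds constraint is just a (minimum, maximum) pair attached to each adjustable parameter:

    # Illustrative sketch only: bounds constraints expressed as per-parameter
    # (min, max) pairs and passed to a bounded local optimizer.
    import numpy as np
    from scipy.optimize import minimize

    def goodness(p):
        thickness, roughness = p
        return (thickness - 120.0) ** 2 + (roughness - 3.0) ** 2   # toy misfit

    # Fitted range chosen by the user; a model could impose harder limits as well.
    bounds = [(50.0, 200.0),   # thickness restricted to a physically sensible range
              (0.0, 10.0)]     # roughness cannot be negative
    result = minimize(goodness, x0=[80.0, 5.0], bounds=bounds, method="L-BFGS-B")
    print(result.x)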

In addition to simple bounds, constraints on the parameters may be more complicated. For example, the sum of the thicknesses in a layered sample should match the total thickness of the sample. Such a constraint can be supplied by forcing the thickness of one layer to be the total thickness minus the thicknesses of the other layers, but that still leaves a problem: thicknesses cannot be negative. More complex constraints are possible, leading to a non-linear boundary for the feasible space of the parameters. PARK does not yet support such constraints, except to have the model return infinity if it enters an infeasible region.
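
A sketch of that workaround, with toy numbers: the last thickness is derived from the known total, and the goodness function returns infinity whenever the derived thickness would be negative.

    # Illustrative sketch of handling a sum constraint by reparameterization:
    # the thickness of the last layer is derived from the known total, and the
    # model returns an infinite misfit when the derived value would be negative.
    import numpy as np

    TOTAL = 500.0                              # known total sample thickness (toy value)
    TARGET = np.array([150.0, 250.0, 100.0])   # "true" thicknesses for the toy misfit

    def goodness(free_thicknesses):
        free = np.asarray(free_thicknesses)
        last = TOTAL - free.sum()
        if last < 0 or np.any(free < 0):
            return np.inf                      # infeasible region: reject this point
        thicknesses = np.append(free, last)
        return np.sum((thicknesses - TARGET) ** 2)   # toy misfit stands in for the model

    print(goodness([150.0, 250.0]))   # feasible: finite misfit
    print(goodness([400.0, 200.0]))   # infeasible: returns inf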

dis-tri-bute [v.] give shares of (something); deal out

Fitting is an example of an embarrassingly parallel problem. Different agents can search independently for the best goodness of fit, and it is trivial to combine the results and report the best among them. This allows us to distribute the problem across multiple computers without doing much more work than one computer would do searching for the minimum on its own. PARK provides the communication and control infrastructure for performing distributed fits.
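
As a sketch of why distribution is cheap (using Python's multiprocessing on a single machine rather than PARK's own service), independent searches need no communication, and the final answer is simply the best of their results:

    # Illustrative sketch only: independent fit starts run in separate
    # processes and the overall best result is the minimum over all of them.
    import numpy as np
    from multiprocessing import Pool
    from scipy.optimize import minimize

    def goodness(p):
        return np.sum(p**2) + 3 * np.sum(np.cos(5 * p))   # bumpy toy objective

    def run_one(seed):
        start = np.random.default_rng(seed).uniform(-4, 4, size=2)
        result = minimize(goodness, start, method="Nelder-Mead")
        return result.fun, result.x

    if __name__ == "__main__":
        with Pool() as pool:
            results = pool.map(run_one, range(16))   # independent searches in parallel
        best_fun, best_x = min(results, key=lambda r: r[0])
        print(best_x, best_fun)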

un-cer-tain [adj.] not to be relied on; not known or definite

An important aspect of global optimization is reporting the uncertainty of the fitted parameters. The usual approach in local minimization, and the one available in the current version of PARK, is to report the covariance matrix, which corresponds roughly to a quadratic approximation of the shape of the local minimum. This description is further impoverished by reporting only the diagonal elements of the covariance matrix, hiding correlations between the uncertainties of the various parameters. For local minima with long, meandering valleys, and for global optimization problems containing many well-defined local minima, a more robust description of the uncertainty is required, one which can give the user a sense of the reliability of the reported solution.
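
For reference, the following sketch (scipy, a toy linear model) computes the quadratic-approximation covariance from the Jacobian at the minimum; the square roots of its diagonal are the usually reported uncertainties, while the off-diagonal correlations are exactly what a diagonal-only report hides.

    # Illustrative sketch of the covariance-based uncertainty report.
    import numpy as np
    from scipy.optimize import least_squares

    x = np.linspace(0, 1, 50)
    y = 2.0 * x + 1.0 + np.random.normal(scale=0.05, size=x.size)

    def residuals(p):
        return (y - (p[0] * x + p[1])) / 0.05    # weighted residuals

    fit = least_squares(residuals, x0=[1.0, 0.0])
    J = fit.jac
    cov = np.linalg.inv(J.T @ J)                 # quadratic approximation at the minimum
    sigma = np.sqrt(np.diag(cov))                # the commonly reported per-parameter errors
    correlation = cov / np.outer(sigma, sigma)   # what the diagonal-only report hides
    print(sigma)
    print(correlation)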

A number of approaches to the problem are available. The strict Bayesian approach is to report the joint probability distribution over the whole parameter space as the solution. This is not very satisfying to the modeller and is hard to report in the literature, particularly since the reported solution may contain artifacts of the particular measurement. We will be using a variety of techniques for computing and reporting on the minimum, including multiple fits to resynthesized data so that measurement artifacts are reduced, and factor analysis to show which parameters covary. We would like to be able to show users exemplars from widely separated local minima so that they can correctly describe the confidence they have in the reported fit.
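
One such technique, fitting resynthesized data, can be sketched as a parametric bootstrap (toy model, scipy): draw new datasets from the best fit plus the measurement uncertainty, refit each one, and examine the spread of the refitted parameters.

    # Illustrative sketch of fitting resynthesized data: the spread of the
    # refitted parameters gives a more robust uncertainty estimate than a
    # single covariance matrix.
    import numpy as np
    from scipy.optimize import least_squares

    x = np.linspace(0, 1, 50)
    y = 2.0 * x + 1.0 + np.random.normal(scale=0.05, size=x.size)
    sigma = 0.05

    def fit_to(data):
        res = least_squares(lambda p: (data - (p[0] * x + p[1])) / sigma, x0=[1.0, 0.0])
        return res.x

    best = fit_to(y)
    theory = best[0] * x + best[1]
    rng = np.random.default_rng(1)
    samples = np.array([fit_to(theory + rng.normal(scale=sigma, size=x.size))
                        for _ in range(200)])
    print(samples.mean(axis=0), samples.std(axis=0))   # parameter spread across refits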

lay-er [n.] a sheet, quantity, or thickness of material, typically one of several

The layered architecture of PARK can be accessed at different levels, from the scientist user who can define a specialized model for a single system and automatically have a graphical user interface to interact with the parameters and the data, to the application developer who can use the fitting service backend for distributed computing support while providing their own user interface.

The current version of PARK adjusts the parameters of a particular model. In practice, though, even the choice of model is uncertain, and a different model (e.g., one with more layers in a layered system, or with a different structure) may fit the system better. Measures such as the Akaike information criterion (AIC) can help in that they describe the cost and benefit of adding parameters to the model. Over the coming years, PARK will add support for searching amongst different models. This search will be problem dependent, with a variety of heuristics to constrain the choice of models.
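
The AIC is the standard textbook measure: with k adjustable parameters and maximized likelihood \hat{L},

    \mathrm{AIC} = 2k - 2\ln\hat{L}

Lower AIC is preferred, so an extra parameter is only worthwhile if it improves the log likelihood by more than the penalty it adds; for normally distributed measurement uncertainty this means, up to a constant, that each added parameter must reduce the sum-of-squares misfit by more than 2.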

Current Status

The PARK fitting service currently exists only in the basic form described above. Over the remaining years of the DANSE project we will be improving the service, adding additional optimizers and better reporting of the results.

PARK is in the midst of revision to make it much easier to use. The current version, park 1.2, has a restructured model definition facility. We are revising the park backend to enable distributed fitting with the new models. Please be patient with us as we put together a new release expected for the end of May 2009.