1
2 """
3 An assembly is a collection of fitting functions. This provides
4 the model representation that is the basis of the park fitting engine.
5
6 Models can range from very simple one dimensional theory functions
7 to complex assemblies of multidimensional datasets from different
8 experimental techniques, each with their own theory function and
9 a common underlying physical model.
10
11 Usage
12 =====
13
14 First define the models you want to work with. In the example
15 below we will use an example of a simple multilayer system measured by
16 specular reflection of xrays and neutrons. The gold depth is the only
17 fitting parameter, ranging from 10-30 A. The interface depths are
18 tied together using expressions. In this case the expression is
19 a simple copy, but any standard math functions can be used. Some
20 model developers may provide additional functions for use with the
21 expression.
22
23 Example models::
24
25 import reflectometry.model1d as refl
26 xray = refl.model('xray')
27 xray.incident('Air',rho=0)
28 xray.interface('iAu',sigma=5)
29 xray.layer('Au',rho=124.68,depth=[10,30])
30 xray.interface('iSi',sigma=5)
31 xray.substrate('Si',rho=20.07)
32 datax = refl.data('xray.dat')
33
34 neutron = refl.model('neutron')
35 neutron.incident('Air',rho=0)
36 neutron.interface('iAu',sigma='xray.iAu')
37 neutron.layer('Au',rho=4.66,depth='xray.Au.depth')
38 neutron.interface('iSi',sigma='xray.iSi')
39 neutron.substrate('Si',rho=2.07)
40 datan = refl.data('neutron.dat')
41
42 As you can see from the above, parameters can be set to a value if
43 the parameter is fixed, to a range if the parametemr is fitted, or
44 to a string expression if the parameter is calculated from other
45 parameters. See park.Parameter.set for further details.
46
47 Having constructed the models, we can now create an assembly::
48
49 import park
50 assembly = park.Assembly([(xray,datax), (neutron,datan)])
51
52 Note: this would normally be done in the context of a fit
53 using fit = park.Fit([(xray,datax), (neutron,datan)]), and later referenced
54 using fit.assembly.
55
56 Individual parts of the assembly are accessable using the
57 model number 0, 1, 2... or by the model name. In the above,
58 assembly[0] and assembly['xray'] refer to the same model.
59 Assemblies have insert and append functions for adding new
60 models, and "del model[idx]" for removing them.
61
62 Once the assembly is created computing the values for the system
63 is a matter of calling::
64
65 assembly.eval()
66 print "Chi**2",assembly.chisq
67 print "Reduced chi**2",assembly.chisq/assembly.degrees_of_freedom
68 plot(arange(len(assembly.residuals)), assembly.residuals)
69
70 This defines the attributes residuals, degrees_of_freedom and chisq,
71 which is what the optimizer uses as the cost function to minimize.
72
73 assembly.eval uses the current values for the parameters in the
74 individual models. These parameters can be changed directly
75 in the model. In the reflectometry example above, you could
76 set the gold thickness using xray.layer.Au.depth=156, or
77 something similar (the details are model specific). Parameters
78 can also be changed through the assembly parameter set. In the same
79 example, this would be assembly.parameterset['xray']['Au']['depth'].
80 See parameter set for details.
81
82 In the process of modeling data, particularly with multiple
83 datasets, you will sometimes want to temporarily ignore
84 how well one of the datasets matches so that you
85 can more quickly refine the model for the other datasets,
86 or see how particular models are influencing the fit. To
87 temporarily ignore the xray data in the example above use::
88
89 assembly.parts[0].isfitted = False
90
91 The model itself isn't ignored since its parameters may be
92 needed to compute the parameters for other models. To
93 reenable checking against the xray data, you would assign
94 a True value instead. More subtle weighting of the models
95 can be controlled using assembly.parts[idx].weight, but
96 see below for a note on model weighting.
97
98 A note on model weighting
99 -------------------------
100
101 Changing the weight is equivalent to scaling the error bars
102 on the given model by a factor of weight/n where n is the
103 number of data points. It is better to set the correct error
104 bars on your data in the first place than to adjust the weights.
105 If you have the correct error bars, then you should expect
106 roughly 2/3 of the data points to lie within one error bar of
107 the theory curve. If consecutive data points have largely
108 overlapping errorbars, then your uncertainty is overestimated.
109
110 Another case where weights are adjusted (abused?) is to
111 compensate for systematic errors in the data by forcing the
112 errorbars to be large enough to cover the systematic bias.
113 This is a poor approach to the problem. A better strategy
114 is to capture the systematic effects in the model, and treat
115 the measurement of the independent variable as an additional
116 data point in the fit. This is still not statistically sound
117 as there is likely to be a large correlation between the
118 uncertainty of the measurement and the values of all the
119 other variables.
120
121 That said, adjusting the weight on a dataset is a quick way
122 of reducing its influence on the entire fit. Please use it
123 with care.
124 """
125
126 __all__ = ['Assembly', 'Fitness']
127 import numpy
128
129 import park.modelling.parameter as parameter
130 import park.modelling.expression as expression
131 from park.modelling.data import Data1D
132 from park.fitting.fitresult import FitParameter
133
134
135
137 """
138 Container for theory and data.
139
140 The fit object compares theory with data.
141
142 TODO: what to do with fittable metadata (e.g., footprint correction)?
143 TODO: what results are returned to the application for plotting?
144 """
145
146
147
148
149
150 data = None
151 model = None
152 - def __init__(self, model=None,data=None):
153 """
154 Define a Fitness object based on theory and data.
155 """
156 if data == [] or data == None:
157 data = Data1D(x=numpy.array([],'d'),y=numpy.array([],'d'))
158 self.data,self.model = data,model
161 parameterset = property(_parameterset,doc="Fittable parameters")
164 derivs = property(_derivs,doc="Parameters with analytic derivatives")
166 """
167 Return residuals for current parameter values.
168 See `park.data.Data1D.residuals` for details.
169 """
170
171 return self.data.residuals(self.model.eval)
173 """
174 Return derivative of residuals with respect to the parameters.
175 See `park.data.Data1D.residuals_deriv` for details.
176 """
177 return self.data.residuals_deriv(self.model.eval_derivs,pars=pars)
178 - def set(self, **kw):
179 """
180 Set parameters in the model.
181
182 User convenience function. This allows a user with an assembly
183 of models in a script to for example set the fit range for
184 parameter 'a' of the model::
185 assembly[0].set(a=[5,6])
186
187 Raises KeyError if the parameter is not in parameterset.
188 """
189 self.model.set(**kw)
191 """
192 Abort execution of the model if possible.
193 """
194 if hasattr(self.model,'abort'): self.model.abort()
195
197 """
198 Part of a fitting assembly. Part holds the model itself and
199 associated data. The part can be initialized with a fitness
200 object or with a pair (model,data) for the default fitness function.
201
202 fitness (Fitness)
203 object implementing the `park.assembly.Fitness` interface. In
204 particular, fitness should provide a parameterset attribute
205 containing a ParameterSet and a residuals method returning a vector
206 of residuals.
207 weight (dimensionless)
208 weight for the model. See comments in assembly.py for details.
209 isfitted (boolean)
210 True if the model residuals should be included in the fit.
211 The model parameters may still be used in parameter
212 expressions, but there will be no comparison to the data.
213 residuals (vector)
214 Residuals for the model if they have been calculated, or None
215 degrees_of_freedom
216 Number of residuals minus number of fitted parameters.
217 Degrees of freedom for individual models does not make
218 sense in the presence of expressions combining models,
219 particularly in the case where a model has many parameters
220 but no data or many computed parameters. The degrees of
221 freedom for the model is set to be at least one.
222 chisq
223 sum(residuals**2); use chisq/degrees_of_freedom to
224 get the reduced chisq value.
225
226 Get/set the weight on the given model.
227
228 assembly.weight(3) returns the weight on model 3 (0-origin)
229 assembly.weight(3,0.5) sets the weight on model 3 (0-origin)
230 """
231
232 - def __init__(self, fitness, weight=1., isfitted=True):
241
242
244 """
245 Collection of fit models.
246
247 Assembly implements the `park.fit.Objective` interface.
248
249 See `park.assembly` for usage.
250
251 Instance variables:
252
253 residuals : array
254 a vector of residuals spanning all models, with model
255 weights applied as appropriate.
256 degrees_of_freedom : integer
257 length of the residuals - number of fitted parameters
258 chisq : float
259 sum squared residuals; this is not the reduced chisq, which
260 you can get using chisq/degrees_of_freedom
261
262 These fields are defined for the individual models as well, with
263 degrees of freedom adjusted to the length of the individual data
264 set. If the model is not fitted or the weight is zero, the residual
265 will not be calculated.
266
267 The residuals fields are available only after the model has been
268 evaluated.
269 """
270
272 """Build an assembly from a list of models."""
273 self.parts = []
274 for m in models:
275 self.parts.append(Part(m))
276 self._reset()
277
279 """Iterate through the models in order"""
280 for m in self.parts: yield m
281
283 """Return the nth model"""
284 return self.parts[n].fitness
285
287 """Replace the nth model"""
288 self.parts[n].fitness = fitness
289 self._reset()
290
292 """Delete the nth model"""
293 del self.parts[n]
294 self._reset()
295
296 - def weight(self, idx, value=None):
297 """
298 Query the weight on a particular model.
299
300 Set weight to value if value is supplied.
301
302 :Parameters:
303 idx : integer
304 model number
305 value : float
306 model weight
307 :return: model weight
308 """
309 if value is not None:
310 self.parts[i].weight = value
311 return self.parts[i].weight
312
314 """
315 Query if a particular model is fitted.
316
317 Set isfitted to value if value is supplied.
318
319 :param idx: model number
320 :type idx: integer
321 :param value:
322 """
323 if value is not None:
324 self.parts[i].isfitted = value
325 return self.parts[i].isfitted
326
327 - def append(self, fitness, weight=1.0, isfitted=True):
328 """
329 Add a model to the end of set.
330
331 :param fitness: the fitting model
332 The fitting model can be an instance of `park.assembly.Fitness`,
333 or a tuple of (`park.model.Model`,`park.data.Data1D`)
334 :param weight: model weighting (usually 1.0)
335 :param isfitted: whether model should be fit (equivalent to weight 0.)
336 """
337 self.parts.append(Part(fitness,weight,isfitted))
338 self._reset()
339
340 - def insert(self, idx, fitness, weight=1.0, isfitted=True):
341 """Add a model to a particular position in the set."""
342 self.parts.insert(idx,Part(fitness,weight,isfitted))
343 self._reset()
344
350
351
353 """
354 Recalculate the theory functions, and from them, the
355 residuals and chisq.
356
357 :note: Call this after the parameters have been updated.
358 """
359
360 self._cancel = False
361
362
363 self._fitexpression()
364
365
366 if not self.isfeasible(): return numpy.inf
367
368 chisq = 0
369 N_total = 0
370 k = len(self._fitparameters)
371 for m in self.parts:
372
373
374
375
376 self._current_model = m
377 if self._cancel: return numpy.inf
378 if m.isfitted:
379 m.residuals = m.fitness.residuals()
380 N = len(m.residuals)
381 N_total += N
382 m.degrees_of_freedom = N-k if N>k else 1
383 m.chisq = numpy.sum(m.residuals**2)
384 chisq += m.weight*m.chisq
385 self.degrees_of_freedom = N_total-k if N_total>k else 1
386 self.chisq = chisq
387 return self.chisq
388
390 """
391 Compute the combined residuals vector as needed.
392 """
393 resid = [m.weight*m.residuals
394 for m in self.parts
395 if m.isfitted]
396 return numpy.hstack(resid)
397 residuals=property(_residuals,doc="combined residuals from all parts")
398
400 """
401 Returns the derivative wrt the fit parameters at point p.
402
403 Numeric derivatives are calculated based on step, where step is
404 the portion of the total range for parameter j, or the portion of
405 point value p_j if the range on parameter j is infinite.
406 """
407
408 pvec = numpy.asarray(pvec)
409
410
411
412
413
414
415
416
417 range = zip(*[p.range for p in self._fitparameters])
418 lo,hi = [numpy.asarray(v) for v in range]
419 delta = (hi-lo)*step
420
421 idx = numpy.isinf(delta)
422
423 delta[idx] = pvec[idx]*step
424 delta[delta==0] = step
425
426
427 for k,v in enumerate(pvec):
428 self._fitparameters[k].value = v
429
430 r = []
431 for k,v in enumerate(pvec):
432
433
434 h = delta[k]
435 self._fitparameters[k].value = v + h
436 self.eval()
437 rk = self.residuals
438 self._fitparameters[k].value = v - h
439 self.eval()
440 rk -= self.residuals
441 self._fitparameters[k].value = v
442 r.append(rk/(2*h))
443
444 return numpy.vstack(r).T
445
446
447 - def cov(self, pvec):
448 """
449 Return the covariance matrix at p cov(f,p) = inv(J'J)
450 where J is the Jacobian matrix [df(xi)/dpj]
451
452 Using the singular value decomposition we have::
453
454 J = U S V'
455 J'J = (U S V')' (U S V')
456 = V S' U' U S V'
457 = V S S V'
458 inv(J'J) = inv(V S S V')
459 = inv(V') inv(S S) inv(V)
460 = V inv (S S) V'
461
462 Note that this function hangs unexpectedly for some versions
463 of numpy on Windows. Please make sure you are using the
464 latest version.
465 """
466
467
468
469 J = self.jacobian(pvec)
470 u,s,vh = numpy.linalg.svd(J,0)
471 JTJinv = numpy.dot(vh.T.conj()/s**2,vh)
472 return JTJinv
473
475 """
476 Return parameter uncertainty.
477
478 This is just the sqrt diagonal of covariance matrix inv(J'J) at point p.
479 """
480 return numpy.sqrt(numpy.diag(self.cov(pvec)))
481
483 """
484 Returns true if the parameter set is in a feasible region of the
485 modeling space.
486 """
487 return True
488
489
512
513 - def result(self, status='step'):
514 """
515 Details to send back to the fitting client on an improved fit.
516
517 status is 'start', 'step' or 'end' depending if this is the
518 first result to return, an improved result, or the final result.
519
520 [Not implemented]
521 """
522 return None
523
525 """
526 Cost function.
527
528 Evaluate the system for the parameter vector pvec, returning chisq
529 as the cost function to be minimized.
530
531 Raises a runtime error if the number of fit parameters is
532 different than the length of the vector.
533 """
534
535
536 pars = self._fitparameters
537 if len(pvec) != len(pars):
538 raise RuntimeError("Unexpected number of parameters")
539 for n,value in enumerate(pvec):
540 pars[n].value = value
541
542 chisq = self.eval()
543
544
545 restraints_penalty = 0
546 for p in self._restraints:
547 restraints_penalty += p.likelihood(p.value)
548
549 return self.chisq + restraints_penalty
550
552 """
553 Interrupt the current function evaluation.
554
555 Forward this to the currently executing model if possible.
556 """
557 self._cancel = True
558 if hasattr(self._current_model,'abort'):
559 self._current_model.abort()
560
561 import park
562 -class _Exp(park.Model):
563 """
564 Sample model for testing assembly.
565 """
566 parameters = ['a','c']
568 return self.a*numpy.exp(self.c*x)
570 parameters = ['a','c']
572
573 return self.a*x+self.c
575 """
576 Return an example assembly consisting of a pair of functions,
577 M1.a*exp(M1.c*x), M2.a*exp(2*M1.c*x)
578 and ideal data for
579 M1.a=1, M1.c=1.5, M2.a=2.5
580 """
581 import numpy
582 import park
583 from numpy import inf
584
585 x1 = numpy.linspace(0,1,11)
586 x2 = numpy.linspace(0,1,12)
587
588 if True:
589 y1,y2 = numpy.exp(1.5*x1),2.5*numpy.exp(3*x2)
590 M1 = _Exp('M1',a=[1,3],c=[1,3])
591 M2 = _Exp('M2',a=[1,3],c='2*M1.c')
592
593 else:
594 y1,y2 = x1+1.5, 2.5*x2+3
595 M1 = _Linear('M1',a=[1,3],c=[1,3])
596 M2 = _Linear('M2',a=[1,3],c='2*M1.c')
597 if False:
598 M1.a = [-inf,inf]
599 M1.c = [-inf,inf]
600 M2.a = [-inf,inf]
601 D1 = park.Data1D(x=x1, y=y1)
602 D2 = park.Data1D(x=x2, y=y2)
603
604 assembly = park.Assembly([(M1,D1),(M2,D2)])
605 return assembly
606
608 parameters = ['a','b','c','d','e']
610 return self.a*x**2+self.b*x+self.c + exp(self.d) - 3*sin(self.e)
611
613 import numpy
614 import park
615 from numpy import inf
616
617 x = numpy.linspace(0,1,11)
618
619 S = _Sphere(a=1,b=2,c=3,d=4,e=5)
620 y = S.eval(x1)
621 Sfit = _Sphere(a=[-inf,inf],b=[-inf,inf],c=[-inf,inf],d=[-inf,inf],e=[-inf,inf])
622 D = park.Data1D(x=x, y=y)
623
624 assembly = park.Assembly([(Sfit,D)])
625 return assembly
626
645
646 if __name__ == "__main__": test()
647