Package park :: Package modelling :: Module assembly

Source Code for Module park.modelling.assembly

  1  # This program is public domain
 
  2  """
 
  3  An assembly is a collection of fitting functions.  This provides
 
  4  the model representation that is the basis of the park fitting engine.
 
  5  
 
  6  Models can range from very simple one dimensional theory functions
 
  7  to complex assemblies of multidimensional datasets from different
 
  8  experimental techniques, each with their own theory function and
 
  9  a common underlying physical model.
 
 10  
 
 11  Usage
 
 12  =====
 
 13  
 
 14  First define the models you want to work with.  In the example
 
 15  below we will use an example of a simple multilayer system measured by
 
 16  specular reflection of xrays and neutrons.  The gold depth is the only
 
 17  fitting parameter, ranging from 10-30 A.  The interface depths are
 
 18  tied together using expressions.  In this case the expression is
 
 19  a simple copy, but any standard math functions can be used.  Some
 
 20  model developers may provide additional functions for use with the
 
 21  expression.
 
 22  
 
 23  Example models::
 
 24  
 
 25      import reflectometry.model1d as refl
 
 26      xray = refl.model('xray')
 
 27      xray.incident('Air',rho=0)
 
 28      xray.interface('iAu',sigma=5)
 
 29      xray.layer('Au',rho=124.68,depth=[10,30])
 
 30      xray.interface('iSi',sigma=5)
 
 31      xray.substrate('Si',rho=20.07)
 
 32      datax = refl.data('xray.dat')
 
 33  
 
 34      neutron = refl.model('neutron')
 
 35      neutron.incident('Air',rho=0)
 
 36      neutron.interface('iAu',sigma='xray.iAu')
 
 37      neutron.layer('Au',rho=4.66,depth='xray.Au.depth')
 
 38      neutron.interface('iSi',sigma='xray.iSi')
 
 39      neutron.substrate('Si',rho=2.07)
 
 40      datan = refl.data('neutron.dat')
 
 41  
 
 42  As you can see from the above, parameters can be set to a value if
 
 43  the parameter is fixed, to a range if the parametemr is fitted, or
 
 44  to a string expression if the parameter is calculated from other
 
 45  parameters.  See park.Parameter.set for further details.
 
 46  
 
 47  Having constructed the models, we can now create an assembly::
 
 48  
 
 49      import park
 
 50      assembly = park.Assembly([(xray,datax), (neutron,datan)])
 
 51  
 
 52  Note: this would normally be done in the context of a fit
 
 53  using fit = park.Fit([(xray,datax), (neutron,datan)]), and later referenced
 
 54  using fit.assembly.
 
 55  
 
 56  Individual parts of the assembly are accessable using the
 
 57  model number 0, 1, 2... or by the model name.  In the above,
 
 58  assembly[0] and assembly['xray'] refer to the same model.
 
 59  Assemblies have insert and append functions for adding new
 
 60  models, and "del model[idx]" for removing them.
 
 61  
 
 62  Once the assembly is created computing the values for the system
 
 63  is a matter of calling::
 
 64  
 
 65      assembly.eval()
 
 66      print "Chi**2",assembly.chisq
 
 67      print "Reduced chi**2",assembly.chisq/assembly.degrees_of_freedom
 
 68      plot(arange(len(assembly.residuals)), assembly.residuals)
 
 69  
 
 70  This defines the attributes residuals, degrees_of_freedom and chisq,
 
 71  which is what the optimizer uses as the cost function to minimize.
 
 72  
 
 73  assembly.eval uses the current values for the parameters in the
 
 74  individual models.  These parameters can be changed directly
 
 75  in the model.  In the reflectometry example above, you could
 
 76  set the gold thickness using xray.layer.Au.depth=156, or
 
 77  something similar (the details are model specific).  Parameters
 
 78  can also be changed through the assembly parameter set.  In the same
 
 79  example, this would be assembly.parameterset['xray']['Au']['depth'].
 
 80  See parameter set for details.
 
 81  
 
 82  In the process of modeling data, particularly with multiple
 
 83  datasets, you will sometimes want to temporarily ignore
 
 84  how well one of the datasets matches so that you
 
 85  can more quickly refine the model for the other datasets,
 
 86  or see how particular models are influencing the fit.  To
 
 87  temporarily ignore the xray data in the example above use::
 
 88  
 
 89      assembly.parts[0].isfitted = False
 
 90  
 
 91  The model itself isn't ignored since its parameters may be
 
 92  needed to compute the parameters for other models.  To
 
 93  reenable checking against the xray data, you would assign
 
 94  a True value instead.  More subtle weighting of the models
 
 95  can be controlled using assembly.parts[idx].weight, but
 
 96  see below for a note on model weighting.
 
 97  
 
 98  A note on model weighting
 
 99  -------------------------
 
100  
 
101  Changing the weight is equivalent to scaling the error bars
 
102  on the given model by a factor of weight/n where n is the
 
103  number of data points.  It is better to set the correct error
 
104  bars on your data in the first place than to adjust the weights.
 
105  If you have the correct error bars, then you should expect
 
106  roughly 2/3 of the data points to lie within one error bar of
 
107  the theory curve.  If consecutive data points have largely
 
108  overlapping errorbars, then your uncertainty is overestimated.
 
109  
 
110  Another case where weights are adjusted (abused?) is to
 
111  compensate for systematic errors in the data by forcing the
 
112  errorbars to be large enough to cover the systematic bias.
 
113  This is a poor approach to the problem.  A better strategy
 
114  is to capture the systematic effects in the model, and treat
 
115  the measurement of the independent variable as an additional
 
116  data point in the fit.  This is still not statistically sound
 
117  as there is likely to be a large correlation between the
 
118  uncertainty of the measurement and the values of all the
 
119  other variables.
 
120  
 
121  That said, adjusting the weight on a dataset is a quick way
 
122  of reducing its influence on the entire fit.  Please use it
 
123  with care.
 
124  """ 
125  
 
126  __all__ = ['Assembly', 'Fitness'] 
127  import numpy 
128  
 
129  import park.modelling.parameter as parameter 
130  import park.modelling.expression as expression 
131  from park.modelling.data import Data1D 
132  from park.fitting.fitresult import FitParameter 
133  
 
134  
 
135  
 
136 -class Fitness(object):
137 """ 138 Container for theory and data. 139 140 The fit object compares theory with data. 141 142 TODO: what to do with fittable metadata (e.g., footprint correction)? 143 TODO: what results are returned to the application for plotting? 144 """ 145 # Could associate optional parameterset with data and combine them with 146 # the model parameters when returning the fitness parameterset. 147 # Alternatively, model must be able to apply the inverse operation to 148 # the theory calculation to best match the data. The former is more 149 # meaningful to the user; the latter is more honest. 150 data = None 151 model = None
152 - def __init__(self, model=None,data=None):
153 """ 154 Define a Fitness object based on theory and data. 155 """ 156 if data == [] or data == None: 157 data = Data1D(x=numpy.array([],'d'),y=numpy.array([],'d')) 158 self.data,self.model = data,model
159 - def _parameterset(self):
160 return self.model.parameterset
161 parameterset = property(_parameterset,doc="Fittable parameters")
162 - def _derivs(self):
163 return self.model.derivs
164 derivs = property(_derivs,doc="Parameters with analytic derivatives")
165 - def residuals(self):
166 """ 167 Return residuals for current parameter values. 168 See `park.data.Data1D.residuals` for details. 169 """ 170 #self.data.residuals(self.model.eval) 171 return self.data.residuals(self.model.eval)
172 - def residuals_deriv(self, pars=[]):
173 """ 174 Return derivative of residuals with respect to the parameters. 175 See `park.data.Data1D.residuals_deriv` for details. 176 """ 177 return self.data.residuals_deriv(self.model.eval_derivs,pars=pars)
178 - def set(self, **kw):
179 """ 180 Set parameters in the model. 181 182 User convenience function. This allows a user with an assembly 183 of models in a script to for example set the fit range for 184 parameter 'a' of the model:: 185 assembly[0].set(a=[5,6]) 186 187 Raises KeyError if the parameter is not in parameterset. 188 """ 189 self.model.set(**kw)
190 - def abort(self):
191 """ 192 Abort execution of the model if possible. 193 """ 194 if hasattr(self.model,'abort'): self.model.abort()
195
196 -class Part(object):
197 """ 198 Part of a fitting assembly. Part holds the model itself and 199 associated data. The part can be initialized with a fitness 200 object or with a pair (model,data) for the default fitness function. 201 202 fitness (Fitness) 203 object implementing the `park.assembly.Fitness` interface. In 204 particular, fitness should provide a parameterset attribute 205 containing a ParameterSet and a residuals method returning a vector 206 of residuals. 207 weight (dimensionless) 208 weight for the model. See comments in assembly.py for details. 209 isfitted (boolean) 210 True if the model residuals should be included in the fit. 211 The model parameters may still be used in parameter 212 expressions, but there will be no comparison to the data. 213 residuals (vector) 214 Residuals for the model if they have been calculated, or None 215 degrees_of_freedom 216 Number of residuals minus number of fitted parameters. 217 Degrees of freedom for individual models does not make 218 sense in the presence of expressions combining models, 219 particularly in the case where a model has many parameters 220 but no data or many computed parameters. The degrees of 221 freedom for the model is set to be at least one. 222 chisq 223 sum(residuals**2); use chisq/degrees_of_freedom to 224 get the reduced chisq value. 225 226 Get/set the weight on the given model. 227 228 assembly.weight(3) returns the weight on model 3 (0-origin) 229 assembly.weight(3,0.5) sets the weight on model 3 (0-origin) 230 """ 231
232 - def __init__(self, fitness, weight=1., isfitted=True):
233 if isinstance(fitness, tuple): 234 fitness = park.Fitness(*fitness) 235 self.fitness = fitness 236 self.weight = weight 237 self.isfitted = isfitted 238 self.residuals = None 239 self.chisq = numpy.Inf 240 self.degrees_of_freedom = 1
241 242
243 -class Assembly(object):
244 """ 245 Collection of fit models. 246 247 Assembly implements the `park.fit.Objective` interface. 248 249 See `park.assembly` for usage. 250 251 Instance variables: 252 253 residuals : array 254 a vector of residuals spanning all models, with model 255 weights applied as appropriate. 256 degrees_of_freedom : integer 257 length of the residuals - number of fitted parameters 258 chisq : float 259 sum squared residuals; this is not the reduced chisq, which 260 you can get using chisq/degrees_of_freedom 261 262 These fields are defined for the individual models as well, with 263 degrees of freedom adjusted to the length of the individual data 264 set. If the model is not fitted or the weight is zero, the residual 265 will not be calculated. 266 267 The residuals fields are available only after the model has been 268 evaluated. 269 """ 270
271 - def __init__(self, models=[]):
272 """Build an assembly from a list of models.""" 273 self.parts = [] 274 for m in models: 275 self.parts.append(Part(m)) 276 self._reset()
277
278 - def __iter__(self):
279 """Iterate through the models in order""" 280 for m in self.parts: yield m
281
282 - def __getitem__(self, n):
283 """Return the nth model""" 284 return self.parts[n].fitness
285
286 - def __setitem__(self, n, fitness):
287 """Replace the nth model""" 288 self.parts[n].fitness = fitness 289 self._reset()
290
291 - def __delitem__(self, n):
292 """Delete the nth model""" 293 del self.parts[n] 294 self._reset()
295
296 - def weight(self, idx, value=None):
297 """ 298 Query the weight on a particular model. 299 300 Set weight to value if value is supplied. 301 302 :Parameters: 303 idx : integer 304 model number 305 value : float 306 model weight 307 :return: model weight 308 """ 309 if value is not None: 310 self.parts[i].weight = value 311 return self.parts[i].weight
312
313 - def isfitted(self, idx, value=None):
314 """ 315 Query if a particular model is fitted. 316 317 Set isfitted to value if value is supplied. 318 319 :param idx: model number 320 :type idx: integer 321 :param value: 322 """ 323 if value is not None: 324 self.parts[i].isfitted = value 325 return self.parts[i].isfitted
326
327 - def append(self, fitness, weight=1.0, isfitted=True):
328 """ 329 Add a model to the end of set. 330 331 :param fitness: the fitting model 332 The fitting model can be an instance of `park.assembly.Fitness`, 333 or a tuple of (`park.model.Model`,`park.data.Data1D`) 334 :param weight: model weighting (usually 1.0) 335 :param isfitted: whether model should be fit (equivalent to weight 0.) 336 """ 337 self.parts.append(Part(fitness,weight,isfitted)) 338 self._reset()
339
340 - def insert(self, idx, fitness, weight=1.0, isfitted=True):
341 """Add a model to a particular position in the set.""" 342 self.parts.insert(idx,Part(fitness,weight,isfitted)) 343 self._reset()
344
345 - def _reset(self):
346 """Adjust the parameter set after the addition of a new model.""" 347 subsets = [m.fitness.parameterset for m in self] 348 self.parameterset = parameter.ParameterSet('root',subsets) 349 parameter.setprefix(self.parameterset)
350 #print [p.path for p in parameter.flatten(self.parameterset)] 351
352 - def eval(self):
353 """ 354 Recalculate the theory functions, and from them, the 355 residuals and chisq. 356 357 :note: Call this after the parameters have been updated. 358 """ 359 # Handle abort from a separate thread. 360 self._cancel = False 361 362 # Evaluate the computed parameters 363 self._fitexpression() 364 365 # Check that the resulting parameters are in a feasible region. 366 if not self.isfeasible(): return numpy.inf 367 368 chisq = 0 369 N_total = 0 370 k = len(self._fitparameters) 371 for m in self.parts: 372 # In order to support abort, need to be able to propagate an 373 # external abort signal from self.abort() into an abort signal 374 # for the particular model. Can't see a way to do this which 375 # doesn't involve setting a state variable. 376 self._current_model = m 377 if self._cancel: return numpy.inf 378 if m.isfitted: 379 m.residuals = m.fitness.residuals() 380 N = len(m.residuals) 381 N_total += N 382 m.degrees_of_freedom = N-k if N>k else 1 383 m.chisq = numpy.sum(m.residuals**2) 384 chisq += m.weight*m.chisq 385 self.degrees_of_freedom = N_total-k if N_total>k else 1 386 self.chisq = chisq 387 return self.chisq
388
389 - def _residuals(self):
390 """ 391 Compute the combined residuals vector as needed. 392 """ 393 resid = [m.weight*m.residuals 394 for m in self.parts 395 if m.isfitted] 396 return numpy.hstack(resid)
397 residuals=property(_residuals,doc="combined residuals from all parts") 398
399 - def jacobian(self, pvec, step=1e-8):
400 """ 401 Returns the derivative wrt the fit parameters at point p. 402 403 Numeric derivatives are calculated based on step, where step is 404 the portion of the total range for parameter j, or the portion of 405 point value p_j if the range on parameter j is infinite. 406 """ 407 # Make sure the input vector is an array 408 pvec = numpy.asarray(pvec) 409 # We are being lazy here. We can precompute the bounds, we can 410 # use the residuals_deriv from the sub-models which have analytic 411 # derivatives and we need only recompute the models which depend 412 # on the varying parameters. 413 # Meanwhile, let's compute the numeric derivative using the 414 # three point formula. 415 # We are not checking that the varied parameter in numeric 416 # differentiation is indeed feasible in the interval of interest. 417 range = zip(*[p.range for p in self._fitparameters]) 418 lo,hi = [numpy.asarray(v) for v in range] 419 delta = (hi-lo)*step 420 # For infinite ranges, use p*1e-8 for the step size 421 idx = numpy.isinf(delta) 422 #print "J",idx,delta,pvec,type(idx),type(delta),type(pvec) 423 delta[idx] = pvec[idx]*step 424 delta[delta==0] = step 425 426 # Set the initial value 427 for k,v in enumerate(pvec): 428 self._fitparameters[k].value = v 429 # Gather the residuals 430 r = [] 431 for k,v in enumerate(pvec): 432 # Center point formula: 433 # df/dv = lim_{h->0} ( f(v+h)-f(v-h) ) / ( 2h ) 434 h = delta[k] 435 self._fitparameters[k].value = v + h 436 self.eval() 437 rk = self.residuals 438 self._fitparameters[k].value = v - h 439 self.eval() 440 rk -= self.residuals 441 self._fitparameters[k].value = v 442 r.append(rk/(2*h)) 443 # return the jacobian 444 return numpy.vstack(r).T
445 446
447 - def cov(self, pvec):
448 """ 449 Return the covariance matrix at p cov(f,p) = inv(J'J) 450 where J is the Jacobian matrix [df(xi)/dpj] 451 452 Using the singular value decomposition we have:: 453 454 J = U S V' 455 J'J = (U S V')' (U S V') 456 = V S' U' U S V' 457 = V S S V' 458 inv(J'J) = inv(V S S V') 459 = inv(V') inv(S S) inv(V) 460 = V inv (S S) V' 461 462 Note that this function hangs unexpectedly for some versions 463 of numpy on Windows. Please make sure you are using the 464 latest version. 465 """ 466 # The singular value matrix S is diagonal, so inv(S**2) is 1/S**2 467 # using element-wise division. In fact, SVD returns S as a vector, 468 # so element-wise V/S**2 is equivalent to matrix V dot inv(S**2). 469 J = self.jacobian(pvec) 470 u,s,vh = numpy.linalg.svd(J,0) 471 JTJinv = numpy.dot(vh.T.conj()/s**2,vh) 472 return JTJinv
473
474 - def stderr(self, pvec):
475 """ 476 Return parameter uncertainty. 477 478 This is just the sqrt diagonal of covariance matrix inv(J'J) at point p. 479 """ 480 return numpy.sqrt(numpy.diag(self.cov(pvec)))
481
482 - def isfeasible(self):
483 """ 484 Returns true if the parameter set is in a feasible region of the 485 modeling space. 486 """ 487 return True
488 489 # Fitting service interface
490 - def fit_parameters(self):
491 """ 492 Return an alphabetical list of the fitting parameters. 493 494 This function is called once at the beginning of a fit, 495 and serves as a convenient place to precalculate what 496 can be precalculated such as the set of fitting parameters 497 and the parameter expressions evaluator. 498 """ 499 parameter.setprefix(self.parameterset) 500 self._fitparameters = parameter.fitted(self.parameterset) 501 self._restraints = parameter.restrained(self.parameterset) 502 pars = parameter.flatten(self.parameterset) 503 context = parameter.gather_context(self.parameterset) 504 self._fitexpression = expression.build_eval(pars,context) 505 #print "constraints",self._fitexpression.__doc__ 506 507 self._fitparameters.sort(lambda a,b: cmp(a.path,b.path)) 508 # Convert to fitparameter a object 509 fitpars = [FitParameter(p.path,p.range,p.value) 510 for p in self._fitparameters] 511 return fitpars
512
513 - def result(self, status='step'):
514 """ 515 Details to send back to the fitting client on an improved fit. 516 517 status is 'start', 'step' or 'end' depending if this is the 518 first result to return, an improved result, or the final result. 519 520 [Not implemented] 521 """ 522 return None
523
524 - def __call__(self, pvec):
525 """ 526 Cost function. 527 528 Evaluate the system for the parameter vector pvec, returning chisq 529 as the cost function to be minimized. 530 531 Raises a runtime error if the number of fit parameters is 532 different than the length of the vector. 533 """ 534 # Plug fit parameters into model 535 #print "Trying",pvec 536 pars = self._fitparameters 537 if len(pvec) != len(pars): 538 raise RuntimeError("Unexpected number of parameters") 539 for n,value in enumerate(pvec): 540 pars[n].value = value 541 # Evaluate model 542 chisq = self.eval() 543 # Evaluate additional restraints based on parameter value 544 # likelihood 545 restraints_penalty = 0 546 for p in self._restraints: 547 restraints_penalty += p.likelihood(p.value) 548 # Return total cost function 549 return self.chisq + restraints_penalty
550
551 - def abort(self):
552 """ 553 Interrupt the current function evaluation. 554 555 Forward this to the currently executing model if possible. 556 """ 557 self._cancel = True 558 if hasattr(self._current_model,'abort'): 559 self._current_model.abort()
560 561 import park
562 -class _Exp(park.Model):
563 """ 564 Sample model for testing assembly. 565 """ 566 parameters = ['a','c']
567 - def eval(self,x):
568 return self.a*numpy.exp(self.c*x)
569 -class _Linear(park.Model):
570 parameters = ['a','c']
571 - def eval(self,x):
572 #print "eval",self.a,self.c,x,self.a*x+self.c 573 return self.a*x+self.c
574 -def example():
575 """ 576 Return an example assembly consisting of a pair of functions, 577 M1.a*exp(M1.c*x), M2.a*exp(2*M1.c*x) 578 and ideal data for 579 M1.a=1, M1.c=1.5, M2.a=2.5 580 """ 581 import numpy 582 import park 583 from numpy import inf 584 # Make some fake data 585 x1 = numpy.linspace(0,1,11) 586 x2 = numpy.linspace(0,1,12) 587 # Define a shared model 588 if True: # Exp model 589 y1,y2 = numpy.exp(1.5*x1),2.5*numpy.exp(3*x2) 590 M1 = _Exp('M1',a=[1,3],c=[1,3]) 591 M2 = _Exp('M2',a=[1,3],c='2*M1.c') 592 #M2 = _Exp('M2',a=[1,3],c=3) 593 else: # Linear model 594 y1,y2 = x1+1.5, 2.5*x2+3 595 M1 = _Linear('M1',a=[1,3],c=[1,3]) 596 M2 = _Linear('M2',a=[1,3],c='2*M1.c') 597 if False: # Unbounded 598 M1.a = [-inf,inf] 599 M1.c = [-inf,inf] 600 M2.a = [-inf,inf] 601 D1 = park.Data1D(x=x1, y=y1) 602 D2 = park.Data1D(x=x2, y=y2) 603 # Construct the assembly 604 assembly = park.Assembly([(M1,D1),(M2,D2)]) 605 return assembly
606
607 -class _Sphere(park.Model):
608 parameters = ['a','b','c','d','e']
609 - def eval(self,x):
610 return self.a*x**2+self.b*x+self.c + exp(self.d) - 3*sin(self.e)
611
612 -def example5():
613 import numpy 614 import park 615 from numpy import inf 616 # Make some fake data 617 x = numpy.linspace(0,1,11) 618 # Define a shared model 619 S = _Sphere(a=1,b=2,c=3,d=4,e=5) 620 y = S.eval(x1) 621 Sfit = _Sphere(a=[-inf,inf],b=[-inf,inf],c=[-inf,inf],d=[-inf,inf],e=[-inf,inf]) 622 D = park.Data1D(x=x, y=y) 623 # Construct the assembly 624 assembly = park.Assembly([(Sfit,D)]) 625 return assembly
626
627 -def test():
628 assembly = example() 629 assert assembly[0].parameterset.name == 'M1' 630 631 # extract the fitting parameters 632 pars = [p.name for p in assembly.fit_parameters()] 633 assert set(pars) == set(['M1.a','M1.c','M2.a']) 634 # Compute chisq and verify constraints are updated properly 635 assert assembly([1,1.5,2.5]) == 0 636 assert assembly[0].model.c == 1.5 and assembly[1].model.c == 3 637 638 # Try without constraints 639 assembly[1].set(c=3) 640 assembly.fit_parameters() # Fit parameters have changed 641 assert assembly([1,1.5,2.5]) == 0 642 643 # Check that assembly.cov runs ... still need to check that it is correct! 644 C = assembly.cov(numpy.array([1,1.5,2.5]))
645 646 if __name__ == "__main__": test() 647