Package park :: Package modelling :: Module data

Source Code for Module park.modelling.data

  1  # This program is public domain 
  2  """ 
  3  Park 1-D data objects. 
  4   
  5  The class Data1D represents simple 1-D data objects, along with 
  6  an ascii file loader.  This format will work well for many 
  7  uses, but it is likely that more specialized models will have 
  8  their own data file formats. 
  9   
 10  The minimal data format for park must supply the following 
 11  methods: 
 12   
 13      residuals(fn) 
 14          returns the residuals vector for the model function. 
 15      residuals_deriv(fn_deriv,par) 
 16          returns the residuals vector for the model function, and 
 17          for the derivatives with respect to the given parameters. 
 18   
 19  The function passed is going to be model.eval or in the case 
 20  where derivatives are available, model.eval_deriv.  Normally 
 21  this will take a vector of dependent variables and return the 
 22  theory function for that vector but this is only convention. 
 23  The fitting service only uses the parameter set and the residuals 
 24  method from the model. 
 25   
 26  Model and data are combined to make a `park.assembly.Fitness` 
 27  object, which defines the complete interface required to match 
 28  model and data.  This may be a more convenient level to integrate 
 29  with park. 
 30   
 31  The park GUI will make more demands on the interface, but the 
 32  details are not yet resolved. 
 33  """ 
 34  from __future__ import with_statement  # Used only in test() 
 35   
 36  __all__ = ['Data1D'] 
 37   
 38  import numpy 
 39   
 40  try: 
 41      from park.modelling._modelling import convolve as _convolve 
42 - def convolve(xi,yi,x,dx):
43 """ 44 Return convolution y of width dx at points x based on the 45 sampled input function yi = f(xi). 46 """ 47 y = numpy.empty(x.shape,'d') 48 xi,yi,x,dx = [numpy.ascontiguousarray(v,'d') for v in xi,yi,x,dx] 49 _convolve(xi,yi,x,dx,y) 50 return y
51 except:
52 - def convolve(*args,**kw):
53 """ 54 Return convolution y of width dx at points x based on the 55 sampled input function yi = f(xi). 56 57 Note: C version is not available in this build, and python 58 version is not implemented. 59 """ 60 raise NotImplementedError("convolve is a compiled function")
61 62 63 __all__ = ['Data1D'] 64
65 -class Data1D(object):
66 """ 67 Data representation for 1-D fitting. 68 69 Attributes 70 ========== 71 72 filename 73 The source of the data. This may be the empty string if the 74 data is simulation data. 75 x,y,dy 76 The data values. 77 x is the measurement points of data to be fitted. x must be sorted. 78 y is the measured value 79 dy is the measurement uncertainty. 80 dx 81 Resolution at the the measured points. The resolution may be 0, 82 constant, or defined for each data point. dx is the 1-sigma 83 width of the Gaussian resolution function at point x. Note that 84 dx_FWHM = sqrt(8 ln 2) dx_sigma, so scale dx appropriately. 85 86 87 fit_x,fit_dx,fit_y,fit_dy 88 The points used in evaluating the residuals. 89 calc_x 90 The points at which to evaluate the theory function. This may be 91 different from the measured points for a number of reasons, such 92 as a resolution function which suggests over or under sampling of 93 the points (see below). By default calc_x is x, but it can be set 94 explicitly by the user. 95 calc_y, fx 96 The value of the function at the theory points, and the value of 97 the function after resolution has been applied. These values are 98 computed by a call to residuals. 99 100 Notes on calc_x 101 =============== 102 103 The contribution of Q to a resolution of width dQo at point Qo is:: 104 105 p(Q) = 1/sqrt(2 pi dQo**2) exp ( (Q-Qo)**2/(2 dQo**2) ) 106 107 We are approximating the convolution at Qo using a numerical 108 approximation to the integral over the measured points, with the 109 integral is limited to p(Q_i)/p(0)>=0.001. 110 111 Sometimes the function we are convoluting is rapidly changing. 112 That means the correct convolution should uniformly sample across 113 the entire width of the Gaussian. This is not possible at the 114 end points unless you calculate the theory function beyond what is 115 strictly needed for the data. For a given dQ and step size, 116 you need enough steps that the following is true:: 117 118 (n*step)**2 > -2 dQ**2 * ln 0.001 119 120 The choice of sampling density is particularly important near 121 critical points where the shape of the function changes. In 122 reflectometry, the function goes from flat below the critical 123 edge to O(Q**4) above. In one particular model, calculating every 124 0.005 rather than every 0.02 changed a value above the critical 125 edge by 15%. In a fitting program, this would lead to a somewhat 126 larger estimate of the critical edge for this sample. 127 128 Sometimes the theory function is oscillating more rapidly than 129 the instrument can resolve. This happens for example in reflectivity 130 measurements involving thick layers. In these systems, the theory 131 function should be oversampled around the measured points Q. With 132 a single thick layer, oversampling can be limited to just one 133 period 2 pi/d. With multiple thick layers, oscillations will 134 show interference patterns and it will be necessary to oversample 135 uniformly through the entire width of the resolution. If this is 136 not accommodated, then aliasing effects make it difficult to 137 compute the correct model. 138 """ 139 filename = "" 140 x,y = None,None 141 dx,dy = 0,1 142 calc_x,calc_y = None,None 143 fit_x,fit_y = None,None 144 fit_dx,fit_dy = 0,1 145 fx = None 146
147 - def __init__(self,filename="",x=None,y=None,dx=0,dy=1):
148 """ 149 Define the fitting data. 150 151 Data can be loaded from a file using filename or 152 specified directly using x,y,dx,dy. File loading 153 happens after assignment of x,y,dx,dy. 154 """ 155 self.x,self.y,self.dx,self.dy = x,y,dx,dy 156 self.select(None) 157 if filename: self.load(filename)
158
159 - def resample(self,minstep=None):
160 """ 161 Over/under sampling support. 162 163 Compute the calc_x points required to adequately sample 164 the function y=f(x) so that the value reported for each 165 measured point is supported by the resolution dx at x. 166 minstep is the minimum allowed sampling density that 167 should be used. 168 """ 169 self.calc_x = resample(self.x,self.dx,minstep)
170
171 - def load(self, filename, **kw):
172 """ 173 Load a multicolumn datafile. 174 175 Data should be in columns, with the following defaults:: 176 177 x,y or x,y,dy or x,dx,y,dy 178 179 Note that this resets the selected fitting points calc_x and the 180 computed results calc_y and fx. 181 182 Data is sorted after loading. 183 184 Any extra keyword arguments are passed to the numpy loadtxt 185 function. This allows you to select the columns you want, 186 skip rows, set the column separator, change the comment character, 187 amongst other things. 188 """ 189 self.dx,self.dy = 0,1 190 self.calc_x,self.calc_y,self.fx = None,None,None 191 if filename: 192 columns = numpy.loadtxt(filename, **kw) 193 self.x = columns[:,0] 194 if columns.shape[1]==4: 195 self.dx = columns[:,1] 196 self.y = columns[:,2] 197 self.dy = columns[:,3] 198 elif columns.shape[1]==3: 199 self.dx = 0 200 self.y = columns[:,1] 201 self.dy = columns[:,2] 202 elif columns.shape[1]==2: 203 self.dx = 0 204 self.y = columns[:,1] 205 self.dy = 1 206 else: 207 raise IOError,"Unexpected number of columns in "+filename 208 idx = numpy.argsort(self.x) 209 self.x,self.y = self.x[idx],self.y[idx] 210 if not numpy.isscalar(self.dx): self.dx = self.dx[idx] 211 if not numpy.isscalar(self.dy): self.dy = self.dy[idx] 212 else: 213 self.x,self.dx,self.y,self.dy = None,None,None,None 214 215 self.filename = filename 216 self.select(None)
217
218 - def select(self, idx):
219 """ 220 A selection vector for points to use in the evaluation of the 221 residuals, or None if all points are to be used. 222 """ 223 if idx is not None: 224 self.fit_x = self.x[idx] 225 self.fit_y = self.y[idx] 226 if not numpy.isscalar(self.dx): self.fit_dx = self.dx[idx] 227 if not numpy.isscalar(self.dy): self.fit_dy = self.dy[idx] 228 else: 229 self.fit_x = self.x 230 self.fit_dx = self.dx 231 self.fit_y = self.y 232 self.fit_dy = self.dy
233
234 - def residuals(self, fn):
235 """ 236 Compute the residuals of the data wrt to the given function. 237 238 Returns R = (y - f(x;p)) / sigma_y 239 240 y = fn(x) should be a callable accepting a list of points at which 241 to calculate the function, returning a vector of values at those 242 points. 243 244 Any resolution function will be applied after the theory points 245 are calculated. To suppress the resolution calculation, set 246 fit_dx to 0. 247 """ 248 calc_x = self.calc_x if self.calc_x else self.fit_x 249 self.calc_y = fn(calc_x) 250 self.fx = resolution(calc_x,self.calc_y,self.fit_x,self.fit_dx) 251 return (self.fit_y - self.fx)/self.fit_dy
252
253 - def residuals_deriv(self, fn, pars=[]):
254 """ 255 Compute residuals and derivatives wrt the given parameters. 256 257 fdf = fn(x,pars=pars) should be a callable accepting a list 258 of points at which to calculate the function and a keyword 259 argument listing the parameters for which the derivative will 260 be calculated. 261 262 Returns a list of the residuals and the derivative wrt the 263 parameter for each parameter:: 264 265 R = (y-f(x;p)) / sigma_y 266 dR/dp1 = -1/sigma_y df(x;p)/dp1 267 dR/dp2 = -1/sigma_y df(x;p)/dp2 268 ... 269 270 The fitness function is sum(R**2) and its derivative wrt pi 271 is 2*sum(R*df/dpi). 272 273 Any resolution function will be applied after the theory points 274 and derivatives are calculated. To suppress the resolution 275 calculation, set fit_dx to 0. 276 277 Note that we can apply the resolution to the analytic derivatives 278 rather than computing them numerically. The resolution calculation 279 is a convolution of a function f(x) with a distribution of possible 280 inputs G(x). For reasonable function G,f the derivative of the 281 convolution is the convolution of the derivative. This can be 282 seen from the following equations:: 283 284 d/dp G(x) * f(x;p) = d/dp \int G(z-x) f(z;p) dz 285 = \int d/dp G(z-x) f(z;p) dz 286 = \int G(z-x) df(z;p)/dp dz 287 = G(x) * df(x;p)/dp 288 289 Keep in mind that the convolution is with respect to x and the 290 derivative is with respect to p. 291 """ 292 calc_x = self.calc_x if self.calc_x else self.fit_x 293 294 # Compute f and derivatives 295 fdf = fn(calc_x,pars=pars) 296 self.calc_y = fdf[0] 297 298 # Apply resolution 299 fdf = [resolution(calc_x,y,self.fit_x,self.fit_dx) for y in fdf] 300 self.fx = fdf[0] 301 R = (self.fit_y-self.fx)/self.fit_dy 302 raise RuntimeError('check whether we want df/dp or dR/dp where R=residuals^2') 303 304 # R = (y - F(x;p))/sigma => dR/dp = -1/sigma dF(x;p)/dp 305 df = [ -df/self.fit_dy for df in fdf_res[1:] ] 306 307 return [R]+df
308
309 -def equal_spaced(x,tol=1e-5):
310 """ 311 Return true if data is regularly spaced within tolerance. Tolerance 312 uses relative error. 313 314 This function is not used. 315 """ 316 step = numpy.diff(x) 317 step_bar = numpy.mean(step) 318 return (abs(step-step_bar) < tol*step_bar).all()
319
320 -def resample(x,dx,minstep):
321 """ 322 Defining the minimum support basis. 323 324 Compute the calc_x points required to adequately sample 325 the function y=f(x) so that the value reported for each 326 measured point is supported by the resolution dx at x. 327 minstep is the minimum allowed sampling density that should be used. 328 """ 329 raise NotImplementedError
330
331 -def resolution(calcx,calcy,fitx,fitdx):
332 """ 333 Apply resolution function. If there is no resolution function, then 334 interpolate from the calculated points to the desired theory points. 335 If the data are irregular, use a brute force convolution function. 336 337 If the data are regular and the resolution is fixed, then you can 338 deconvolute the data before fitting, saving time and complexity. 339 """ 340 if numpy.any(fitdx != 0): 341 if numpy.isscalar(fitdx): 342 fitdx = numpy.ones(fitx.shape)*fitdx 343 fx = convolve(calcx, calcy, fitx, fitdx) 344 elif calcx is fitx: 345 fx = calcy 346 else: 347 fx = numpy.interp(fitx,calcx,calcy) 348 return fx
349 350 351 # ===== Test code =====
352 -class TempData:
353 """ 354 Create a temporary file with a given data set and remove it when done. 355 356 Example:: 357 358 from __future__ import with_statement 359 ... 360 with TempData("# x y dy\n1 2 3\n4 5 6") as filename: 361 # run tests of loading from filename 362 363 This class is useful for testing data file readers. 364 """
365 - def __init__(self,data,suffix='.dat',prefix=''):
366 self.data = data 367 self.prefix = prefix 368 self.suffix = suffix
369 - def __enter__(self):
370 import os,tempfile 371 fid,self.filename = tempfile.mkstemp(self.suffix, 372 self.prefix, 373 text=True) 374 os.write(fid,self.data) 375 os.close(fid) 376 return self.filename
377 - def __exit__(self, exc_type, exc_value, traceback):
378 import os 379 os.unlink(self.filename)
380 381 D2 = "# x y\n1 1\n2 2\n3 4\n2 5\n" 382 """x,y dataset for TempData""" 383 D3 = "# x y dy\n1 1 .1\n2 2 .2\n3 4 .4\n2 5 .5\n" 384 """x,y,dy dataset for TempData""" 385 D4 = "# x dx y dy\n1 .1 1 .1\n2 .2 2 .2\n3 .3 4 .4\n2 .3 5 .5\n" 386 """x,dx,y,dy dataset for TempData""" 387
388 -def test():
389 import os 390 import numpy 391 392 # Check that two column data loading works 393 with TempData(D2) as filename: 394 data = Data1D(filename=filename) 395 assert numpy.all(data.x == [1,2,2,3]) 396 assert numpy.all(data.y == [1,2,5,4]) 397 assert data.dx == 0 398 assert data.dy == 1 399 assert data.calc_x is None 400 assert data.residuals(lambda x: x)[3] == 1 401 402 # Check that interpolation works 403 data.calc_x = [1,1.5,3] 404 assert data.residuals(lambda x: x)[1] == 0 405 assert numpy.all(data.calc_y == data.calc_x) 406 # Note: calc_y is updated by data.residuals, so be careful with this test 407 408 # Check that three column data loading works 409 with TempData(D3) as filename: 410 data = Data1D(filename=filename) 411 assert numpy.all(data.x == [1,2,2,3]) 412 assert numpy.all(data.y == [1,2,5,4]) 413 assert numpy.all(data.dy == [.1,.2,.5,.4]) 414 assert data.dx == 0 415 assert data.calc_x is None 416 assert data.residuals(lambda x: x)[3] == 1/.4 417 418 # Check that four column data loading works 419 with TempData(D4) as filename: 420 data = Data1D(filename=filename) 421 assert numpy.all(data.x == [1,2,2,3]) 422 assert numpy.all(data.dx == [.1,.2,.3,.3]) 423 assert numpy.all(data.y == [1,2,5,4]) 424 assert numpy.all(data.dy == [.1,.2,.5,.4]) 425 426 # Test residuals 427 print "Fix the convolution function!" 428 print data.residuals(lambda x: x) 429 assert data.calc_x is None
430 431 if __name__ == "__main__": test() 432