Class Data1D


Data representation for 1-D fitting.

Attributes

filename
The source of the data. This may be the empty string if the data is simulation data.
x,y,dy
The data values. x is the set of measurement points of the data to be fitted, and must be sorted. y is the measured value and dy is the measurement uncertainty.
dx
Resolution at the measured points. The resolution may be 0, constant, or defined for each data point. dx is the 1-sigma width of the Gaussian resolution function at point x. Note that dx_FWHM = sqrt(8 ln 2) dx_sigma, so if your resolution is quoted as FWHM, divide it by sqrt(8 ln 2) to get dx.
fit_x,fit_dx,fit_y,fit_dy
The points used in evaluating the residuals.
calc_x
The points at which to evaluate the theory function. This may be different from the measured points for a number of reasons, such as a resolution function which suggests over or under sampling of the points (see below). By default calc_x is x, but it can be set explicitly by the user.
calc_y, fx
The value of the function at the theory points, and the value of the function after resolution has been applied. These values are computed by a call to residuals.
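
As an illustration of the dx scaling noted above, the following minimal sketch converts a resolution quoted as FWHM into the 1-sigma width expected for dx. The helper name fwhm_to_sigma is hypothetical and is not part of park.

```python
import math

# Ratio between the full width at half maximum and the 1-sigma width
# of a Gaussian: sqrt(8 ln 2), roughly 2.3548.
FWHM_PER_SIGMA = math.sqrt(8 * math.log(2))

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to the equivalent 1-sigma width."""
    return fwhm / FWHM_PER_SIGMA

# A resolution quoted as FWHM = 0.02 corresponds to this 1-sigma dx.
dx_sigma = fwhm_to_sigma(0.02)
```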

Notes on calc_x

The contribution of Q to a resolution of width dQo at point Qo is:

p(Q) = 1/sqrt(2 pi dQo**2) exp ( -(Q-Qo)**2/(2 dQo**2) )

We approximate the convolution at Qo by numerical integration over the measured points, with the integral limited to those points Q_i for which p(Q_i)/p(Qo) >= 0.001.
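
A minimal sketch of such a truncated numerical convolution, assuming NumPy and a trapezoid rule over the retained points. The function name apply_resolution and its arguments are hypothetical; park's own resolution code may differ in detail.

```python
import numpy as np

def apply_resolution(calc_x, calc_y, x0, dx0, cutoff=0.001):
    """Resolution-smeared value at x0 (hypothetical helper, not park's API).

    calc_x must be sorted; dx0 is the 1-sigma width of the Gaussian.
    """
    calc_x = np.asarray(calc_x, dtype=float)
    calc_y = np.asarray(calc_y, dtype=float)
    if dx0 == 0:
        # No resolution: interpolate the theory at the measured point.
        return float(np.interp(x0, calc_x, calc_y))
    # Gaussian weight relative to its peak value at x0.
    w = np.exp(-(calc_x - x0) ** 2 / (2 * dx0 ** 2))
    keep = w >= cutoff  # truncate where the relative weight drops below cutoff
    xs, ws, ys = calc_x[keep], w[keep], calc_y[keep]
    if len(xs) < 2:
        return float(np.interp(x0, calc_x, calc_y))
    # Trapezoid rule for int w*y dx and int w dx; dividing by the second
    # integral renormalises the truncated Gaussian.
    steps = np.diff(xs)
    num = np.sum(0.5 * (ws[1:] * ys[1:] + ws[:-1] * ys[:-1]) * steps)
    den = np.sum(0.5 * (ws[1:] + ws[:-1]) * steps)
    return float(num / den)

Q = np.linspace(0.0, 1.0, 101)
smeared = apply_resolution(Q, 2 * Q + 1, x0=0.5, dx0=0.05)
```

For a straight line sampled symmetrically about x0 the smearing leaves the value unchanged, which makes a convenient sanity check.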

Sometimes the function we are convolving is rapidly changing. That means the correct convolution should uniformly sample across the entire width of the Gaussian. This is not possible at the end points unless you calculate the theory function beyond what is strictly needed for the data. For a given dQ and step size, you need enough steps n that the following is true:

(n*step)**2 > -2 dQ**2 * ln 0.001
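
The inequality above can be solved for n directly: n > sqrt(-2 ln 0.001) dQ/step, which is about 3.72 dQ/step. A minimal sketch, with the hypothetical helper name steps_needed:

```python
import math

def steps_needed(dQ, step, cutoff=0.001):
    """Smallest integer n with (n*step)**2 > -2 dQ**2 ln(cutoff)."""
    limit = math.sqrt(-2 * math.log(cutoff)) * dQ / step
    # Smallest integer strictly greater than limit.
    return math.floor(limit) + 1

n_fine = steps_needed(dQ=0.01, step=0.005)
n_coarse = steps_needed(dQ=0.01, step=0.02)
```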

The choice of sampling density is particularly important near critical points where the shape of the function changes. In reflectometry, the function goes from flat below the critical edge to O(Q**-4) above. In one particular model, calculating every 0.005 rather than every 0.02 changed a value above the critical edge by 15%. In a fitting program, this would lead to a somewhat larger estimate of the critical edge for this sample.

Sometimes the theory function is oscillating more rapidly than the instrument can resolve. This happens for example in reflectivity measurements involving thick layers. In these systems, the theory function should be oversampled around the measured points Q. With a single thick layer, oversampling can be limited to just one period 2 pi/d. With multiple thick layers, oscillations will show interference patterns and it will be necessary to oversample uniformly through the entire width of the resolution. If this is not accommodated, then aliasing effects make it difficult to compute the correct model.

Instance Methods
 
__init__(self, filename='', x=None, y=None, dx=0, dy=1)
Define the fitting data.
 
resample(self, minstep=None)
Over/under sampling support.
 
load(self, filename, **kw)
Load a multicolumn datafile.
 
select(self, idx)
Select the points to use in the evaluation of the residuals. idx is a selection vector, or None if all points are to be used.
 
residuals(self, fn)
Compute the residuals of the data wrt the given function.
 
residuals_deriv(self, fn, pars=[])
Compute residuals and derivatives wrt the given parameters.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Variables
  filename = ''
  fx = None
  calc_x = None
  calc_y = None
  dx = 0
  dy = 1
  fit_dx = 0
  fit_dy = 1
  fit_x = None
  fit_y = None
  x = None
  y = None
Properties

Inherited from object: __class__

Method Details

__init__(self, filename='', x=None, y=None, dx=0, dy=1)
(Constructor)

Define the fitting data.

Data can be loaded from a file using filename or specified directly using x,y,dx,dy. File loading happens after assignment of x,y,dx,dy.

Overrides: object.__init__

resample(self, minstep=None)

Over/under sampling support.

Compute the calc_x points required to adequately sample the function y=f(x) so that the value reported for each measured point is supported by the resolution dx at x. minstep is the minimum allowed sampling density that should be used.
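
To illustrate the idea only, here is a sketch that builds a uniform grid of theory points extending far enough past the first and last measured points to cover the Gaussian down to a relative weight of 0.001. The name sketch_calc_x and the step argument are hypothetical, and this is not the algorithm used by resample.

```python
import numpy as np

def sketch_calc_x(x, dx, step, cutoff=0.001):
    """Uniform theory grid covering every measured point's resolution width."""
    x = np.asarray(x, dtype=float)
    dx = np.broadcast_to(np.asarray(dx, dtype=float), x.shape)
    if np.all(dx == 0):
        return x.copy()  # no resolution: evaluate at the measured points
    # Half-width at which the Gaussian falls to cutoff times its peak.
    half_width = np.sqrt(-2 * np.log(cutoff)) * dx
    lo = np.min(x - half_width)
    hi = np.max(x + half_width)
    # Use as many points as fit without the spacing dropping below step.
    n = max(int(np.floor((hi - lo) / step)) + 1, 2)
    return np.linspace(lo, hi, n)

grid = sketch_calc_x([0.1, 0.2, 0.3], dx=0.01, step=0.005)
```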

load(self, filename, **kw)

Load a multicolumn datafile.

Data should be in columns, with the following defaults:

x,y or x,y,dy or x,dx,y,dy

Note that this resets the selected fitting points calc_x and the computed results calc_y and fx.

Data is sorted after loading.

Any extra keyword arguments are passed to the numpy loadtxt function. This allows you to select the columns you want, skip rows, set the column separator, change the comment character, amongst other things.
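
As a sketch of what the forwarded keywords can do, the example below calls numpy.loadtxt directly on an in-memory file. The file contents and column layout are made up for illustration; with Data1D the same keywords would presumably be given as load(filename, comments='%', delimiter=';', usecols=(0, 1, 2, 3)).

```python
import io
import numpy as np

# Hypothetical datafile: '%' comments, ';' separators, one unwanted column.
text = """\
% x; dx; y; dy; flag
0.03; 0.001; 0.50; 0.02; 9
0.01; 0.001; 1.00; 0.01; 9
0.02; 0.001; 0.98; 0.01; 9
"""

data = np.loadtxt(io.StringIO(text), comments="%", delimiter=";",
                  usecols=(0, 1, 2, 3))

# Sort rows by x, as the documentation says happens after loading.
data = data[np.argsort(data[:, 0])]
x, dx, y, dy = data.T
```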

residuals(self, fn)

Compute the residuals of the data wrt the given function.

Returns R = (y - f(x;p)) / sigma_y

fn should be a callable, used as y = fn(x), that accepts a list of points at which to calculate the function and returns a vector of values at those points.

Any resolution function will be applied after the theory points are calculated. To suppress the resolution calculation, set fit_dx to 0.
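
A minimal sketch of the residual calculation for the case with no resolution (fit_dx = 0), assuming NumPy. The function sketch_residuals and the sample data are hypothetical and stand in for the method described above.

```python
import numpy as np

def sketch_residuals(x, y, dy, fn):
    """Return R = (y - fn(x)) / dy without any resolution smearing."""
    theory = np.asarray(fn(x), dtype=float)
    return (np.asarray(y, dtype=float) - theory) / dy

def line(x):
    """Example theory function: takes points, returns values at those points."""
    return 2 * np.asarray(x, dtype=float) + 1

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.1, 2.9, 5.2])
R = sketch_residuals(x, y, dy=0.1, fn=line)
```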

residuals_deriv(self, fn, pars=[])

Compute residuals and derivatives wrt the given parameters.

fdf = fn(x,pars=pars) should be a callable accepting a list of points at which to calculate the function and a keyword argument listing the parameters for which the derivative will be calculated.

Returns a list of the residuals and the derivative wrt the parameter for each parameter:

R = (y-f(x;p)) / sigma_y
dR/dp1 = -1/sigma_y df(x;p)/dp1
dR/dp2 = -1/sigma_y df(x;p)/dp2
...

The fitness function is sum(R**2) and its derivative wrt pi is 2*sum(R*dR/dpi), which equals -2*sum(R/sigma_y * df/dpi).
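
By the chain rule, the gradient of sum(R**2) is 2*sum(R*dR/dpi), with dR/dpi as listed above. The sketch below works this through for a straight-line model and compares the result with a central finite difference. The model, the function names and the data are illustrative assumptions, not park's API.

```python
import numpy as np

def line_with_derivs(x, a, b):
    """f(x; a, b) = a*x + b together with df/da and df/db."""
    return a * x + b, [x, np.ones_like(x)]

def chisq_and_gradient(x, y, dy, a, b):
    f, dfs = line_with_derivs(x, a, b)
    R = (y - f) / dy                  # residuals
    dR = [-df / dy for df in dfs]     # dR/dp = -1/sigma_y df/dp
    chisq = np.sum(R ** 2)
    grad = np.array([2 * np.sum(R * dRi) for dRi in dR])
    return chisq, grad

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.1, 2.9, 5.2])
chisq, grad = chisq_and_gradient(x, y, 0.1, a=1.5, b=0.5)

# Central finite difference in a, as an independent check of grad[0].
eps = 1e-6
fd_a = (chisq_and_gradient(x, y, 0.1, 1.5 + eps, 0.5)[0]
        - chisq_and_gradient(x, y, 0.1, 1.5 - eps, 0.5)[0]) / (2 * eps)
```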

Any resolution function will be applied after the theory points and derivatives are calculated. To suppress the resolution calculation, set fit_dx to 0.

Note that we can apply the resolution to the analytic derivatives rather than computing them numerically. The resolution calculation is a convolution of a function f(x) with a distribution of possible inputs G(x). For reasonable functions G and f, the derivative of the convolution is the convolution of the derivative. This can be seen from the following equations:

d/dp [G(x) * f(x;p)] = d/dp int G(z-x) f(z;p) dz
                     = int d/dp [G(z-x) f(z;p)] dz
                     = int G(z-x) df(z;p)/dp dz
                     = G(x) * df(x;p)/dp

Keep in mind that the convolution is with respect to x and the derivative is with respect to p.
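
The identity can be checked numerically. The sketch below uses a discrete Gaussian kernel with numpy.convolve and the test function f(x;p) = sin(p x), both chosen purely for illustration. It compares a finite difference in p of the smeared function with the smeared analytic derivative.

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 501)
step = x[1] - x[0]
sigma = 0.3

# Discrete Gaussian kernel on the same spacing as x, normalised to sum to 1.
offsets = np.arange(-50, 51) * step
kernel = np.exp(-offsets ** 2 / (2 * sigma ** 2))
kernel /= kernel.sum()

def f(x, p):
    return np.sin(p * x)

def df_dp(x, p):
    return x * np.cos(p * x)

def smear(values):
    """Convolution with respect to x."""
    return np.convolve(values, kernel, mode="same")

p, eps = 1.3, 1e-5
# Derivative (with respect to p) of the convolution, by central difference.
lhs = (smear(f(x, p + eps)) - smear(f(x, p - eps))) / (2 * eps)
# Convolution of the analytic derivative.
rhs = smear(df_dp(x, p))
max_diff = np.max(np.abs(lhs - rhs))
```

The two sides agree to within the finite-difference error because the discrete convolution is linear in its input.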