Package park :: Package fitting :: Module fit

Source Code for Module park.fitting.fit

  1  # This program is public domain
 
  2  """
 
  3  Fitting service interface.
 
  4  
 
  5  A fit consists of a set of models and a fitting engine.  The models are
 
  6  collected in an assembly, which manages the parameter set and the
 
  7  constraints between them.  The models themselves are tightly coupled
 
  8  to the data that they are modeling and the data is invisible to the fit.
 
  9  
 
 10  The fitting engine can use a variety of methods depending on model.
 
 11  
 
 12  
 
 13  Usage
 
 14  =====
 
 15  
 
 16  The fitter can be run directly on the local machine::
 
 17  
 
 18      import park
 
 19      M1 = park.models.Peaks(datafile=park.sampledata('peak.dat'))
 
 20      M1.add_peak('P1', 'gaussian', A=[4,6], mu=[0.2, 0.5], sigma=0.1)
 
 21      result = park.fit(models=[M1])
 
 22      print result
 
 23  
 
 24  The default settings print results every time the fit improves, and
 
 25  print a global result when the fit is complete.  This is a suitable
 
 26  interface for a fitting script.
 
 27  
 
 28  For larger fit jobs you will want to run the fit on a remote server.
 
 29  The model setup is identical, but the fit call is different::
 
 30  
 
 31      service = park.FitService('server:port')
 
 32      result = park.fit(models=[M1], service=service)
 
 33      print result
 
 34  
 
 35  Again, the default settings print results every time the fit improves,
 
 36  and print a global result when the fit is complete.
 
 37  
 
 38  For long running fit jobs, you want to be able to disconnect from
 
 39  the server after submitting the job, and later reconnect to fetch
 
 40  the results.  An additional email field will send notification by
 
 41  email when the fit starts and ends, and daily updates on the status
 
 42  of all fits::
 
 43  
 
 44      service = park.FitService('server:port')
 
 45      service.notify(email='me@my.email.address',update='daily')
 
 46      fit = park.Fit(models=[M1])
 
 47      id = service.start(fit, jobname='peaks')
 
 48      print id
 
 49  
 
 50  The results can be retrieved either by id returned from the server,
 
 51  or by the given jobname::
 
 52  
 
 53      import park
 
 54      service = park.FitService('server:port',user='userid')
 
 55      fitlist = service.retrieve('peaks')
 
 56      for fit in fitlist:
 
 57          print fit.summary()
 
 58  
 
 59  The fit itself is a complicated object, including the model, the
 
 60  optimizer, and the type of uncertainty analysis to perform.
 
 61  
 
 62  GUI Usage
 
 63  =========
 
 64  
 
 65  When used from a graphical user interface, a different programming
 
 66  interface is needed.  In this case, the user may want to watch
 
 67  the progress of the fit and perhaps stop it.  Also, as fits can
 
 68  take some time to complete, the user would like to be able to
 
 69  set up additional fits and run them at the same time, switching
 
 70  between them as necessary to monitor progress::
 
 71  
 
 72      import park
 
 73  
 
 74      def start
 
 75      service = park.FitService('server:port',user='userid')
 
 76      fit = park.Fit(models=[M1])
 
 77      id = sevice.start(fit,
 
 78  
 
 79  """ 
 80  import time 
 81  
 
 82  from park.modelling import assembly 
 83  from park.optim import fitmc 
 84  from park.fitting import fitresult 
 85  
 
86 -class Objective(object):
87 """ 88 Abstract interface to the fitness function for the park minimizer 89 classes. 90 91 Park provides a specific implementation `park.assembly.Assembly`. 92 93 TODO: add a results() method to return model specific info to the 94 TODO: fit handler. 95 """
96 - def prepare(self):
97 """ 98 Do the necessary work to prepare to run the objective function, such 99 as loading large files and precalculating data structures. 100 101 This is called before before retrieving the fit parameters and running 102 the fit. 103 """
104
105 - def cleanup(self):
106 """ 107 Perform any cleanup required after all evaluations are complete. 108 """
109
110 - def fit_parameters(self):
111 """ 112 Returns a list of fit parameters. Each parameter has a name, 113 an initial value and a range. 114 115 See `park.fitresult.FitParameter` for an example. 116 117 On each function evaluation a new parameter set will be passed 118 to the fitter, with values in the same order as the list of 119 parameters. 120 """ 121 raise NotImplementedError
122
123 - def __call__(self, p):
124 """ 125 Returns the objective value when evaluated at point p in R^n. 126 127 The default is sumsq residuals. 128 """ 129 return numpy.sum(self.residuals(p)**2)
130
131 - def deriv(self, p):
132 """ 133 Returns the objective value and derivatives for parameter set p . 134 """ 135 raise NotImplementedError
136
137 - def jacobian(self, p, step=1e-8):
138 """ 139 Returns the derivative wrt the p at point p in R^n. 140 141 Numeric derivatives are calculated based on step, where step is 142 the portion of the total range for parameter j, or the portion of 143 point value p_j if the range on parameter j is infinite. 144 """ 145 # Make sure the input vector is an array 146 pvec = numpy.asarray(pvec) 147 # We are being lazy here. We can precompute the bounds, we can 148 # use the residuals_deriv from the sub-models which have analytic 149 # derivatives and we need only recompute the models which depend 150 # on the varying parameters. 151 # Meanwhile, let's compute the numeric derivative using the 152 # three point formula. 153 # We are not checking that the varied parameter in numeric 154 # differentiation is indeed feasible in the interval of interest. 155 range = zip(*[p.range for p in self._fitparameters]) 156 lo,hi = [numpy.asarray(v) for v in range] 157 delta = (hi-lo)*step 158 # For infinite ranges, use p*1e-8 for the step size 159 idx = numpy.isinf(delta) 160 #print "J",idx,delta,pvec,type(idx),type(delta),type(pvec) 161 delta[idx] = pvec[idx]*step 162 delta[delta==0] = step 163 164 # Set the initial value 165 for k,v in enumerate(pvec): 166 self._fitparameters[k].value = v 167 # Gather the residuals 168 r = [] 169 for k,v in enumerate(pvec): 170 # Center point formula: 171 # df/dv = lim_{h->0} ( f(v+h)-f(v-h) ) / ( 2h ) 172 h = delta[k] 173 self._fitparameters[k].value = v + h 174 self.eval() 175 rk = self.residuals 176 self._fitparameters[k].value = v - h 177 self.eval() 178 rk -= self.residuals 179 self._fitparameters[k].value = v 180 r.append(rk/(2*h)) 181 # return the jacobian 182 return numpy.vstack(r).T
183 184
185 - def cov(self, pvec):
186 """ 187 Return the covariance matrix at p cov(f,p) = inv(J'J) 188 where J is the Jacobian matrix [df(xi)/dpj] 189 190 Using the singular value decomposition we have:: 191 192 J = U S V' 193 J'J = (U S V')' (U S V') 194 = V S' U' U S V' 195 = V S S V' 196 inv(J'J) = inv(V S S V') 197 = inv(V') inv(S S) inv(V) 198 = V inv (S S) V' 199 200 Note that this function hangs unexpectedly for some versions 201 of numpy on Windows. Please make sure you are using the 202 latest version. 203 """ 204 # The singular value matrix S is diagonal, so inv(S**2) is 1/S**2 205 # using element-wise division. In fact, SVD returns S as a vector, 206 # so element-wise V/S**2 is equivalent to matrix V dot inv(S**2). 207 J = self.jacobian(pvec) 208 u,s,vh = numpy.linalg.svd(J,0) 209 JTJinv = numpy.dot(vh.T.conj()/s**2,vh) 210 return JTJinv
211
212 - def stderr(self, pvec):
213 """ 214 Return parameter uncertainty. 215 216 This is just the sqrt diagonal of covariance matrix inv(J'J) at point p. 217 """ 218 return numpy.sqrt(numpy.diag(self.cov(pvec)))
219 220
221 - def residuals(self, p):
222 """ 223 Some fitters, notably Levenberg-Marquardt, operate directly on the 224 residuals vector. If the individual residuals are not available, 225 then LM cannot be used. 226 227 This method is optional. 228 """ 229 raise NotImplementedError
230
231 - def residuals_deriv(self, p):
232 """ 233 Returns residuals and derivatives with respect to the given 234 parameters. 235 236 If these are unavailable in the model, then they can be approximated 237 by numerical derivatives, though it is generally better to use a 238 derivative free optimizer such as coliny or cobyla which can use the 239 function evaluations more efficiently. In any case, your objective 240 function is responsible for calculating these. 241 242 This method is optional. 243 """ 244 raise NotImplementedError
245
246 -class Fitter(object):
247 """ 248 Abstract interface for a fitness optimizer. 249 250 A fitter has a single method, fit, which takes an objective 251 function (`park.fit.Objective`) and a handler. 252 253 For a concrete instance see `park.fitmc.FitMC`. 254 """
255 - def prepare(self):
256 """ 257 Do the necessary work to prepare to fitter. 258 """
259
260 - def cleanup(self):
261 """ 262 Perform any cleanup required after the fitter is complete. 263 """
264
265 - def fit(self, objective, handler):
266 raise NotImplementedError
267
268 -class FitJob(object):
269 """ 270 Fit job. 271 272 This implements `park.job.Job`. 273 """
274 - def __init__(self, objective=None, fitter=None, handler=None):
275 self.fitter = fitter 276 self.objective = objective 277 self.handler = handler
278
279 - def prepare(self): pass
280 #self.fitter.prepare() 281 #self.objective.prepare() 282
283 - def cleanup(self): pass
284 #self.fitter.cleanup() 285 #self.objective.cleanup() 286
287 - def run(self):
288 self.fitter.fit(self.objective, self.handler) 289 return None
290
291 -class LocalQueue(object):
292 """ 293 Simple interface to the local job queue. Currently supports start and 294 wait. Needs to support stop and status. Also, needs to be a proper queue, 295 and needs to allow jobs to run in separate processes according to priority, 296 etc. All the essentials of the remote queuing system without the remote 297 calls. 298 299 Unlike the remote queue, the local queue need not maintain persistence. 300 """ 301 running = False
302 - def start(self, job):
303 self.job = job 304 job.handler.done = False 305 job.prepare() 306 job.run() # initiate job 307 # TODO: who triggers cleanup? 308 #job.cleanup() 309 return id(job)
310
311 - def wait(self, interval=.1):
312 """ 313 Wait for the job to complete. This is used in scripts to impose 314 a synchronous interface to the fitting service. 315 """ 316 while not self.job.handler.done: 317 time.sleep(interval) 318 return self.job.handler.result
319
320 -def fit(models=None, fitter=None, service=None, handler=None):
321 """ 322 Start a fit with a set of models. The model set must be 323 in a form accepted by `park.assembly.Assembly`. 324 325 This is a convenience function which sets up the default 326 optimizer and uses the local fitting engine to do the work. 327 Progress reports are printed as they are received. 328 329 The choice of fitter, service and handler can be specified 330 by the caller. 331 332 The default fitter is FitMC, which is a monte carlo Nelder-Mead 333 simplex local optimizer with 100 random start points. 334 335 The default handler does nothing. Instead, ConsoleUpdate could 336 be used to report progress during the fit. 337 338 The default service is to run in a separate thread with FitThread. 339 Note that this will change soon to run in a separate process on 340 the local machine so that python's global interpreter lock does 341 not interfere with parallelism. 342 """ 343 if models is None: raise RuntimeError('fit expected a list of models') 344 if service is None: service = LocalQueue() 345 if fitter is None: fitter = fitmc.FitMC() 346 if handler is None: handler = fitresult.FitHandler() 347 348 objective = assembly.Assembly(models) if isinstance(models,list) else models 349 job = FitJob(objective,fitter,handler) 350 service.start(job) 351 return service.wait()
352 353
354 -def assembly_example():
355 import park, time 356 problem = park.modelling.assembly.example() 357 #handler=fitresult.ConsoleUpdate(improvement_delta=0.1,progress_delta=1) 358 #result = fit(problem, handler=handler) 359 result = fit(problem) 360 print result 361 print "=== Fit complete ===" 362 result.print_summary() 363 print "=== Target values ===" 364 print "M1: a=1, c=1.5" 365 print "M2: a=2.5, c=3" 366 367 if False: # Detailed results 368 print "parameter vector",result.pvec 369 problem(result.pvec) 370 print "residuals",problem.residuals 371 for k,m in enumerate(problem.parts): 372 print "Model",k,"chisq",m.chisq,"weight",m.weight 373 print "pars",m.fitness.model.a,m.fitness.model.c 374 print "x",m.fitness.data.fit_x 375 print "y",m.fitness.data.fit_y 376 print "f(x)",m.fitness.data.fx 377 print "(y-f(x))/dy",m.residuals
378 379 if __name__ == "__main__": 380 #main() 381 assembly_example() 382