Source Code for Module park.util.serial

# This program is public domain
"""
Service to handle serialization and deserialization of python objects.

Object serialization is useful for long term storage, interlanguage
communication and network transmission.  In all cases, the process
involves an initial encode() followed by a later decode().

The following properties are desirable for serialization/deserialization:

1. human readable so that disaster recovery is possible
2. readable/writable by other languages and environments
3. support for numerics: complex, nan, inf, arrays, full precision
4. version support: load object into newer versions of the
   program even if the class structure has changed
5. refactoring support: load object into newer versions of the
   program even if the classes have been moved or renamed
6. support self-referential data structures (not yet)

Python's builtin serialization, pickle/cPickle, cannot meet these
needs.  It is python specific, and not friendly to human readers
or readers from other environments such as IDL which may want to
load or receive data from a python program.  Pickling inf/nan doesn't
work on windows, but some of our models may use inf data, and some of
our results may be nan.  pickle has minimal support for versioning:
users can write __setstate__ which accepts a dictionary and adjusts
it accordingly.  Beware though that version must be an instance
variable rather than a class variable, since class variables are not
seen by pickle.  If the class is renamed, then pickle can do nothing
to recover it.

Instead of pickle, we break the problem into parts: structure and
encoding.  A pair of functions `deconstruct` and `reconstruct` work
directly with the structure.  Deconstruct extracts the state of the
python object defined using a limited set of python primitives.
Reconstruct takes an extracted state and rebuilds the complete python
object.  See documentation on the individual functions for details.
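
The round trip described above can be sketched as follows.  This is a
simplified Python 3 illustration, not the code in this module: the
`Point` class and the `classes` lookup table are assumptions made for
the example (the module itself imports classes by their dotted path),
and only objects with an instance dictionary are handled.

```python
# Simplified sketch of the deconstruct/reconstruct idea (illustrative only).
PRIMITIVES = (int, float, str, bool, type(None))

def deconstruct(obj):
    # Reduce obj to primitives, lists and dicts.
    if isinstance(obj, PRIMITIVES):
        return obj
    if isinstance(obj, (list, tuple)):
        return [deconstruct(el) for el in obj]
    if isinstance(obj, dict):
        return {k: deconstruct(v) for k, v in obj.items()}
    return {
        '.class': type(obj).__name__,
        '.version': getattr(obj, '__version__', '0.0'),
        '.state': deconstruct(vars(obj)),
    }

def reconstruct(tree, classes):
    # Rebuild objects from a tree; `classes` maps stored names to classes
    # (a stand-in for importing the class by path).
    if isinstance(tree, list):
        return [reconstruct(el, classes) for el in tree]
    if isinstance(tree, dict):
        if '.class' in tree:
            cls = classes[tree['.class']]
            obj = cls.__new__(cls)  # empty instance, __init__ is not called
            obj.__dict__.update(reconstruct(tree['.state'], classes))
            return obj
        return {k: reconstruct(v, classes) for k, v in tree.items()}
    return tree

class Point:  # hypothetical example class
    __version__ = '1.0'
    def __init__(self, x, y):
        self.x, self.y = x, y

tree = deconstruct({'origin': Point(0, 0), 'path': [Point(1, 2)]})
copy = reconstruct(tree, {'Point': Point})
```

Because `tree` holds only dicts, lists and scalars, it can be handed to
any encoder (JSON, XML, ...) without that encoder knowing about classes.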

Object persistence for long term storage places particular burdens
on the serialization protocol.  In particular, the class may have
changed since the instance was serialized.  To aid the process of
maintaining classes over the long term, the class definition can
contain the following magic names:

__version__
    Strict version number of the class.  See isnewer() for
    details, or distutils.version.StrictVersion.
__factory__
    Name of a factory function to return an empty instance of
    the class.  This will be stored as the class name, and
    should include the complete path so that it can be
    imported by python.  The class will be populated by the
    normal restore mechanism for that class, using either
    __reconstruct__, __setstate__ or setting the class dictionary.
__reconstruct__
    Method which takes a structure tree and rebuilds the object.
    This is different from __setstate__ in that __setstate__
    assumes its children have already been reconstructed.  This
    is the difference between top-down and bottom-up
    interpretation.  Bottom-up is usually easier and sufficient,
    but top-down is required for radical restructuring of the
    object representation.
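
The difference between the two restore hooks can be illustrated with a
small Python 3 sketch.  The `parse` helper is a simplified stand-in for
isnewer() (numeric components only, no pre-release tags), and passing
the whole tree with '.version' and '.state' keys to __reconstruct__ is
an assumption of this example rather than a statement of what the
module passes.

```python
def parse(version):
    # Turn '1.2' into (1, 2); simplified stand-in for a strict version parser.
    return tuple(int(part) for part in version.split('.'))

class Data:  # hypothetical example class
    __version__ = '1.2'

    def __setstate__(self, state):
        # Bottom-up: the values in `state` are already rebuilt objects.
        self.__dict__.update(state)

    def __reconstruct__(self, tree):
        # Top-down: `tree` is still raw, so it can be reshaped before use.
        state = dict(tree['.state'])
        if parse(tree['.version']) < parse('1.1'):
            state['uncertainty'] = 0  # field added in version 1.1
        self.__setstate__(state)

stored = {'.class': 'Data', '.version': '1.0', '.state': {'value': 3.5}}
obj = Data.__new__(Data)  # empty instance, as a factory would return
obj.__reconstruct__(stored)
```

Here the version 1.0 state gains the missing field before it reaches
__setstate__, which is the kind of restructuring that a bottom-up hook
alone cannot do because it never sees the stored version.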

We need to coexist with third party libraries which may or may
not use our object serialization technology.  Even if they do,
we may want to replace our dependence on one third party library
with our own or another implementation.  To do so, we keep a class
registry, with the function `refactor` to add entries to the
registry.  Rather than blindly restoring the class, we first see
if the class and version are in the registry, and if so call the
registered function to transform the data.  For example::

    import danse.util.serial as serial
    serial.refactor('Numeric.array','numeric_converter','0.0')

This says that items stored as the class Numeric.array will be
loaded by calling numeric_converter, which returns a

Example
=======

The following example shows how to use reconstruct and factory to get
maximum flexibility when restoring an object.

mylib.__init__.py::

    def data():
        from mylib.core.data import Data
        return Data()

mylib.core.data.py::

    from danse.util.serial import isnewer, reconstruct, setstate
    class Data(object):
        __version__ = '1.2'
        __factory__ = 'mylib.data'
        def __reconstruct__(self,instance):
            '''
            Reconstruct the state from
            '''
            if isnewer('1.0',instance['version']):
                raise RuntimeError('pre-1.0 data objects no longer supported')
            if isnewer('1.1',instance['version']):
                # Version 1.1 added uncertainty; default it to zero
                instance['state']['uncertainty'] = 0
            setstate(self,reconstruct(instance['state']))


TODO: reconstruct needs to raise a specialized error which is a
subclass of RuntimeError that indicates the name of the package
that is out of date so that the application can trigger its
package manager.
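
One possible shape for such an error is sketched below in Python 3.
The names `OutOfDateError` and `check` are hypothetical, nothing like
them exists in this module, and the version comparison is a placeholder
that looks at numeric components only.

```python
class OutOfDateError(RuntimeError):
    # Hypothetical error: stored data is newer than the installed package.
    def __init__(self, package, stored_version, installed_version):
        self.package = package
        self.stored_version = stored_version
        self.installed_version = installed_version
        RuntimeError.__init__(
            self,
            'Package %s is out of date: data needs %s, found %s'
            % (package, stored_version, installed_version))

def check(package, stored_version, installed_version):
    # Placeholder comparison on dot-separated numeric components.
    def as_tuple(version):
        return tuple(int(part) for part in version.split('.'))
    if as_tuple(stored_version) > as_tuple(installed_version):
        raise OutOfDateError(package, stored_version, installed_version)

try:
    check('mylib', '1.3', '1.2')
except OutOfDateError as err:
    # An application could pass this name to its package manager.
    outdated = err.package
```

Since the class derives from RuntimeError, callers that already catch
RuntimeError keep working, while newer callers can read the package
name from the exception.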

TODO: what if class registry depends on which package is asking
for the redirect?  Do we want to allow a chained registry, where
we can override the application default?  Probably not, but the
full answer depends on the solution to the Deployment Problem.

TODO: cannot handle self-referential data structures
"""

import types
import sys

def deconstruct(obj):
    """
    Convert an object hierarchy into python primitives.

    The primitives used are int, float, str, unicode, bool, None,
    list, tuple, and dict.

    Classes are encoded as a dict with keys '.class', '.version', and '.state'.
    Version is copied from the attribute __version__ if it exists.

    Functions are encoded as a dict with key '.function'.

    Raises RuntimeError if object cannot be deconstructed.  For example,
    deconstruct on deconstruct will cause problems since '.class' will
    be in the dictionary of a deconstructed object.
    """
    if type(obj) in [int, float, str, unicode, bool] or obj is None:
        return obj
    elif type(obj) in [list, tuple, set]:
        return type(obj)(deconstruct(el) for el in obj)
    elif type(obj) == dict:
        # Check for errors
        for name in ['.class', '.function']:
            if name in obj:
                raise RuntimeError("Cannot deconstruct dict containing "+name)
        return dict((k,deconstruct(v)) for k,v in obj.items())
    elif type(obj) == types.FunctionType:
        return {
            '.function' : obj.__module__+'.'+obj.__name__
            }
    else:
        cls = _getclass(obj)
        version = _getversion(obj)
        return {
            '.class' : cls,
            '.version' : version,
            '.state' : deconstruct(_getstate(obj))
            }

def reconstruct(tree):
    """
    Reconstruct an object hierarchy from a tree of primitives.

    The tree is generated by deconstruct from python primitives
    (list, dict, string, number, boolean, None) with classes
    encoded as a particular kind of dict.

    Unlike pickle, we do not make an exact copy of the original
    object.  In particular, the serialization format may not
    distinguish between list and tuples, or str and unicode.  We
    also have no support for self-referential structures.

    Raises RuntimeError if could not reconstruct
    """
    if type(tree) in [int, float, str, unicode, bool] or tree is None:
        return tree
    elif type(tree) in [list, tuple, set]:
        return type(tree)(reconstruct(el) for el in tree)
    elif type(tree) == dict:
        if '.class' in tree:
            # Chain if program version is newer than stored version (too cold)
            fn = _lookup_refactor(tree['.class'],tree['.version'])
            if fn is not None: return fn(tree)

            # Fail if program version is older than stored version (too hot)
            obj = _createobj(tree['.class'])
            if isnewer(tree['.version'],_getversion(obj)):
                raise RuntimeError('Version of %s is out of date'%tree['.class'])
            # Reconstruct if program version matches stored version (just right)
            if hasattr(obj, '__reconstruct__'):
                obj.__reconstruct__(tree['.state'])
            else:
                _setstate(obj,reconstruct(tree['.state']))
            return obj
        elif '.function' in tree:
            return _import_symbol(tree['.function'])
        else:
            return dict((k,reconstruct(v)) for k,v in tree.items())
    else:
        # Report the type of the unsupported node
        raise RuntimeError('Could not reconstruct '+type(tree).__name__)

def _getversion(obj):
    version = getattr(obj,'__version__','0.0')
    try:
        # Force parsing of version number to check format
        isnewer(version,'0.0')
    except ValueError as msg:
        raise ValueError("%s for class %s"%(msg,obj.__class__.__name__))
    return version

def _getclass(obj):
    if hasattr(obj,'__factory__'): return obj.__factory__
    return obj.__class__.__module__+'.'+obj.__class__.__name__

def _getstate(obj):
    if hasattr(obj,'__getinitargs__') or hasattr(obj,'__getnewargs__'):
        # Laziness: we could fetch the initargs and store them, but until
        # we need to do so, I'm not going to add the complexity.
        raise RuntimeError('Cannot serialize a class with initialization arguments')
    elif hasattr(obj,'__getstate__'):
        state = obj.__getstate__()
    elif hasattr(obj,'__slots__'):
        state = dict((s,getattr(obj,s)) for s in obj.__slots__ if hasattr(obj,s))
    elif hasattr(obj,'__dict__'):
        state = obj.__dict__
    else:
        state = {}
    return state

def _setstate(obj,kw):
    if hasattr(obj,'__setstate__'):
        obj.__setstate__(kw)
    elif hasattr(obj,'__slots__'):
        for k,v in kw.items(): setattr(obj,k,v)
    elif hasattr(obj,'__dict__'):
        obj.__dict__ = kw
    else:
        pass
    return obj

def _lookup_refactor(cls,ver):
    return None

class _EmptyClass: pass
def _import_symbol(path):
    """
    Recover symbol from path.
    """
    parts = path.split('.')
    module_name = ".".join(parts[:-1])
    symbol_name = parts[-1]
    __import__(module_name)
    module = sys.modules[module_name]
    symbol = getattr(module,symbol_name)
    return symbol

def _createobj(path):
    """
    Create an empty object which we can update with __setstate__
    """
    factory = _import_symbol(path)
    if type(factory) is types.FunctionType:
        # Factory method to return an empty class instance
        obj = factory()
    elif type(factory) is types.ClassType:
        # Old-style class: create an empty class and override its __class__
        obj = _EmptyClass()
        obj.__class__ = factory
    elif type(factory) is types.TypeType:
    # elif issubclass(factory, types.TypeType):
        obj = factory.__new__(factory)
    else:
        raise RuntimeError('%s should be a function, class or type'%path)
    return obj

def isnewer(version,target):
    """
    Version comparison function.  Returns true if version is strictly
    newer than the target version; equal versions return false.

    A version number consists of two or three dot-separated numeric
    components, with an optional "pre-release" tag on the end.  The
    pre-release tag consists of the letter 'a' or 'b' followed by
    a number.  If the numeric components of two version numbers
    are equal, then one with a pre-release tag will always
    be deemed earlier (lesser) than one without.

    The following will be true for version numbers::

      8.2 < 8.19a1 < 8.19 == 8.19.0


    You should follow the rule of incrementing the minor version number
    if you add attributes to your models, and the major version number
    if you remove attributes.  Then assuming you are working with
    e.g., version 2.2, your model loading code will look like::

        if isnewer(xml.version, Model.__version__):
            raise IOError('software is older than model')
        elif not isnewer('2.0', xml.version):
            instantiate current model from xml
        elif not isnewer('1.0', xml.version):
            instantiate old model from xml
            copy old model format to new model format
        else:
            raise IOError('pre-1.0 models not supported')

    Based on distutils.version.StrictVersion
    """
    from distutils.version import StrictVersion as Version
    return Version(version) > Version(target)

class _RefactoringRegistry(object):
    """
    Directory of renamed classes.

    """
    registry = {}

    @classmethod
    def register(cls,oldname,newname,asof_version):
        """
        As of the target version, references to the old name are no
        longer valid (e.g., when reconstructing stored objects), and
        should be resolved by the new name (or None if they should
        just raise an error.)  The old name can then be reused for
        new objects or abandoned.
        """
        # Insert (asof_version,newname) in the right place in the
        # list of rename targets for the object.  This list will
        # be empty unless the name is reused.
        if oldname not in cls.registry: cls.registry[oldname] = []
        for idx,(version,name) in enumerate(cls.registry[oldname]):
            if isnewer(asof_version, version):
                cls.registry[oldname].insert(idx,(asof_version, newname))
                break
        else:
            cls.registry[oldname].append((asof_version, newname))

    @classmethod
    def redirect(cls, oldname, newname, version):
        if oldname not in cls.registry: return None
        for target_version,newname in cls.registry[oldname]:
            if isnewer(target_version, version):
                return target_version
        # error conditions at this point

def refactor(oldname,newname,asof_version):
    """
    Register the renaming of a class.

    As code is developed and maintained over time, it is sometimes
    beneficial to restructure the source to support new features.
    However, the structure and location of particular objects is
    encoded in the saved file format.

    When you move a class that may be stored in a model,
    be sure to put an entry into the registry saying where
    the model was moved, or None if the model is no longer
    supported.

    A reconstructor is a function that builds a python object from
    a particular class/version, presumably older than the current
    version.  This is necessary, e.g., to set default values for new
    fields or to modify components of the model which are now
    represented differently.

    The reconstructor function takes the structure above as
    its argument and returns a python instance.  You are free
    to restructure the state and version fields as needed to
    bring the object in line with the next version, then call
    setstate(tree) to build the return object.  Indeed this
    technique will chain, and you can morph an ancient version
    of your models into the latest version.
    """

    # Record the rename in the registry
    return _RefactoringRegistry.register(oldname, newname, asof_version)

# === Test classes need to be at the top level for reconstruct to find them ===
class _Simple: x = 5
class _SimpleNew(object): x = 5
class _Slotted(object): __slots__ = ['a','b']
class _Controlled:
    def __getstate__(self): return ["mystate",self.__dict__]
    def __setstate__(self, state):
        if state[0] != "mystate": raise RuntimeError("didn't get back my state")
        self.__dict__ = state[1]
class _Factory:
    __factory__ = __name__+"._factory"
def _factory():
    obj = _Factory()
    # Note: can't modify obj because state will be overridden
    _Factory.fromfactory = True
    return obj
class _VersionError:
    __version__ = "3.5."
def _hello():
    return 'hello'
def _exercise_types(encode=None,decode=None):
    primitives = ['list',1,{'of':'dict',2:'really'},True,None]
    assert decode(encode(primitives)) == primitives

    # Hmmm... dicts with non-string keys are not permitted by strict json
    # I'm not sure we care for our purposes, but it would be best to avoid
    # them and instead have a list of tuples which can be converted to and
    # from a dict if the need arises
    #assert encode(primitives) == '["list",1,{"of":"dict",2:"really"},true,null]'

    h = _Simple()
    h.a = 2
    #print encode(deconstruct(h))
    assert decode(encode(h)).a == h.a

    assert decode(encode(_hello))() == 'hello'

    h = _SimpleNew()
    h.a = 2
    #print encode(deconstruct(h))
    assert decode(encode(h)).a == h.a

    h = _Slotted()
    h.a = 2
    #print encode(deconstruct(h))
    assert decode(encode(h)).a == h.a

    h = _Controlled()
    h.a = 2
    #print encode(deconstruct(h))
    assert decode(encode(h)).a == h.a

    h = _Factory()
    h.a = 2
    #print encode(deconstruct(h))
    assert decode(encode(h)).a == h.a
    assert hasattr(h,'fromfactory')

    try:
        encode(_VersionError())
        raise RuntimeError("should have raised a version error")
    except ValueError as msg:
        assert "_VersionError" in str(msg)

def test():
    _exercise_types(encode=deconstruct,decode=reconstruct)

if __name__ == "__main__":
    test()