"""
Service to handle serialization and deserialization of python objects.

Object serialization is useful for long term storage, interlanguage
communication and network transmission.  In all cases, the process
involves an initial encode() followed by a later decode().

The following properties are desirable for serialization/deserialization:

1. human readable so that disaster recovery is possible
2. readable/writable by other languages and environments
3. support for numerics: complex, nan, inf, arrays, full precision
4. version support: load object into newer versions of the
   program even if the class structure has changed
5. refactoring support: load object into newer versions of the
   program even if the classes have been moved or renamed
6. support self-referential data structures (not yet)

Python's builtin serialization, pickle/cPickle, cannot meet these
needs.  It is python specific, and not friendly to human readers
or readers from other environments such as IDL which may want to
load or receive data from a python program.  Pickling inf/nan doesn't
work on windows, yet some of our models may use inf data, and some of
our results may be nan.  Pickle has minimal support for versioning:
users can write __setstate__ which accepts a dictionary and adjusts
it accordingly.  Beware though that version must be an instance
variable rather than a class variable, since class variables are not
seen by pickle.  If the class is renamed, then pickle can do nothing
to recover it.

Instead of pickle, we break the problem into parts: structure and
encoding.  A pair of functions, `deconstruct` and `reconstruct`, work
directly with the structure.  Deconstruct extracts the state of the
python object, expressed using a limited set of python primitives.
Reconstruct takes an extracted state and rebuilds the complete python
object.  See documentation on the individual functions for details.
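
As a rough illustration of the structure half of the problem, the
sketch below shows a simplified deconstruct/reconstruct pair.  The
names sketch_deconstruct, sketch_reconstruct and Point are hypothetical.
The sketch assumes classes are looked up in an explicit table rather
than imported by path, and it leaves out functions, factories, slots
and version checks.

```python
def sketch_deconstruct(obj):
    # Primitives pass through, containers recurse, and any other
    # object becomes a dict tagged with '.class', '.version', '.state'.
    if obj is None or type(obj) in (int, float, str, bool):
        return obj
    if type(obj) in (list, tuple):
        return type(obj)(sketch_deconstruct(el) for el in obj)
    if type(obj) is dict:
        return dict((k, sketch_deconstruct(v)) for k, v in obj.items())
    cls = type(obj)
    return {
        '.class': cls.__module__ + '.' + cls.__name__,
        '.version': getattr(obj, '__version__', '0.0'),
        '.state': sketch_deconstruct(vars(obj)),
    }

def sketch_reconstruct(tree, classes):
    # classes maps a stored class name to the class object (assumption:
    # the real module imports the class from its path instead).
    if type(tree) in (list, tuple):
        return type(tree)(sketch_reconstruct(el, classes) for el in tree)
    if type(tree) is dict:
        if '.class' in tree:
            cls = classes[tree['.class']]
            obj = cls.__new__(cls)  # empty instance, __init__ not called
            obj.__dict__ = sketch_reconstruct(tree['.state'], classes)
            return obj
        return dict((k, sketch_reconstruct(v, classes))
                    for k, v in tree.items())
    return tree

class Point(object):
    # Hypothetical class used only for this illustration.
    __version__ = '1.0'
    def __init__(self, x, y):
        self.x, self.y = x, y

tree = sketch_deconstruct({'origin': Point(0, 0.5), 'tags': ['a', 'b']})
name = Point.__module__ + '.' + Point.__name__
restored = sketch_reconstruct(tree, {name: Point})
```

The intermediate tree contains only dicts, lists and scalars, so any
encoder that handles those types can write it out.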

Object persistence for long term storage places particular burdens
on the serialization protocol.  In particular, the class may have
changed since the instance was serialized.  To aid the process of
maintaining classes over the long term, the class definition can
contain the following magic names:

__version__
    Strict version number of the class.  See isnewer() for
    details, or distutils.version.StrictVersion.
__factory__
    Name of a factory function to return an empty instance of
    the class.  This will be stored as the class name, and
    should include the complete path so that it can be
    imported by python.  The class will be populated by the
    normal restore mechanism for that class, using either
    __reconstruct__, __setstate__ or setting the instance dictionary.
__reconstruct__
    Method which takes a structure tree and rebuilds the object.
    This is different from __setstate__ in that __setstate__
    assumes its children have already been reconstructed.  This
    is the difference between top-down and bottom-up
    interpretation.  Bottom-up is usually easier and sufficient,
    but top-down is required for radical restructuring of the
    object representation.
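
The ordering expected of __version__ strings can be sketched without
distutils as follows.  The helpers parse_version and sketch_isnewer are
hypothetical stand-ins written for this illustration.  They assume the
StrictVersion rules: two or three numeric components, an optional
'a' or 'b' pre-release tag, and a strict "newer than" comparison.

```python
def parse_version(text):
    # Split off an optional pre-release tag such as 'a1' or 'b3'.
    for tag in ('a', 'b'):
        if tag in text:
            numbers, _, serial = text.partition(tag)
            prerelease = (tag, int(serial))
            break
    else:
        numbers, prerelease = text, None
    parts = [int(p) for p in numbers.split('.')]
    if len(parts) not in (2, 3):
        raise ValueError('invalid version number %r' % text)
    if len(parts) == 2:
        parts.append(0)  # 8.19 compares equal to 8.19.0
    # A final release sorts after any pre-release of the same number.
    rank = (0,) + prerelease if prerelease else (1,)
    return tuple(parts), rank

def sketch_isnewer(version, target):
    # True only if version is strictly newer than target.
    return parse_version(version) > parse_version(target)

ordered = ['8.2', '8.19a1', '8.19']
checks = [sketch_isnewer(b, a) for a, b in zip(ordered, ordered[1:])]
```

Because the comparison is strict, a version is never newer than
itself, and 8.19 is not newer than 8.19.0.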

We need to coexist with third party libraries which may or may
not use our object serialization technology.  Even if they do,
we may want to replace our dependence on one third party library
with our own or another implementation.  To do so, we keep a
class registry, with the function `refactor` to add entries to
the registry.  Rather than blindly restoring the class, we first
see if the class and version are in the registry, and if so call
the registered function to transform the data.  For example::

    import danse.util.serial as serial
    serial.refactor('Numeric.array','numeric_converter','0.0')

This says that items stored as the class Numeric.array will be
loaded by calling numeric_converter, which returns a

Example
=======

The following example shows how to use reconstruct and factory to get
maximum flexibility when restoring an object.

mylib.__init__.py::

    def data():
        from mylib.core.data import Data
        return Data()

mylib.core.data.py::

    from danse.util.serial import isnewer, reconstruct, setstate
    class Data(object):
        __version__ = '1.2'
        __factory__ = 'mylib.data'
        def __reconstruct__(self,instance):
            '''
            Reconstruct the state from
            '''
            if isnewer('1.0',instance['version']):
                raise RuntimeError('pre-1.0 data objects no longer supported')
            if isnewer('1.1',instance['version']):
                # Version 1.1 added uncertainty; default it to zero
                instance['state']['uncertainty'] = 0
            setstate(self,reconstruct(instance['state']))


TODO: reconstruct needs to raise a specialized error, a subclass
of RuntimeError, that indicates the name of the package that is
out of date so that the application can trigger its package manager.
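
One possible shape for such an error is sketched below.  The class
PackageOutOfDateError and the helper package_of are hypothetical and
do not exist in this module.  The sketch assumes the package name is
the first component of the stored class path.

```python
class PackageOutOfDateError(RuntimeError):
    # Hypothetical error carrying enough detail for an application
    # to ask its package manager for an upgrade.
    def __init__(self, package, stored_version, installed_version):
        RuntimeError.__init__(
            self, 'package %s is out of date: data needs %s, found %s'
            % (package, stored_version, installed_version))
        self.package = package
        self.stored_version = stored_version
        self.installed_version = installed_version

def package_of(class_path):
    # Assumption: the top-level package is the first path component.
    return class_path.split('.')[0]

try:
    raise PackageOutOfDateError(package_of('mylib.core.data.Data'),
                                '1.3', '1.2')
except RuntimeError as exc:
    # Existing handlers for RuntimeError still catch the new error.
    outdated = exc.package
```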

TODO: what if class registry depends on which package is asking
for the redirect?  Do we want to allow a chained registry, where
we can override the application default?  Probably not, but the
full answer depends on the solution to the Deployment Problem.

TODO: cannot handle self-referential data structures
"""

import types
import sys

def deconstruct(obj):
    """
    Convert an object hierarchy into python primitives.

    The primitives used are int, float, str, unicode, bool, None,
    list, tuple, set, and dict.

    Classes are encoded as a dict with keys '.class', '.version', and '.state'.
    Version is copied from the attribute __version__ if it exists.

    Functions are encoded as a dict with key '.function'.

    Raises RuntimeError if object cannot be deconstructed.  For example,
    deconstruct on deconstruct will cause problems since '.class' will
    be in the dictionary of a deconstructed object.
    """
    if type(obj) in [int, float, str, unicode, bool] or obj is None:
        return obj
    elif type(obj) in [list, tuple, set]:
        return type(obj)(deconstruct(el) for el in obj)
    elif type(obj) == dict:
        # Reserved keys mark encoded classes and functions, so a plain
        # dict containing them would be ambiguous on reconstruct.
        for name in ['.class', '.function']:
            if name in obj:
                raise RuntimeError("Cannot deconstruct dict containing "+name)
        return dict((k,deconstruct(v)) for k,v in obj.items())
    elif type(obj) == types.FunctionType:
        return {
            '.function' : obj.__module__+'.'+obj.__name__
            }
    else:
        cls = _getclass(obj)
        version = _getversion(obj)
        return {
            '.class' : cls,
            '.version' : version,
            '.state' : deconstruct(_getstate(obj))
            }

def reconstruct(tree):
    """
    Reconstruct an object hierarchy from a tree of primitives.

    The tree is generated by deconstruct from python primitives
    (list, dict, string, number, boolean, None) with classes
    encoded as a particular kind of dict.

    Unlike pickle, we do not make an exact copy of the original
    object.  In particular, the serialization format may not
    distinguish between list and tuples, or str and unicode.  We
    also have no support for self-referential structures.

    Raises RuntimeError if the tree could not be reconstructed.
    """
    if type(tree) in [int, float, str, unicode, bool] or tree is None:
        return tree
    elif type(tree) in [list, tuple, set]:
        return type(tree)(reconstruct(el) for el in tree)
    elif type(tree) == dict:
        if '.class' in tree:
            # A registered refactoring takes over the whole subtree
            fn = _lookup_refactor(tree['.class'],tree['.version'])
            if fn is not None: return fn(tree)

            # Otherwise create an empty instance and check that the
            # stored data is not newer than the installed class
            obj = _createobj(tree['.class'])
            if isnewer(tree['.version'],_getversion(obj)):
                raise RuntimeError('Version of %s is out of date'%tree['.class'])

            if hasattr(obj, '__reconstruct__'):
                obj.__reconstruct__(tree['.state'])
            else:
                _setstate(obj,reconstruct(tree['.state']))
            return obj
        elif '.function' in tree:
            return _import_symbol(tree['.function'])
        else:
            return dict((k,reconstruct(v)) for k,v in tree.items())
    else:
        raise RuntimeError('Could not reconstruct '+type(tree).__name__)
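
def _getversion(obj):
    version = getattr(obj,'__version__','0.0')
    try:
        # Comparing against 0.0 forces the version string to be parsed,
        # so a malformed __version__ is reported with its class name
        isnewer(version,'0.0')
    except ValueError as msg:
        raise ValueError("%s for class %s"%(msg,obj.__class__.__name__))
    return version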

def _getclass(obj):
    if hasattr(obj,'__factory__'): return obj.__factory__
    return obj.__class__.__module__+'.'+obj.__class__.__name__

def _getstate(obj):
    if hasattr(obj,'__getinitargs__') or hasattr(obj,'__getnewargs__'):
        # Objects are restored from an empty instance, so there is
        # no place to pass initialization arguments
        raise RuntimeError('Cannot serialize a class with initialization arguments')
    elif hasattr(obj,'__getstate__'):
        state = obj.__getstate__()
    elif hasattr(obj,'__slots__'):
        state = dict((s,getattr(obj,s)) for s in obj.__slots__ if hasattr(obj,s))
    elif hasattr(obj,'__dict__'):
        state = obj.__dict__
    else:
        state = {}
    return state

def _setstate(obj,kw):
    if hasattr(obj,'__setstate__'):
        obj.__setstate__(kw)
    elif hasattr(obj,'__slots__'):
        for k,v in kw.items(): setattr(obj,k,v)
    elif hasattr(obj,'__dict__'):
        obj.__dict__ = kw
    else:
        pass
    return obj

class _EmptyClass: pass

def _import_symbol(path):
    """
    Recover symbol from path.
    """
    parts = path.split('.')
    module_name = ".".join(parts[:-1])
    symbol_name = parts[-1]
    __import__(module_name)
    module = sys.modules[module_name]
    symbol = getattr(module,symbol_name)
    return symbol

def _createobj(path):
    """
    Create an empty object which we can update with __setstate__
    """
    factory = _import_symbol(path)
    if type(factory) is types.FunctionType:
        # Factory function: call it to get an empty instance
        obj = factory()
    elif type(factory) is types.ClassType:
        # Old-style class: swap the class of an empty instance
        obj = _EmptyClass()
        obj.__class__ = factory
    elif type(factory) is types.TypeType:
        # New-style class: allocate without calling __init__
        obj = factory.__new__(factory)
    else:
        raise RuntimeError('%s should be a function, class or type'%path)
    return obj

def isnewer(version, target):
    """
    Version comparison function.  Returns True if version is strictly
    newer than the target version.

    A version number consists of two or three dot-separated numeric
    components, with an optional "pre-release" tag on the end.  The
    pre-release tag consists of the letter 'a' or 'b' followed by
    a number.  If the numeric components of two version numbers
    are equal, then one with a pre-release tag will always
    be deemed earlier (lesser) than one without.

    The following will be true for version numbers::

        8.2 < 8.19a1 < 8.19 == 8.19.0

    You should follow the rule of incrementing the minor version number
    if you add attributes to your models, and the major version number
    if you remove attributes.  Then assuming you are working with
    e.g., version 2.2, your model loading code will look like::

        if isnewer(xml.version, Model.__version__):
            raise IOError('software is older than model')
        elif not isnewer('2.0', xml.version):
            instantiate current model from xml
        elif not isnewer('1.0', xml.version):
            instantiate old model from xml
            copy old model format to new model format
        else:
            raise IOError('pre-1.0 models not supported')

    Based on distutils.version.StrictVersion
    """
    from distutils.version import StrictVersion as Version
    return Version(version) > Version(target)

class _RefactoringRegistry(object):
    """
    Directory of renamed classes.
    """
    registry = {}

    @classmethod
    def register(cls,oldname,newname,asof_version):
        """
        As of the target version, references to the old name are no
        longer valid (e.g., when reconstructing stored objects), and
        should be resolved by the new name (or None if they should
        just raise an error.)  The old name can then be reused for
        new objects or abandoned.
        """
        # Entries for each old name are kept ordered from the newest
        # asof_version to the oldest.
        entries = cls.registry.setdefault(oldname, [])
        for idx,(version,name) in enumerate(entries):
            if isnewer(asof_version, version):
                entries.insert(idx,(asof_version, newname))
                break
        else:
            entries.append((asof_version, newname))

    @classmethod
    def redirect(cls, oldname, version):
        # Return the new name registered for oldname if the stored
        # version predates the rename, or None if no rename applies.
        if oldname not in cls.registry: return None
        for target_version,newname in cls.registry[oldname]:
            if isnewer(target_version, version):
                return newname
        return None

def refactor(oldname,newname,asof_version):
    """
    Register the renaming of a class.

    As code is developed and maintained over time, it is sometimes
    beneficial to restructure the source to support new features.
    However, the structure and location of particular objects is
    encoded in the saved file format.

    When you move a class that may be stored in a model,
    be sure to put an entry into the registry saying where
    the model was moved, or None if the model is no longer
    supported.

    The reconstructor is a function which builds a python object from
    a particular class/version, presumably older than the current
    version.  This is necessary, e.g., to set default values for new
    fields or to modify components of the model which are now
    represented differently.

    The reconstructor function takes the structure above as
    its argument and returns a python instance.  You are free
    to restructure the state and version fields as needed to
    bring the object in line with the next version, then call
    setstate(tree) to build the return object.  Indeed this
    technique will chain, and you can morph an ancient version
    of your models into the latest version.
    """
    _RefactoringRegistry.register(oldname, newname, asof_version)
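
class _Slotted(object): __slots__ = ['a','b']
class _Controlled:
    def __getstate__(self):
        return ["mystate",self.__dict__]
    def __setstate__(self,state):
        if state[0] != "mystate": raise RuntimeError("didn't get back my state")
        self.__dict__ = state[1]
class _Factory(object):
    __factory__ = __name__+"._factory"
def _factory():
    obj = _Factory()
    # Mark the class so the test can tell the factory was used
    _Factory.fromfactory = True
    return obj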

def test():
    primitives = ['list',1,{'of':'dict',2:'really'},True,None]
    assert decode(encode(primitives)) == primitives

    h = _Simple()
    h.a = 2
    assert decode(encode(h)).a == h.a

    assert decode(encode(_hello))() == 'hello'

    h = _SimpleNew()
    h.a = 2
    assert decode(encode(h)).a == h.a

    h = _Slotted()
    h.a = 2
    assert decode(encode(h)).a == h.a

    h = _Controlled()
    h.a = 2
    assert decode(encode(h)).a == h.a

    h = _Factory()
    h.a = 2
    assert decode(encode(h)).a == h.a
    assert hasattr(h,'fromfactory')

    try:
        encode(_VersionError())
        raise RuntimeError("should have raised a version error")
    except ValueError as msg:
        assert "_VersionError" in str(msg)

if __name__ == "__main__":
    test()