hdf5pickle

Author: Pauli Virtanen <pav@iki.fi>

Create easily interoperable representations of Python objects in HDF5 files. The aim of this module is to provide both

  1. convenient Python object persistence
  2. compatibility with non-Python applications

Point 2 is useful, for example, if results from numerical calculations should be easily transferable to a non-Python visualization program, such as Octave. Having a serialized object format that is directly readable saves the hassle of writing custom data dumping routines for each object.

Of course, if your data does not fit into memory, you still need the full features of PyTables. However, you can use hdf5pickle for the other parts of the data.
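
For instance, along the lines of the following sketch (the file name, node names and metadata dictionary are made up for illustration), bulk arrays can be written with plain PyTables calls while the small metadata around them is dumped with hdf5pickle into the same file:

import tables, hdf5pickle
import Numeric as N

f = tables.openFile('mixed.h5', 'w')

# Bulk data that benefits from PyTables' own machinery; it could just
# as well be an EArray that is filled chunk by chunk.
f.createArray(f.root, 'samples', N.arange(100000))

# The small metadata around it is dumped with hdf5pickle.
hdf5pickle.dump({'dt': 0.01, 'label': 'run 1'}, f, '/meta')

f.close()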

This module implements dump and load functions analogous to those in Python’s pickle module. The programming interface corresponds to pickle protocol 2, although the data is not serialized to a byte stream but stored in HDF5 files. Additional functions, dump_many and load_many, are provided for dumping and loading multiple objects at once, so that references shared between them are preserved.
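
A minimal sketch of what this enables (the exact call signatures of dump_many and load_many are assumed here; the module’s docstrings are the authority):

import tables, hdf5pickle

shared = [1, 2, 3]
a = {'name': 'first',  'payload': shared}
b = {'name': 'second', 'payload': shared}

f = tables.openFile('refs.h5', 'w')
# Assumed interface: a sequence of objects and a matching sequence of
# target paths.
hdf5pickle.dump_many([a, b], f, ['/a', '/b'])
f.close()

f = tables.openFile('refs.h5', 'r')
a2, b2 = hdf5pickle.load_many(f, ['/a', '/b'])
f.close()

# Because both objects were saved in a single call, the shared list is
# restored as one object referenced from both dictionaries.
assert a2['payload'] is b2['payload']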

Warning: Although this module passes all relevant pickle unit tests from Python 2.4, plus additional tests of its own, it is still in an early stage of development.

Example

In Python:

>>> import hdf5pickle, tables, datetime, Numeric as N
>>> f = tables.openFile('test.h5', 'w')
>>> data = {'a': 50, 'b': N.array([1,2,3,4,5]), 'c': datetime.datetime.now()}
>>> hdf5pickle.dump(data, f, '/data')
>>> f.close()

Meanwhile, in the shell:

$ h5ls -rd test.h5
/data                    Group
/data/a                  Dataset {SCALAR}
    Data:
        (0) 50
/data/b                  Dataset {5}
    Data:
        (0) 1, 2, 3, 4, 5
/data/c                  Group
/data/c/__               Group
/data/c/__/args          Group
/data/c/__/args/_0       Dataset {10}
    Data:
        (0) 7, 214, 8, 6, 20, 53, 55, 8, 190, 177
/data/c/__/func          Dataset {17}
    Data:
        (0) 100, 97, 116, 101, 116, 105, 109, 101, 10, 100, 97, 116, 101, 116,
        (14) 105, 109, 101

The dictionary, the integer and the Numeric array were saved in quite a natural layout in the file. The datetime object got a “magical” “__” subgroup where things look a bit more complicated. Nevertheless, all data is available in a transparent format and not as an opaque Python pickle byte stream.
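
To underline the point, the plain datasets can be read back with PyTables alone, without importing hdf5pickle at all (a small sketch):

import tables

f = tables.openFile('test.h5', 'r')
# The integer and the array are ordinary HDF5 datasets.
print f.getNode('/data/a').read()   # 50
print f.getNode('/data/b').read()   # [1 2 3 4 5]
f.close()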

Then back in Python:

>>> f = tables.openFile('test.h5', 'r')
>>> data2 = hdf5pickle.load(f, '/data')
>>> data == data2
True
>>> f.close()

It seems to work. In fact, the source package contains quite a few more tests, which can be run with ./setup.py test.