IO (deepdish.io)

See the Saving and loading data chapter for a tutorial.

deepdish.io.save(path, data, compression='default')[source]

Save any Python structure to an HDF5 file. It is particularly suited for Numpy arrays. This function works similarly to numpy.save, except that if you save a Python object at the top level, you do not need to issue data.flat[0] to retrieve it from inside a Numpy array of type object.

Five types of objects get saved natively in HDF5; the rest are serialized automatically. For most needs, you should be able to stick to these five:

  • Dictionaries
  • Lists and tuples
  • Basic data types (including strings and None)
  • Numpy arrays
  • SimpleNamespaces (for Python >= 3.3, but see note below)

We recommend always converting your data so that it uses only these five ingredients. That way your data will always be retrievable by any HDF5 reader. The class deepdish.util.Saveable can help you with this.

Note that on earlier versions of Python, SimpleNamespace objects will be read back as dictionaries.

This function requires the PyTables module to be installed.

You can change the default compression method by creating a ~/.deepdish.conf file, which could look like:

```
[io]
compression: blosc
```

This is the recommended compression method if you plan to use your HDF5 files exclusively through deepdish (or PyTables).

Parameters:
  • path (string) – Filename to which the data is saved.
  • data (anything) – Data to be saved. This can be anything from a Numpy array, a string, an object, or a dictionary containing all of them including more dictionaries.
  • compression (string or tuple) – Set compression method, choosing from blosc, zlib, lzo, bzip2 and more (see PyTables documentation). It can also be specified as a tuple (e.g. ('blosc', 5)), with the latter value specifying the level of compression, choosing from 0 (no compression) to 9 (maximum compression). Set to None to turn off compression. The default is zlib, since it is highly portable; for much greater speed, try for instance blosc.

See also

load()

deepdish.io.load(path, group=None, sel=None, unpack=False)[source]

Loads an HDF5 file saved with save.

This function requires the PyTables module to be installed.

Parameters:
  • path (string) – Filename from which to load the data.
  • group (string or list) – Load a specific group in the HDF5 hierarchy. If group is a list of strings, then a tuple will be returned with all the groups that were specified.
  • sel (slice or tuple of slices) – If you specify group and the target is a numpy array, then you can use this to slice it. This is useful for opening subsets of large HDF5 files. To compose the selection, you can use deepdish.aslice.
  • unpack (bool) – If True, a single-entry dictionary will be unpacked and its value returned directly. That is, if you save dict(a=100), loading will return just 100.
Returns:

data – Hopefully an identical reconstruction of the data that was saved.

Return type:

anything

See also

save()