IO (deepdish.io)

See the Saving and loading data chapter for a tutorial.

deepdish.io.save(path, data, compression='default')

Save any Python structure to an HDF5 file. It is particularly suited for Numpy arrays. This function works similarly to numpy.save, except that if you save a Python object at the top level, you do not need to issue data.flat[0] to retrieve it from inside a Numpy array of type object.
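
For instance, a minimal round trip might look like this (a sketch; the filename is illustrative):

    import numpy as np
    import deepdish as dd

    x = np.arange(10)
    dd.io.save('x.h5', x)   # stored as a native HDF5 array
    y = dd.io.load('x.h5')  # returns the array directly, no data.flat[0] needed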

Some types of objects get saved natively in HDF5. The rest get serialized automatically. For most needs, you should be able to stick to the natively supported types, which are:

  • Dictionaries
  • Short lists and tuples (<256 in length)
  • Basic data types (including strings and None)
  • Numpy arrays
  • Pandas DataFrame, Series, and Panel
  • SimpleNamespaces (for Python >= 3.3, but see note below)

We recommend always converting your data to use only these types. That way your data will be portable and can be opened through any HDF5 reader. A class that helps you with this is deepdish.util.Saveable.

Lists and tuples are supported and can contain heterogeneous types. This plays well with HDF5 for short lists and tuples. If you have a long list (>256 elements), it will be serialized automatically. However, in such cases the elements commonly share a single type, in which case we strongly recommend converting to a Numpy array first, as in the sketch below.
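
A minimal sketch of that conversion (the filename is illustrative):

    import numpy as np
    import deepdish as dd

    values = list(range(1000))                   # >256 elements: would be serialized
    dd.io.save('values.h5', np.asarray(values))  # stored as a native HDF5 array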

Note that SimpleNamespace objects will be read back as dictionaries on earlier versions of Python.
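
A sketch of that behavior (the filename is illustrative):

    from types import SimpleNamespace
    import deepdish as dd

    ns = SimpleNamespace(a=1, b='two')
    dd.io.save('ns.h5', ns)
    dd.io.load('ns.h5')  # a SimpleNamespace on Python >= 3.3; a dict on earlier versions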

This function requires the PyTables module to be installed.

You can change the default compression method to blosc (much faster, but less portable) by creating a ~/.deepdish.conf with:

[io]
    compression: blosc

This is the recommended compression method if you plan to use your HDF5 files exclusively through deepdish (or PyTables).

Parameters:
  • path (string) – Filename to which the data is saved.
  • data (anything) – Data to be saved. This can be almost anything: a Numpy array, a string, an object, or a dictionary containing any of these, including further dictionaries.
  • compression (string or tuple) – Set the compression method, choosing from blosc, zlib, lzo, bzip2 and more (see the PyTables documentation). It can also be specified as a tuple (e.g. ('blosc', 5)), with the second value specifying the compression level, from 0 (no compression) to 9 (maximum compression). Set to None to turn off compression. The default is zlib, since it is highly portable; for much greater speed, try for instance blosc. See the sketch after this list.
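
A minimal sketch of passing compression per call (the filenames are illustrative; the filter names are the PyTables ones listed above):

    import numpy as np
    import deepdish as dd

    data = {'x': np.random.randn(1000)}
    dd.io.save('fast.h5', data, compression=('blosc', 5))  # blosc at level 5
    dd.io.save('plain.h5', data, compression=None)         # compression turned off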

See also

load()

deepdish.io.load(path, group=None, sel=None, unpack=False)

Loads an HDF5 file saved with save.

This function requires the PyTables module to be installed.

Parameters:
  • path (string) – Filename from which to load the data.
  • group (string or list) – Load a specific group in the HDF5 hierarchy. If group is a list of strings, then a tuple will be returned with all the groups that were specified.
  • sel (slice or tuple of slices) – If you specify group and the target is a numpy array, then you can use this to slice it. This is useful for opening subsets of large HDF5 files. To compose the selection, you can use deepdish.aslice.
  • unpack (bool) – If True, a single-entry dictionary will be unpacked and its value returned directly. That is, if you save dict(a=100), loading returns only 100.
Returns:
  • data – Hopefully an identical reconstruction of the data that was saved.
Return type:
  • anything
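
A sketch tying the load parameters together (file and group names are illustrative):

    import numpy as np
    import deepdish as dd

    dd.io.save('data.h5', {'x': np.arange(1000), 'y': np.ones(5)})

    x = dd.io.load('data.h5', group='/x')             # load one group
    x, y = dd.io.load('data.h5', group=['/x', '/y'])  # a tuple, one entry per group
    head = dd.io.load('data.h5', group='/x', sel=dd.aslice[:100])  # partial read

    dd.io.save('single.h5', dict(a=100))
    a = dd.io.load('single.h5', unpack=True)          # returns 100, not {'a': 100}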

See also

save()