Class hdf_file (o2scl_hdf)

class hdf_file

Store data in an O2scl-compatible HDF5 file.

See also the File I/O with HDF5 section of the O2scl User’s Guide.

The member functions which write or get data from an HDF file begin with either get or set. Where appropriate, the next character is either c for character, d for double, f for float, or i for int.

By default, vectors and matrices are written to HDF files in a chunked format, so their length can be changed later as necessary. The chunk size is chosen in def_chunk() to be the closest power of 10 to the current vector size.

All files not closed by the user are closed in the destructor, but the destructor does not automatically close groups.

Idea for Future:

This class opens all files in R/W mode, which may cause I/O problems in file systems. This needs to be fixed by allowing the user to open a read-only file. (AWS: 3/16/18 I think this is fixed now.)

The HDF functions do not consistently choose between throwing O2scl exceptions and throwing HDF5 exceptions. Check and/or fix this.

Automatically close groups, e.g. by storing hid_t’s in a stack?

Rewrite the _arr_alloc() functions so that they return a shared_ptr?

Move the code from the ‘filelist’ acol command here into hdf_file.

Note

Currently, HDF I/O functions write data to HDF files assuming that int and float have 4 bytes, while size_t and double are 8 bytes. All output is done in little endian format. While get functions can read data with different sizes or in big endian format, the set functions cannot currently write data this way.
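
The layout assumptions in the note above can be checked on a given platform before relying on files written by the set functions. The following is a minimal sketch using only the C++ standard library; the helper names (sizes_match_hdf_assumptions, host_is_little_endian, swap_bytes32) are hypothetical and are not part of hdf_file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// True when the host types have the sizes assumed in the note above:
// 4 bytes for int and float, 8 bytes for size_t and double.
constexpr bool sizes_match_hdf_assumptions() {
  return sizeof(int) == 4 && sizeof(float) == 4 &&
         sizeof(std::size_t) == 8 && sizeof(double) == 8;
}

// True when the least significant byte of an integer is stored first.
inline bool host_is_little_endian() {
  const std::uint32_t probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Reverse the byte order of a 32-bit value (big endian <-> little endian).
constexpr std::uint32_t swap_bytes32(std::uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
         ((x << 8) & 0x00FF0000u) | (x << 24);
}
```

A program could call sizes_match_hdf_assumptions() in a static_assert to fail at compile time on platforms where the sizes differ.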

Note

It does make sense to write a zero-length vector to an HDF file if the vector does not have a fixed size, in order to create a placeholder for future output. Thus the set_vec() functions allow zero-length vectors, and the set_arr() functions allow the size_t parameter to be zero, in which case the pointer parameter is ignored. The set_vec_fixed() and set_arr_fixed() functions do not allow this, and will throw an exception if sent a zero-length vector.

Warning

This class is still in development. Because of this, HDF5 files generated by this class may not be easily read by future versions. Later versions of O2scl may have stronger guarantees on backwards compatibility.

Mode values for iterate_parms

static const int ip_filelist = 1
static const int ip_name_from_type = 2
static const int ip_type_from_name = 3
static const int ip_type_from_pattern = 4
static const int ip_name_list_from_type = 5
static void type_process(iterate_parms &ip, int mode, size_t ndims, hsize_t dims[100], hsize_t max_dims[100], std::string base_type, std::string name)

Process a type for iterate_func()

static herr_t iterate_func(hid_t loc, const char *name, const H5L_info_t *inf, void *op_data)

HDF object iteration function.

static herr_t iterate_copy_func(hid_t loc, const char *name, const H5L_info_t *inf, void *op_data)

HDF5 object iteration function when copying.

hdf_file(const hdf_file&)
hdf_file &operator=(const hdf_file&)

Open and close files

int open(std::string fname, bool write_access = false, bool err_on_fail = true)

Open a file named fname.

If err_on_fail is true, this calls the error handler if opening the file fails (e.g. because the file does not exist). If err_on_fail is false and opening the file fails, nothing is done and the function returns the value o2scl::exc_efilenotfound. If the open succeeds, this function returns o2scl::success.

void open_or_create(std::string fname)

Open a file named fname, or create it if it does not already exist.

void close()

Close the file.

Manipulate ids

hid_t get_file_id()

Get the current file id.

void set_current_id(hid_t cur)

Set the current working id.

hid_t get_current_id()

Retrieve the current working id.

Simple get functions

If the specified object is not found, the error handler will be called.

int getc(std::string name, char &c)

Get a character named name.

int getd(std::string name, double &d)

Get a double named name.

int getf(std::string name, float &f)

Get a float named name.

int geti(std::string name, int &i)

Get an integer named name.

int get_szt(std::string name, size_t &u)

Get an unsigned integer named name.

int gets(std::string name, std::string &s)

Get a string named name.

Note

Strings are stored as character arrays and thus retrieving a string from a file requires loading the information from the file into a character array, and then copying it to the string. This will be slow for very long strings.

int gets_var(std::string name, std::string &s)

Get a variable length string named name.

int gets_fixed(std::string name, std::string &s)

Get a fixed-length string named name.

int gets_def_fixed(std::string name, std::string def, std::string &s)

Get a fixed-length string named name with default value def.

Simple set functions

void setc(std::string name, char c)

Set a character named name to value c.

void setd(std::string name, double d)

Set a double named name to value d.

void setf(std::string name, float f)

Set a float named name to value f.

void seti(std::string name, int i)

Set an integer named name to value i.

void set_szt(std::string name, size_t u)

Set an unsigned integer named name to value u.

void sets(std::string name, std::string s)

Set a string named name to value s.

The string is stored in the HDF file as an extensible character array rather than a string.

void sets_fixed(std::string name, std::string s)

Set a fixed-length string named name to value s.

This function stores s as a fixed-length string in the HDF file. If a dataset named name is already present, then s must not be longer than the string length already specified in the HDF file.

Generic floating point I/O

template<class fp_t>
inline int setfp_copy(std::string name, fp_t &f)

Set a generic floating point named name to value f.

template<class vec_fp_t>
inline int setfp_vec_copy(std::string name, vec_fp_t &f)

Set a generic floating point vector named name to value f.

template<class fp_t>
inline int getfp_copy(std::string name, fp_t &f)

Get a generic floating point named name.

Warning

No checks are made to ensure that the stored precision matches the precision of the floating point which is used.

inline int getfp_copy(std::string name, long double &f)

Get a long double named name.

Warning

No checks are made to ensure that the stored precision matches the precision of the floating point which is used. Note that the precision of the long double type is also not platform-independent.

template<size_t N>
inline int getfp_copy(std::string name, boost::multiprecision::number<boost::multiprecision::cpp_dec_float<N>> &f)

Get a boost multiprecision floating point named name (specialization for Boost multiprecision numbers)

Warning

No checks are made to ensure that the stored precision matches the precision of the floating point which is used.

template<class vec_fp_t>
inline int getfp_vec_copy(std::string name, vec_fp_t &f)

Get a generic floating point vector named name.

Warning

No checks are made to ensure that the stored precision matches the precision of the floating point which is used.

template<size_t N>
inline int getfp_vec_copy(std::string name, std::vector<boost::multiprecision::number<boost::multiprecision::cpp_dec_float<N>>> &f)

Get a generic floating point vector named name (specialization for Boost multiprecision numbers)

Warning

No checks are made to ensure that the stored precision matches the precision of the floating point which is used.

Group manipulation

hid_t open_group(hid_t init_id, std::string path)

Open a group relative to the location specified in init_id.

Note

In order to ensure that future objects are written to the newly-created group, the user must use set_current_id() using the newly-created group ID for the argument.

hid_t open_group(std::string path)

Open a group relative to the current location.

Note

In order to ensure that future objects are written to the newly-created group, the user must use set_current_id() using the newly-created group ID for the argument.

inline int close_group(hid_t group)

Close a previously created group.

Vector get functions

These functions automatically free any previously allocated memory in v and then allocate the proper space required to read the information from the HDF file.

int getd_vec(std::string name, std::vector<double> &v)

Get vector dataset and place data in v.

template<class vec_t>
inline int getd_vec_copy(std::string name, vec_t &v)

Get vector dataset and place data in v.

This works with any vector class which has a resize() method.

Idea for Future:

This currently requires a copy, but there may be a way to write a new version which does not.

int geti_vec(std::string name, std::vector<int> &v)

Get vector dataset and place data in v.

template<class vec_int_t>
inline int geti_vec_copy(std::string name, vec_int_t &v)

Get vector dataset and place data in v.

Idea for Future:

This currently requires a copy, but there may be a way to write a new version which does not.

int get_szt_vec(std::string name, std::vector<size_t> &v)

Get vector dataset and place data in v.

template<class vec_size_t>
inline int get_szt_vec_copy(std::string name, vec_size_t &v)

Get vector dataset and place data in v.

Idea for Future:

This currently requires a copy, but there may be a way to write a new version which does not.

int gets_vec_copy(std::string name, std::vector<std::string> &s)

Get a vector of strings named name and store it in s.

int gets_vec_vec_copy(std::string name, std::vector<std::vector<std::string>> &s)

Get a vector of a vector of strings named name and store it in s.

int getd_vec_vec_copy(std::string name, std::vector<std::vector<double>> &s)

Get a vector of vectors of doubles named name and store it in s.

Vector set functions

These functions automatically write all of the vector elements to the HDF file, if necessary extending the data that is already present.

int setd_vec(std::string name, const std::vector<double> &v)

Set vector dataset named name with v.

template<class vec_t>
inline int setd_vec_copy(std::string name, const vec_t &v)

Set vector dataset named name with v.

This requires a copy before the vector is written to the file.

int seti_vec(std::string name, const std::vector<int> &v)

Set vector dataset named name with v.

template<class vec_int_t>
inline int seti_vec_copy(std::string name, vec_int_t &v)

Set vector dataset named name with v.

This requires a copy before the vector is written to the file.

int set_szt_vec(std::string name, const std::vector<size_t> &v)

Set vector dataset named name with v.

template<class vec_size_t>
inline int set_szt_vec_copy(std::string name, const vec_size_t &v)

Set vector dataset named name with v.

This requires a copy before the vector is written to the file.

int sets_vec_copy(std::string name, const std::vector<std::string> &s)

Set a vector of strings named name.

Developer note: String vectors are reformatted as a single character array, in order to allow each string to have different length and to make each string extensible. The size of the vector s is stored as an integer named nw.

Warning

This function copies the data in the vector of strings to a new string before writing the data to the HDF5 file and thus may be less useful for larger vectors or vectors which contain longer strings.
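
To illustrate the packing described in the developer note above, here is a minimal sketch in standard C++ of flattening a vector of strings into one character array and recovering it. The packed_strings struct and the per-string lengths array are assumptions made for this illustration; this page only states that the characters are stored as a single array and that the vector size is stored as an integer named nw.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the packed form; not an O2scl type.
struct packed_strings {
  int nw = 0;                // number of strings (the "nw" integer)
  std::vector<int> lengths;  // assumed: one length per string
  std::vector<char> chars;   // all characters, concatenated
};

// Copy every string into one contiguous character array.
packed_strings pack(const std::vector<std::string> &s) {
  packed_strings p;
  p.nw = static_cast<int>(s.size());
  for (const std::string &w : s) {
    p.lengths.push_back(static_cast<int>(w.size()));
    p.chars.insert(p.chars.end(), w.begin(), w.end());
  }
  return p;
}

// Rebuild the original strings by slicing the character array.
std::vector<std::string> unpack(const packed_strings &p) {
  std::vector<std::string> s;
  std::size_t pos = 0;
  for (int len : p.lengths) {
    s.emplace_back(p.chars.begin() + pos, p.chars.begin() + pos + len);
    pos += static_cast<std::size_t>(len);
  }
  return s;
}
```

The extra copy made in pack() is the cost the warning above refers to: the whole vector is duplicated in memory before anything is written.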

int sets_vec_vec_copy(std::string name, const std::vector<std::vector<std::string>> &s)

Set a vector of vectors of strings named name.

Developer note: String vectors are reformatted as a single character array, in order to allow each string to have different length and to make each string extensible. The size of the vector s is stored as an integer named nw.

(experimental)

Warning

This function copies the data in the vector of strings to a new string before writing the data to the HDF5 file and thus may be less useful for larger vectors or vectors which contain longer strings.

int setd_vec_vec_copy(std::string name, const std::vector<std::vector<double>> &vvd)

Set a vector of vectors named name.

Matrix get functions

These functions automatically free any previously allocated memory in m and then allocate the proper space required to read the information from the HDF file.

int getd_mat_copy(std::string name, ubmatrix &m)

Get matrix dataset and place data in m.

int geti_mat_copy(std::string name, ubmatrix_int &m)

Get matrix dataset and place data in m.

Matrix set functions

These functions automatically write all of the matrix elements to the HDF file, if necessary extending the data that is already present.

int setd_mat_copy(std::string name, const ubmatrix &m)

Set matrix dataset named name with m.

int seti_mat_copy(std::string name, const ubmatrix_int &m)

Set matrix dataset named name with m.

template<class arr2d_t>
inline int setd_arr2d_copy(std::string name, size_t r, size_t c, const arr2d_t &a2d)

Set a two-dimensional array dataset named name with a2d.

template<class arr2d_t>
inline int seti_arr2d_copy(std::string name, size_t r, size_t c, const arr2d_t &a2d)

Set a two-dimensional array dataset named name with a2d.

template<class arr2d_t>
inline int set_szt_arr2d_copy(std::string name, size_t r, size_t c, const arr2d_t &a2d)

Set a two-dimensional array dataset named name with a2d.

Tensor I/O functions

int getd_ten(std::string name, o2scl::tensor<double, std::vector<double>, std::vector<size_t>> &t)

Get a tensor of double-precision numbers from an HDF file.

This version does not require a full copy of the tensor.

int geti_ten(std::string name, o2scl::tensor<int, std::vector<int>, std::vector<size_t>> &t)

Get a tensor of integers from an HDF file.

This version does not require a full copy of the tensor.

int get_szt_ten(std::string name, o2scl::tensor<size_t, std::vector<size_t>, std::vector<size_t>> &t)

Get a tensor of size_t from an HDF file.

This version does not require a full copy of the tensor.

template<class vec_t, class vec_size_t>
inline int getd_ten_copy(std::string name, o2scl::tensor<double, vec_t, vec_size_t> &t)

Get a tensor of double-precision numbers from an HDF file.

This version requires a full copy of the tensor from the HDF5 file into the o2scl::tensor object.

template<class vec_t, class vec_size_t>
inline int geti_ten_copy(std::string name, o2scl::tensor<int, vec_t, vec_size_t> &t)

Get a tensor of integers from an HDF file.

This version requires a full copy of the tensor from the HDF5 file into the o2scl::tensor object.

int setd_ten(std::string name, const o2scl::tensor<double, std::vector<double>, std::vector<size_t>> &t)

Write a tensor of double-precision numbers to an HDF file.

You may overwrite a tensor already present in the HDF file only if it has the same rank. This version does not require a full copy of the tensor.

int seti_ten(std::string name, const o2scl::tensor<int, std::vector<int>, std::vector<size_t>> &t)

Write a tensor of integers to an HDF file.

You may overwrite a tensor already present in the HDF file only if it has the same rank. This version does not require a full copy of the tensor.

int set_szt_ten(std::string name, const o2scl::tensor<size_t, std::vector<size_t>, std::vector<size_t>> &t)

Write a tensor of size_t to an HDF file.

You may overwrite a tensor already present in the HDF file only if it has the same rank. This version does not require a full copy of the tensor.

template<class vec_t, class vec_size_t>
inline int setd_ten_copy(std::string name, const o2scl::tensor<double, std::vector<double>, std::vector<size_t>> &t)

Write a tensor of double-precision numbers to an HDF file.

You may overwrite a tensor already present in the HDF file only if it has the same rank. This version requires a full copy of the tensor from the o2scl::tensor object into the HDF5 file.

template<class vec_t, class vec_size_t>
inline int seti_ten_copy(std::string name, const o2scl::tensor<int, std::vector<int>, std::vector<size_t>> &t)

Write a tensor of integers to an HDF file.

You may overwrite a tensor already present in the HDF file only if it has the same rank. This version requires a full copy of the tensor from the o2scl::tensor object into the HDF5 file.

Array get functions

All of these functions assume that the pointer is allocated beforehand and matches the size of the array in the HDF file. If the specified object is not found, the error handler will be called.

int getc_arr(std::string name, size_t n, char *c)

Get a character array named name of size n.

Note

The pointer c must be allocated beforehand to hold n entries, and n must match the size of the array in the HDF file.

int getd_arr(std::string name, size_t n, double *d)

Get a double array named name of size n.

Note

The pointer d must be allocated beforehand to hold n entries, and n must match the size of the array in the HDF file.

int getd_arr_compr(std::string name, size_t n, double *d, int &compr)

Get a double array named name of size n and put the compression type in compr.

Note

The pointer d must be allocated beforehand to hold n entries, and n must match the size of the array in the HDF file.

int getf_arr(std::string name, size_t n, float *f)

Get a float array named name of size n.

Note

The pointer f must be allocated beforehand to hold n entries, and n must match the size of the array in the HDF file.

int geti_arr(std::string name, size_t n, int *i)

Get an integer array named name of size n.

Note

The pointer i must be allocated beforehand to hold n entries, and n must match the size of the array in the HDF file.

Array get functions with memory allocation

These functions allocate memory with new, which should be freed by the user with delete.

int getc_arr_alloc(std::string name, size_t &n, char *c)

Get a character array named name of size n.

int getd_arr_alloc(std::string name, size_t &n, double *d)

Get a double array named name of size n.

int getf_arr_alloc(std::string name, size_t &n, float *f)

Get a float array named name of size n.

int geti_arr_alloc(std::string name, size_t &n, int *i)

Get an integer array named name of size n.

Array set functions

int setc_arr(std::string name, size_t n, const char *c)

Set a character array named name of size n to value c.

int setd_arr(std::string name, size_t n, const double *d)

Set a double array named name of size n to value d.

int setf_arr(std::string name, size_t n, const float *f)

Set a float array named name of size n to value f.

int seti_arr(std::string name, size_t n, const int *i)

Set an integer array named name of size n to value i.

int set_szt_arr(std::string name, size_t n, const size_t *u)

Set a size_t array named name of size n to value u.

Fixed-length array set functions

If a dataset named name is already present, then the user-specified array must not be longer than the array already present in the HDF file.

int setc_arr_fixed(std::string name, size_t n, const char *c)

Set a character array named name of size n to value c.

int setd_arr_fixed(std::string name, size_t n, const double *c)

Set a double array named name of size n to value c.

int setf_arr_fixed(std::string name, size_t n, const float *f)

Set a float array named name of size n to value f.

int seti_arr_fixed(std::string name, size_t n, const int *i)

Set an integer array named name of size n to value i.

Get functions with default values

If the requested dataset is not found in the HDF file, the object is set to the specified default value and the error handler is not called.
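
The fallback behavior described above can be sketched without an HDF file. In the following illustration a std::map stands in for the file; fake_file and getd_def_sketch are hypothetical names, and returning 0 in both cases is an assumption, since this page does not document the return value.

```cpp
#include <map>
#include <string>

// Stand-in for an HDF file holding named doubles; not an O2scl type.
using fake_file = std::map<std::string, double>;

// Mirrors the argument order of getd_def(name, def, d): if name is
// missing, d receives def and no error is raised.
int getd_def_sketch(const fake_file &f, const std::string &name,
                    double def, double &d) {
  auto it = f.find(name);
  d = (it == f.end()) ? def : it->second;
  return 0;  // assumed: success is reported in both cases
}
```

This contrasts with the simple get functions, which call the error handler when the object is not found.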

int getc_def(std::string name, char def, char &c)

Get a character named name.

int getd_def(std::string name, double def, double &d)

Get a double named name.

int getf_def(std::string name, float def, float &f)

Get a float named name.

int geti_def(std::string name, int def, int &i)

Get an integer named name.

int get_szt_def(std::string name, size_t def, size_t &i)

Get a size_t named name.

int gets_def(std::string name, std::string def, std::string &s)

Get a string named name.

int gets_var_def(std::string name, std::string def, std::string &s)

Get a variable length string named name.

Get functions with pre-allocated pointer

int getd_vec_prealloc(std::string name, size_t n, double *d)

Get a double array d pre-allocated to have size n.

int geti_vec_prealloc(std::string name, size_t n, int *i)

Get an integer array i pre-allocated to have size n.

int getd_mat_prealloc(std::string name, size_t n, size_t m, double *d)

Get a double matrix d pre-allocated to have size (n,m)

int geti_mat_prealloc(std::string name, size_t n, size_t m, int *i)

Get an integer matrix i pre-allocated to have size (n,m)
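
The pre-allocated matrix functions take a single pointer for an (n,m) matrix, so the caller has to map index pairs to offsets. The sketch below assumes row-major storage, which is the convention of the HDF5 C API but is not stated on this page; flat_index and make_buffer are hypothetical helper names.

```cpp
#include <cstddef>
#include <vector>

// Offset of element (i,j) in an n-by-m matrix stored row by row
// (assumed layout).
inline std::size_t flat_index(std::size_t i, std::size_t j, std::size_t m) {
  return i * m + j;
}

// Allocate the n*m doubles that a prealloc-style call would fill.
inline std::vector<double> make_buffer(std::size_t n, std::size_t m) {
  return std::vector<double>(n * m, 0.0);
}
```

With such a buffer, its data() pointer would be passed as the d argument of getd_mat_prealloc(), and element (i,j) read back as buffer[flat_index(i, j, m)].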

Find a group

int find_object_by_type(std::string type, std::string &name, bool use_regex = false, int verbose = 0)

Look in the file for an object of type type and, if found, set name to the associated object name.

This function returns 0 if an object of type type is found and o2scl::exc_enoprog if it fails.

int list_objects_by_type(std::string type, std::vector<std::string> &vs, bool use_regex = false, int verbose = 0)

Find all objects in the file of type type and store the names in vs.

This function returns 0 if an object of type type is found and o2scl::exc_enoprog if it fails.

int find_object_by_name(std::string name, std::string &type, bool use_regex = false, int verbose = 0)

Look in the file for an object with name name and, if found, set type to the associated type.

This function returns 0 if an object with name name is found and o2scl::exc_enoprog if it fails.

int find_object_by_pattern(std::string name, std::string &type, bool use_regex = false, int verbose = 0)

Look in the file for an object whose name matches a regular expression.

If an object is found, type is set to the associated type. This function returns 0 if an object with name name is found and o2scl::exc_enoprog if it fails.
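
As an illustration of name matching by regular expression, the sketch below scans a list of object names with std::regex. The function first_match is hypothetical, and the use of std::regex_search (a match anywhere in the name) is an assumption; this page does not say which regex library or matching mode hdf_file uses.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Return the index of the first name matching the pattern, or -1 if
// none does. Uses std::regex_search, so unanchored patterns may match
// anywhere in the name (assumed semantics).
int first_match(const std::vector<std::string> &names,
                const std::string &pattern) {
  const std::regex re(pattern);
  for (std::size_t k = 0; k < names.size(); ++k) {
    if (std::regex_search(names[k], re)) return static_cast<int>(k);
  }
  return -1;
}
```

Anchoring the pattern with ^ and $ gives whole-name matching regardless of which mode is used.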

void file_list(bool use_regex = false, int verbose = 0)

List datasets and objects in the top-level of the file.

void copy(int verbose, hdf_file &hf2)

Create a copy of the current HDF5 file and place the copy in hf2.

Public Types

typedef boost::numeric::ublas::vector<double> ubvector
typedef boost::numeric::ublas::matrix<double> ubmatrix
typedef boost::numeric::ublas::vector<int> ubvector_int
typedef boost::numeric::ublas::matrix<int> ubmatrix_int

Public Functions

hdf_file()
virtual ~hdf_file()
inline bool has_write_access()

If true, then the file has read and write access.

Public Members

int compr_type

Compression type (support experimental)

size_t min_compr_size

Minimum size to compress by default.

Protected Functions

inline virtual hsize_t def_chunk(size_t n)

Default chunk size.

Choose the closest power of 10 which is greater than or equal to 10 and less than or equal to \( 10^6 \).
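
One way to read the rule above is sketched below. It assumes that "closest" means rounding the base-10 logarithm of n to the nearest integer, and that a zero-length vector falls back to the minimum of 10; both are assumptions, and chunk_size_sketch is a hypothetical name rather than the library's def_chunk().

```cpp
#include <cmath>
#include <cstddef>

// Nearest power of 10 to n, measured in log10, clamped to [10, 10^6].
std::size_t chunk_size_sketch(std::size_t n) {
  if (n == 0) return 10;  // log10(0) is undefined; use the minimum
  const double e = std::round(std::log10(static_cast<double>(n)));
  std::size_t chunk = 10;
  // Multiply up from 10^1, stopping at the rounded exponent or 10^6.
  for (int k = 1; k < e && k < 6; ++k) chunk *= 10;
  return chunk;
}
```

Under these assumptions a vector of 32 elements gets a chunk size of 100, while any vector larger than a few million elements is capped at 10^6.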

Protected Attributes

hid_t file

File ID.

bool file_open

True if a file has been opened.

hid_t current

Current file or group location.

bool write_access

If true, then the file has read and write access.

struct iterate_copy_parms

Parameters for iterate_copy_func()

Public Members

o2scl_hdf::hdf_file *hf

Pointer to source HDF5 file.

o2scl_hdf::hdf_file *hf2

Pointer to destination HDF5 file.

int verbose

Verbosity parameter.

struct iterate_parms

Parameters for iterate_func()

Public Members

std::string tname

Object name.

o2scl_hdf::hdf_file *hf

Pointer to HDF5 file.

bool found

True if found.

std::string type

Object type.

int verbose

Verbose parameter.

int mode

Iteration mode, one of ip_filelist, ip_name_from_type, ip_type_from_name, ip_type_from_pattern, or ip_name_list_from_type.

bool use_regex

If true, then use regex to match names.

std::vector<std::string> name_list

The list of names, used by list_objects_by_type()