Arrays, Vectors, Matrices and Tensors

O2scl

Arrays, vectors, matrices and tensors contents

Vector and matrix introduction

Many useful vector and matrix objects are defined elsewhere, thus O₂scl does not include native vector and matrix classes. Internally, O₂scl uses std::vector, Boost uBLAS vector and matrix objects: boost::numeric::ublas::vector<>, boost::numeric::ublas::matrix<>, and other related class templates. Many O₂scl routines are templates which are compatible with a wide range of vector and matrix types. See the Multi-dimensional solver example which shows how an O₂scl class can be used with Boost, Eigen, or Armadillo objects.

The O₂scl library uses a standard nomenclature to distinguish a couple different concepts. The word “array” is typically used to refer to C-style arrays, i.e. double[]. If there are two dimensions in the array, it is a “two-dimensional array”, i.e. double[][] . The word “vector” is reserved generic objects with array-like semantics.

In general, there are many vector types (STL, Boost, etc.) and they can be characterized by whether or not they satisfy certain “concepts” like DefaultConstructible. O₂scl classes which operate on vector types are designed to be as flexible as possible, so that they can be used with almost any vector type. Eventually, all O₂scl classes with template vector and matrix types should specify exactly which concepts are required to be satisified, but this is still in progress.

The word “matrix” is reserved for the a generic object which has matrix-like semantics and can be accessed using operator(,). C++ matrix types typically prefer operator(,) over operator[][]. This is because operator[][] implies the creation of a temporary row object, and it is thus difficult to implement operator[] without incurring an overhead. Nevertheless, some O₂scl classes have separate versions which operate on matrix types which are only accessible with operator[][] (like two-dimensional arrays). See Linear Algebra for examples of this distinction.

With std::function<> and the new lambda function support in C++11, it is important to notice that std::function<double &(size_t,size_t)> is also a matrix type (the ampersand is important unless the matrix is read-only). This means that some matrices (e.g. slices of tensors) can be trivially constructed from std::bind and std::mem_fn. An example of this in O₂scl_eos is how the o2scl::eos_sn_base::slice class generates a matrix from a 3-D tensor.

A matrix type is distinct from a “vector of vectors” or a “list of vectors”, such as that implied by std::vector<std::vector<double> > because in the latter, not all of the vectors in the list need to have the same size. In some cases, There are places where a list of vectors is preferable to a matrix, and O₂scl expects that elements in a list of vectors can be accessed by operator[][]. The function o2scl::tensor_grid::set_grid() accepts a list of vectors, and for this function, none of the vectors needs to have the same size.

The word “tensor” is used for a generic object which has rank n and then has n associated indices. A vector is just a tensor of rank 1 and a matrix is just a tensor of rank 2. Tensors are implemented in O₂scl with class tensor. A multivariate function specified on a grid can be implemented in O₂scl with tensor_grid. See more discussion in the tensor section below.

Rows and columns vs. x and y

The most common convention is that the first index of a matrix is the row index, i.e. to print a matrix to the screen one uses something like:

for(size_t row=0;row<n_rows;row++) {
  for(size_t col=0;col<n_cols;col++) {
    cout << M(row,col) << " ";
  }
  cout << endl;
}

This is the form used in o2scl::matrix_out() and o2scl::array_2d_out(). To reverse the rows and columns use o2scl::matrix_trans_out() and o2scl::array_2d_trans_out().

A related issue is how matrices are stored. In C, two-dimensional arrays are stored in row-major order, and the distance from the first element to the element at (row,col) is given by row*n_cols+col. In row-major order storage, the matrix elements are stored in the same order in which they are output by the functions o2scl::matrix_out() and o2scl::array_2d_out(). The alternative is column-major order where the distance from the first element to the element at (row,col) is given by col*n_rows+row. The tensor class uses a simple generalization of row-major order. O₂scl classes and functions which use operator(,) operate independently of how the data is represented in memory.

Sometimes its useful to think about the rows and columns in a matrix as referring to a elements of a grid, and the matrix indices refer to points in a grid in \((x,y)\). It might seem intuitive to think of a matrix as A[ix][iy] where ix and iy are the \(x\) and \(y\) indices because the ordering of the indices is alphabetical. However, it is useful to note that because functions like o2scl::matrix_out() print the first “row” first rather than the first column, a matrix constructed as A[ix][iy] will be printed out with x on the “vertical axis” and y on the “horizontal axis”, which is backwards from the usual convention when plotting data.

O₂scl classes which interpret matrix data on a grid (table3d, contour, interp2_seq and interp2_direct) use x to denote the row index and y to denote the column index by convention.

Assignment and copying

O₂scl has several functions which perform various operations on generic vector and matrix types. These functions are listed in the next few sections.

Differentiation and integration

Interpolating vectors

Searching

Vector properties

Maximum and minimum functions

Vector rearrangement functions

Statistics functions

Statistics with weighted vectors

Matrix assignment and copying

Matrix properties

Matrix maximum and minimum functions

Matrix searching

Matrix rearrangement functions

Vector and matrix output

For writing generic vectors to a stream, you can use the o2scl::vector_out() functions, which are defined in src/base/vector.h. Pretty matrix output is performed by the o2scl::matrix_out() functions, which are defined in src/base/columnify.h. The matrix output function uses a columnify object to format the output.

Tensors

Tensors of arbitrary rank and size can be stored in the class tensor. Classes tensor1, tensor2, tensor3, and tensor4 are rank-specific versions for 1-, 2-, 3- and 4-rank tensors. For n-dimsional data defined on a grid, tensor_grid provides a space to define a hyper-cubic grid in addition to the the tensor data. This class tensor_grid also provides simple n-dimensional interpolation of the data defined on the specified grid. See File I/O with HDF5 for functions in which provide HDF5 I/O for tensor objects.

Tensor example

/* Example: ex_tensor.cpp
   -------------------------------------------------------------------
   A simple example for tensors. See "License Information" 
   section of the documentation for license information.
*/
#include <o2scl/tensor_grid.h>
#include <o2scl/test_mgr.h>
#include <o2scl/hdf_file.h>
#include <o2scl/hdf_io.h>

using namespace std;
using namespace o2scl;
using namespace o2scl_hdf;

double func(double x, double y, double z) {
  return x*x*x-y+z*z;
}

int main(void) {

  cout.setf(ios::scientific);

  test_mgr t;
  t.set_output_level(1);

  // Create a rank three tensor
  tensor_grid<> m3;
  vector<size_t> i3(3), j3(3);
  i3[0]=4;
  i3[1]=3;
  i3[2]=3;
  m3.resize(3,i3);
  
  // Create and set grid
  std::vector<double> grid;

  grid.push_back(1.0);
  grid.push_back(2.0);
  grid.push_back(3.0);
  grid.push_back(4.0);
  
  grid.push_back(1.0);
  grid.push_back(2.0);
  grid.push_back(3.0);
  
  grid.push_back(1.0);
  grid.push_back(2.0);
  grid.push_back(3.0);
  m3.set_grid_packed(grid);
  
  // Fill the tensor with data
  for(size_t i=0;i<i3[0];i++) {
    for(size_t j=0;j<i3[1];j++) {
      for(size_t k=0;k<i3[2];k++) {
	double x=m3.get_grid(0,i);
	double y=m3.get_grid(1,j);
	double z=m3.get_grid(2,k);
	j3[0]=i;
	j3[1]=j;
	j3[2]=k;
	m3.set(j3,func(x,y,z));
      }
    }
  }
  
  // ----------------------------------------------------------------
  // Demonstrate linear interpolation
  
  cout << "Interpolation: " << endl;
  
  double vals[3]={2.5,2.5,1.5};
  cout << "Exact: " << func(vals[0],vals[1],vals[2])
       << " interpolated: "
       << m3.interp_linear(vals) << endl;
  t.test_rel(func(vals[0],vals[1],vals[2]),
	     m3.interp_linear(vals),2.0e-1,"interp linear");
  
  vals[0]=3.5;
  vals[1]=2.5;
  vals[2]=1.0;
  cout << "Exact: " << func(vals[0],vals[1],vals[2])
       << " interpolated: "
       << m3.interp_linear(vals) << endl;
  t.test_rel(func(vals[0],vals[1],vals[2]),
	     m3.interp_linear(vals),2.0e-1,"interp linear");
  cout << endl;

  // ----------------------------------------------------------------
  // Demonstrate HDF5 I/O for tensors

  // Output to a file
  hdf_file hf;
  hf.open_or_create("ex_tensor_dat.o2");
  hdf_output(hf,m3,"tens_grid_test");
  hf.close();

  // Now read from that file
  tensor_grid<> m3c;
  hf.open("ex_tensor_dat.o2");
  hdf_input(hf,m3c);
  hf.close();

  // Show that the tensor is the same
  t.test_gen(m3==m3c,"tensor equality");

  // ----------------------------------------------------------------
  // Demonstrate tensor rearrangement. Create a new tensor which
  // interpolates the value 1.5 into the second index, sums over the
  // third index, and interpolates a new grid for the first index.
  // The result will be a rank 1 tensor with 10 entries. 

  cout << "grid(0),interp(1),sum(2):" << endl;
  tensor_grid<> m3r=grid_rearrange_and_copy<tensor_grid<>,double>
    (m3,{ix_grid(0,1.0,4.0,9),ix_interp(1,1.5),ix_sum(2)});
  
  t.test_gen(m3r.get_rank()==1,"rearrange 1");
  t.test_gen(m3r.get_size(0)==10,"rearrange 2");
  
  t.test_gen(m3r.get_rank()==1,"rearrange 1");
  t.test_gen(m3r.get_size(0)==10,"rearrange 2");
  
  for(size_t i=0;i<10;i++) {
    vector<size_t> ix={i};
    cout << i << " " << m3r.get(ix) << " ";
    cout << (func(1.0+((double)i)/3.0,1.5,1.0)+
             func(1.0+((double)i)/3.0,1.5,2.0)+
             func(1.0+((double)i)/3.0,1.5,3.0)) << endl;
    t.test_rel(m3r.get(ix),(func(1.0+((double)i)/3.0,1.5,1.0)+
                            func(1.0+((double)i)/3.0,1.5,2.0)+
                            func(1.0+((double)i)/3.0,1.5,3.0)),
               3.0e-1,"tensor rearrange 1");
  }
  cout << endl;
  
  cout << "interp(1),grid(0),sum(2):" << endl;
  tensor_grid<> m3s=grid_rearrange_and_copy<tensor_grid<>,double>
    (m3,{ix_interp(1,1.5),ix_grid(0,1.0,4.0,9),ix_sum(2)});
  
  for(size_t i=0;i<10;i++) {
    vector<size_t> ix={i};
    cout << i << " " << m3s.get(ix) << " ";
    cout << (func(1.0+((double)i)/3.0,1.5,1.0)+
             func(1.0+((double)i)/3.0,1.5,2.0)+
             func(1.0+((double)i)/3.0,1.5,3.0)) << endl;
    t.test_rel(m3s.get(ix),(func(1.0+((double)i)/3.0,1.5,1.0)+
                            func(1.0+((double)i)/3.0,1.5,2.0)+
                            func(1.0+((double)i)/3.0,1.5,3.0)),
               3.0e-1,"tensor rearrange 2");
  }
  cout << endl;
  
  t.report();

  return 0;
}

I/O and contiguous storage

O₂scl uses HDF5 for file I/O, and in order to perform I/O of vector-like data, HDF5 works with bare pointers. In order to efficiently read and write vectors and other objects to HDF5 files, it is thus important to ensure that these objects are stored contiguously in memory. The standard template library objects, e.g. std::vector have this property as part of the recent C++ standard. The ublas objects, so far as I know, do not necessarily have this property. For this reason, o2scl_hdf::hdf_file::getd_vec() and o2scl_hdf::hdf_file::setd_vec() are efficient when working with std::vector objects. For other vector types, one must use o2scl_hdf::hdf_file::getd_vec_copy() or o2scl_hdf::hdf_file::setd_vec_copy() which require an extra copy upon reading from and writing to an HDF5 file. The same holds for matrix and tensor I/O. It is the efficiency of this I/O which motivated the default choice of std::vector objects as the default vector type in table and tensor. Also because of this issue, O₂scl does not currently provide HDF I/O functions for tensor classes unless they are built upon std::vector.