19 Rcpp

19.1 Getting started with C++

  1. Q: With the basics of C++ in hand, it’s now a great time to practice by reading and writing some simple C++ functions. For each of the following functions, read the code and figure out what the corresponding base R function is. You might not understand every part of the code yet, but you should be able to figure out the basics of what the function does.

    A: The R equivalents are:

    • f1: mean()
    • f2: cumsum()
    • f3: any()
    • f4: Position()
    • f5: pmin()
  2. Q: To practice your function writing skills, convert the following functions into C++. For now, assume the inputs have no missing values.

    1. all()
    1. cumprod(), cummin(), cummax().
    1. diff(). Start by assuming lag 1, and then generalise for lag n.
    1. range.
    1. var. Read about the approaches you can take on wikipedia. Whenever implementing a numerical algorithm, it’s always good to check what is already known about the problem.

19.2 Missing values

  1. Q: Rewrite any of the functions from the first exercise to deal with missing values. If na.rm is true, ignore the missing values. If na.rm is false, return a missing value if the input contains any missing values. Some good functions to practice with are min(), max(), range(), mean(), and var().

    A:

    NumericVector minC(NumericVector x, bool narm){
      int n = x.size();
      LogicalVector index = is_na(x);
      NumericVector omin(1);
      bool na_check = false;
      bool na_check_all = true;
    
      for(int i = 0; i<n; i++){
          if (index[i]) na_check = true;
      }
    
      for(int i = 0; i<n; i++){
          if (!index[i]) na_check_all = false;
      }
    
      if (narm) {
          for(int i = n-1; i >= 0; i--){
              if (!index[i]) omin[0] = x[i];
          }
    
          for(int i = 1; i < n; i++) {
              if (!index[i]) omin[0] = std::min(x[i], omin[0]);
          }
    
          if (na_check_all) {
              omin[0] = NA_REAL;
          }
      } else if (na_check) {
          omin[0] = NA_REAL;
      } else {
          omin[0] = x[0];
          for(int i = 1; i < n; i++){
              omin = std::min(x[i], omin[0]);
          }
      }
    
      return omin;
    }
    
    NumericVector maxC(NumericVector x, bool narm){
      int n = x.size();
      LogicalVector index = is_na(x);
      NumericVector omax(1);
      bool na_check = false;
      bool na_check_all = true;
    
      for(int i = 0; i<n; i++){
          if (index[i]) na_check = true;
      }
    
      for(int i = 0; i<n; i++){
          if (!index[i]) na_check_all = false;
      }
    
      if (narm) {
          for(int i = n-1; i >= 0; i--){
              if (!index[i]) omax[0] = x[i];
          }
    
          for(int i = 1; i < n; i++) {
              if (!index[i]) omax[0] = std::max(x[i], omax[0]);
          }
    
          if (na_check_all) {
              omax[0] = NA_REAL;
          }
      } else if (na_check) {
          omax[0] = NA_REAL;
      } else {
          omax[0] = x[0];
          for(int i = 1; i < n; i++){
              omax = std::max(x[i], omax[0]);
          }
      }
    
      return omax;
    }
    
    NumericVector rangeC(NumericVector x, bool narm){
      NumericVector out(2);
      NumericVector omin(1);
      NumericVector omax(1);
    
      omin = minC(x, narm);
      omax = maxC(x, narm);
    
      out[0] = omin[0];
      out[1] = omax[0];
      return out;
      }
    }
    
    NumericVector meanC(NumericVector x, bool narm){
      int n = x.size();
      LogicalVector index = is_na(x);
      bool na_check = false;
      bool na_check_all = true;
      int n_corrected = 0;
      NumericVector out(1);
    
      for(int i = 0; i<n; i++){
          if (index[i]) na_check = true;
      }
    
      for(int i = 0; i<n; i++){
          if (!index[i]) na_check_all = false;
      }
    
      for(int i = 0; i<n; i++){
          if (!index[i]) n_corrected++;
      }
    
      /* narm = T */
      if (narm){
          if (na_check_all) {
              out[0] = NA_REAL;
          }  else {
              out[0] = 0;
              for(int i = 0; i < n; ++i) {
                  if (!index[i]) out[0] += x[i] / n_corrected;
              }
          }
      }
    
      /* narm = F */
      if (!narm){
          if (na_check) {
              out[0] = NA_REAL;
          } else {
              for(int i = 0; i < n; ++i) {
                  out[0] += x[i] / n;
                }
          }
      }
    
      return out;
    }
  2. Q: Rewrite cumsum() and diff() so they can handle missing values. Note that these functions have slightly more complicated behaviour.

    A:

19.3 The STL

To practice using the STL algorithms and data structures, implement the following using R functions in C++, using the hints provided:

  1. median.default() using partial_sort.

  2. %in% using unordered_set and the find() or count() methods.

  3. unique() using an unordered_set (challenge: do it in one line!).

  4. min() using std::min(), or max() using std::max().

  5. which.min() using min_element, or which.max() using max_element.

  6. setdiff(), union(), and intersect() for integers using sorted ranges and set_union, set_intersection and set_difference.