25 Rewriting R code in C++

25.1 Getting started with C++

  1. Q: With the basics of C++ in hand, it’s now a great time to practice by reading and writing some simple C++ functions. For each of the following functions, read the code and figure out what the corresponding base R function is. You might not understand every part of the code yet, but you should be able to figure out the basics of what the function does.

    A: The R equivalents are:

    • f1: mean()
    • f2: cumsum()
    • f3: any()
    • f4: Position()
    • f5: pmin()
  2. Q: To practice your function writing skills, convert the following functions into C++. For now, assume the inputs have no missing values.

    A:

    1. all()
    1. cumprod(), cummin(), cummax().
    1. diff(). Start by assuming lag 1, and then generalise for lag n.
    1. range.
    1. var. Read about the approaches you can take on wikipedia. Whenever implementing a numerical algorithm, it’s always good to check what is already known about the problem.

25.2 Missing values

  1. Q: Rewrite any of the functions from the first exercise to deal with missing values. If na.rm is true, ignore the missing values. If na.rm is false, return a missing value if the input contains any missing values. Some good functions to practice with are min(), max(), range(), mean(), and var().

    A:

    NumericVector minC(NumericVector x, bool narm){
      int n = x.size();
      LogicalVector index = is_na(x);
      NumericVector omin(1);
      bool na_check = false;
      bool na_check_all = true;
    
      for(int i = 0; i<n; i++){
          if (index[i]) na_check = true;
      }
    
      for(int i = 0; i<n; i++){
          if (!index[i]) na_check_all = false;
      }
    
      if (narm) {
          for(int i = n-1; i >= 0; i--){
              if (!index[i]) omin[0] = x[i];
          }
    
          for(int i = 1; i < n; i++) {
              if (!index[i]) omin[0] = std::min(x[i], omin[0]);
          }
    
          if (na_check_all) {
              omin[0] = NA_REAL;
          }
      } else if (na_check) {
          omin[0] = NA_REAL;
      } else {
          omin[0] = x[0];
          for(int i = 1; i < n; i++){
              omin = std::min(x[i], omin[0]);
          }
      }
    
      return omin;
    }
    
    NumericVector maxC(NumericVector x, bool narm){
      int n = x.size();
      LogicalVector index = is_na(x);
      NumericVector omax(1);
      bool na_check = false;
      bool na_check_all = true;
    
      for(int i = 0; i<n; i++){
          if (index[i]) na_check = true;
      }
    
      for(int i = 0; i<n; i++){
          if (!index[i]) na_check_all = false;
      }
    
      if (narm) {
          for(int i = n-1; i >= 0; i--){
              if (!index[i]) omax[0] = x[i];
          }
    
          for(int i = 1; i < n; i++) {
              if (!index[i]) omax[0] = std::max(x[i], omax[0]);
          }
    
          if (na_check_all) {
              omax[0] = NA_REAL;
          }
      } else if (na_check) {
          omax[0] = NA_REAL;
      } else {
          omax[0] = x[0];
          for(int i = 1; i < n; i++){
              omax = std::max(x[i], omax[0]);
          }
      }
    
      return omax;
    }
    
    NumericVector rangeC(NumericVector x, bool narm){
      NumericVector out(2);
      NumericVector omin(1);
      NumericVector omax(1);
    
      omin = minC(x, narm);
      omax = maxC(x, narm);
    
      out[0] = omin[0];
      out[1] = omax[0];
      return out;
      }
    }
    
    NumericVector meanC(NumericVector x, bool narm){
      int n = x.size();
      LogicalVector index = is_na(x);
      bool na_check = false;
      bool na_check_all = true;
      int n_corrected = 0;
      NumericVector out(1);
    
      for(int i = 0; i<n; i++){
          if (index[i]) na_check = true;
      }
    
      for(int i = 0; i<n; i++){
          if (!index[i]) na_check_all = false;
      }
    
      for(int i = 0; i<n; i++){
          if (!index[i]) n_corrected++;
      }
    
      /* narm = T */
      if (narm){
          if (na_check_all) {
              out[0] = NA_REAL;
          }  else {
              out[0] = 0;
              for(int i = 0; i < n; ++i) {
                  if (!index[i]) out[0] += x[i] / n_corrected;
              }
          }
      }
    
      /* narm = F */
      if (!narm){
          if (na_check) {
              out[0] = NA_REAL;
          } else {
              for(int i = 0; i < n; ++i) {
                  out[0] += x[i] / n;
                }
          }
      }
    
      return out;
    }
  2. Q: Rewrite cumsum() and diff() so they can handle missing values. Note that these functions have slightly more complicated behaviour.

    A:

25.3 The STL

To practice using the STL algorithms and data structures, implement the following using R functions in C++, using the hints provided:

  1. Q: median.default() using partial_sort.

    A:

  2. Q: %in% using unordered_set and the find() or count() methods.

    A:

  3. Q: unique() using an unordered_set (challenge: do it in one line!).

    A:

  4. Q: min() using std::min(), or max() using std::max().

    A:

  5. Q: which.min() using min_element, or which.max() using max_element.

    A:

  6. Q: setdiff(), union(), and intersect() for integers using sorted ranges and set_union, set_intersection and set_difference.

    A: