3 Vectors

3.1 Atomic vectors

  1. Q: How do you create scalars of type raw and complex? (See ?raw and ?complex)

A: In R scalars are represented as vectors of length one. For raw and complex types these can be created via raw() and complex(), i.e.:

raw(1)
#> [1] 00
complex(1)
#> [1] 0+0i

Raw vectors can easily be created from numeric or character values.

as.raw(42)
#> [1] 2a
charToRaw("A")
#> [1] 41

For complex numbers, the real and imaginary parts may be provided directly.

complex(length.out = 1, real = 1, imaginary = 1)
#> [1] 1+1i
  1. Q: Test your knowledge of vector coercion rules by predicting the output of the following uses of c():
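
    A: The c() calls from the exercise are not reproduced here. As a minimal sketch, the following combinations (chosen for illustration, not taken from the exercise) show how c() coerces its arguments to the most flexible type involved, following the hierarchy character >> double >> integer >> logical:

```r
# Illustrative inputs (assumptions, not the exercise's own code)
dbl <- c(1, FALSE)   # logical is coerced to double
chr <- c("a", 1)     # double is coerced to character
int <- c(TRUE, 1L)   # logical is coerced to integer

typeof(dbl)  # "double"
typeof(chr)  # "character"
typeof(int)  # "integer"
```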

  2. Q: Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?

    A: These comparisons are carried out by operator functions, which coerce their arguments to a common type. In the examples above, these types are character, double and character: 1 is coerced to "1", FALSE is represented as 0, and 2 turns into "2". In the last case the strings are compared lexicographically, and numerals precede letters (the exact ordering may depend on the locale).
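
    The three comparisons can be checked directly. This is a small sketch; the last result assumes a locale in which digits sort before letters:

```r
# Each comparison coerces both sides to a common type first
r1 <- 1 == "1"     # compares "1" == "1"
r2 <- -1 < FALSE   # compares -1 < 0
r3 <- "one" < 2    # compares "one" < "2" as strings

c(r1, r2, r3)
```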

  3. Q: Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)

    A: The presence of missing values shouldn't affect the type of an object. Recall that there is a type hierarchy for coercion: character >> double >> integer >> logical. When combining NAs with other atomic types, the logical NA is coerced to integer (NA_integer_), double (NA_real_) or character (NA_character_), and not the other way round. If NA were a character and added to a set of other values, all of these would be coerced to character as well.
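
    A short sketch of this behaviour, using arbitrary example values:

```r
# The logical NA adapts to whatever it is combined with
typeof(NA)                        # "logical"
typeof(c(FALSE, NA))              # "logical"
typeof(c(1L, NA))                 # "integer"
typeof(c(1.5, NA))                # "double"
typeof(c("a", NA))                # "character"

# A character NA, by contrast, forces everything else up to character
typeof(c(FALSE, NA_character_))   # "character"
```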

  4. Q: Precisely what do is.atomic(), is.numeric(), and is.vector() test for?

    A: The documentation states that:
    • is.atomic() tests if an object is an atomic vector (as defined in Advanced R) or is NULL (!). Note that since R 4.4.0, is.atomic(NULL) returns FALSE.
    • is.numeric() tests if an object has type integer or double and is not of "factor", "Date", "POSIXt" or "difftime" class.
    • is.vector() tests if an object is a vector (as defined in Advanced R) and has no attributes, apart from names.
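
    The three predicates can be probed with a few small inputs. This is an illustrative sketch; the "unit" attribute is a made-up name, and is.atomic(NULL) is left out because its result depends on the R version:

```r
# is.atomic(): TRUE for atomic vectors, FALSE for lists
is.atomic(1:3)             # TRUE
is.atomic(list(1, 2))      # FALSE

# is.numeric(): integer or double, but not factors or dates
is.numeric(1L)             # TRUE
is.numeric(1.5)            # TRUE
is.numeric(factor("a"))    # FALSE
is.numeric(Sys.Date())     # FALSE

# is.vector(): any attribute other than names makes it FALSE
v <- c(a = 1, b = 2)
is.vector(v)               # TRUE (names are allowed)
attr(v, "unit") <- "cm"    # hypothetical extra attribute
is.vector(v)               # FALSE
```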

3.2 Attributes

  1. Q: How is setNames() implemented? How is unname() implemented? Read the source code.

    A: setNames() is implemented as:
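
    The listing itself is not reproduced here. Printing stats::setNames in an R session shows a definition along the following lines (written from memory, so treat it as a sketch and compare it with your own installation):

```r
setNames <- function(object = nm, nm) {
  names(object) <- nm   # attach the names
  object                # return the modified object
}
```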

    Because the data argument comes first, setNames() also works well with the magrittr pipe operator. When the first argument is omitted, object defaults to nm, so the result is a vector named after its own values (this is rather atypical, since required arguments usually come first):
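
    For illustration, with arbitrary example vectors:

```r
# Data first, names second
stats::setNames(1:3, c("a", "b", "c"))

# First argument omitted: the names vector is also used as the data
self_named <- stats::setNames(nm = c("a", "b", "c"))
self_named
```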

unname() is implemented in the following way:

unname <- function(obj, force = FALSE) {
  # drop names if present
  if (!is.null(names(obj)))
    names(obj) <- NULL
  # drop dimnames, but keep them for data frames unless force = TRUE
  if (!is.null(dimnames(obj)) && (force || !is.data.frame(obj)))
    dimnames(obj) <- NULL
  obj
}

unname() removes existing names (or dimnames) by setting them to NULL.
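
A brief usage sketch with an arbitrary named vector and matrix:

```r
v <- c(a = 1, b = 2)
names(unname(v))       # NULL

m <- matrix(1:4, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))
dimnames(unname(m))    # NULL
```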

  1. Q: What does dim() return when applied to a 1d vector? When might you use NROW() or NCOL()?

    A: From ?nrow:

    dim() will return NULL when applied to a 1d vector.

    One may want to use NROW() or NCOL() to handle atomic vectors, lists and NULL values in the same way as one column matrices or data frames. For these objects nrow() and ncol() return NULL.
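
    A quick comparison on a plain vector and on NULL (example values chosen for illustration):

```r
x <- 1:10

dim(x)      # NULL
nrow(x)     # NULL
ncol(x)     # NULL

NROW(x)     # 10: treated like a one-column matrix
NCOL(x)     # 1

NROW(NULL)  # 0
NCOL(NULL)  # 1
```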

  2. Q: How would you describe the following three objects? What makes them different to 1:5?

    A: These are all “one dimensional”. If you imagine a 3d cube, x1 is in “x” dimension, x2 is in the “y” dimension, and x3 is in the “z” dimension.
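
    The three objects from the exercise are not shown here. As a sketch, the hypothetical arrays a1, a2 and a3 below each extend along a different one of three dimensions, whereas 1:5 has no dim attribute at all:

```r
# Hypothetical stand-ins for the exercise's objects
a1 <- array(1:5, dim = c(5, 1, 1))  # extends along the first dimension
a2 <- array(1:5, dim = c(1, 5, 1))  # extends along the second dimension
a3 <- array(1:5, dim = c(1, 1, 5))  # extends along the third dimension

dim(a1)    # 5 1 1
dim(a2)    # 1 5 1
dim(a3)    # 1 1 5
dim(1:5)   # NULL

# The underlying values are the same in all cases
identical(as.vector(a1), 1:5)   # TRUE
```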

  3. Q: An early draft used this code to illustrate structure():

    But when you print that object you don’t see the comment attribute. Why? Is the attribute missing, or is there something else special about it? (Hint: try using help.)

    A: The documentation states (see ?comment):

    Contrary to other attributes, the comment is not printed (by print or print.default).

    Also, from ?attributes:

    Note that some attributes (namely class, comment, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set.

    We can retrieve comment attributes by calling them explicitly:
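
    The original listing is missing here. The sketch below assumes an object built with structure() and an arbitrary comment string:

```r
# Assumed example object; "my attribute" is an arbitrary comment string
foo <- structure(1:5, comment = "my attribute")

print(foo)                     # prints the values only
attributes(foo)                # the comment attribute is still there
attr(foo, which = "comment")   # "my attribute"
comment(foo)                   # dedicated accessor, same result
```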

3.3 S3 atomic vectors

  1. Q: What sort of object does table() return? What is its type? What attributes does it have? How does the dimensionality change as you tabulate more variables?

A: table() returns a contingency table of its input variables, which has the class "table". Internally it is represented as an array (implicit class) of integers (type) with the attributes dim (dimension of the underlying array) and dimnames (one name for each input column). The dimensions correspond to the number of unique values (factor levels) in each input variable, so each additional variable adds one dimension to the array.

x <- table(mtcars[c("vs", "cyl", "am")])

typeof(x)
#> [1] "integer"
attributes(x)
#> $dim
#> [1] 2 3 2
#> $dimnames
#> $dimnames$vs
#> [1] "0" "1"
#> $dimnames$cyl
#> [1] "4" "6" "8"
#> $dimnames$am
#> [1] "0" "1"
#> $class
#> [1] "table"
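
To see how the dimensionality grows with the number of tabulated variables, one can count the dimensions for one, two and three columns of mtcars. The helper n_dims() is a hypothetical convenience function defined only for this sketch:

```r
# Hypothetical helper: number of dimensions of the resulting table
n_dims <- function(vars) length(dim(table(mtcars[vars])))

n_dims("vs")                      # 1
n_dims(c("vs", "cyl"))            # 2
n_dims(c("vs", "cyl", "am"))      # 3

dim(table(mtcars[c("vs", "cyl")]))  # 2 3
```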
  1. Q: What happens to a factor when you modify its levels?

    A: The underlying integer values stay the same, but the levels are changed, making it look like the data has changed.
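
    A minimal sketch with an arbitrary three-level factor:

```r
f <- factor(c("a", "b", "c"))
codes_before <- as.integer(f)   # 1 2 3

levels(f) <- c("x", "y", "z")   # relabel the levels

as.integer(f)                   # still 1 2 3
as.character(f)                 # "x" "y" "z"
```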

  2. Q: What does this code do? How do f2 and f3 differ from f1?

    A: For f2 and f3 either the order of the factor elements or its levels are being reversed. For f1 both transformations are occurring.
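
    The exercise code is not shown above. Assuming it resembles the following reconstruction, the differences can be inspected by comparing the first few elements and levels of each factor:

```r
# Assumed reconstruction of the exercise code
f1 <- factor(letters)
levels(f1) <- rev(levels(f1))                  # relabels the levels in place
f2 <- rev(factor(letters))                     # reverses the elements only
f3 <- factor(letters, levels = rev(letters))   # reverses the levels only

head(as.character(f1), 3); head(levels(f1), 3)  # "z" "y" "x" / "z" "y" "x"
head(as.character(f2), 3); head(levels(f2), 3)  # "z" "y" "x" / "a" "b" "c"
head(as.character(f3), 3); head(levels(f3), 3)  # "a" "b" "c" / "z" "y" "x"
```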

3.4 Lists

  1. Q: List all the ways that a list differs from an atomic vector.

    A: To summarise:
    • Atomic vectors are always homogeneous (all elements must be of the same type). Lists may be heterogeneous (the elements can be of different types).
    • Atomic vectors point to one address in memory, while lists contain a separate reference for each element.
    • Subsetting with out-of-bounds values or NAs leads to NAs for atomic vectors and NULL values for lists.
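
    The first and last points can be sketched with a few arbitrary values:

```r
# Homogeneous vs heterogeneous
typeof(c(1, "a"))               # "character": the number is coerced
sapply(list(1, "a"), typeof)    # "double" "character": types are kept

# Out-of-bounds subsetting
(1:3)[4]                        # NA
as.list(1:3)[4]                 # a list containing a single NULL
```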
  2. Q: Why do you need to use unlist() to convert a list to an atomic vector? Why doesn’t as.vector() work?

    A: A list is already a vector, though not an atomic one!

    Note that as.vector() and is.vector() use different definitions of “vector”!
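
    A small sketch with an arbitrary list shows that as.vector() leaves a plain list untouched, while unlist() flattens it into an atomic vector:

```r
l <- list(1, 2, 3)

is.vector(l)                  # TRUE: a list already counts as a vector
identical(as.vector(l), l)    # TRUE: nothing to convert
is.atomic(as.vector(l))       # FALSE

u <- unlist(l)
is.atomic(u)                  # TRUE
u                             # 1 2 3
```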

  3. Q: Compare and contrast c() and unlist() when combining a date and date-time into a single vector.

    A: Date and date-time objects are both built upon doubles. Dates store the number of days, while date-time objects (POSIXct) store the number of seconds, since the reference date 1970-01-01 (also known as “the Epoch”).

    Combining these objects can lead to surprising output because c() only considers the class of its first argument:

    The generic function dispatches based on the class of its first argument. The behaviour described here applies to older versions of R; more recent releases convert each argument inside c.Date() and c.POSIXct(), so your results may differ. When c.Date() is executed in such an older version, dttm_ct is converted to a date, but the 3600 seconds are mistaken for 3600 days! When c.POSIXct() is called on date, one day counts as one second only, as illustrated by the following line:
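
    The objects date and dttm_ct are not defined in the text above. The sketch below assumes a date one day after the Epoch and a date-time one hour (3600 seconds) after it. It reinterprets the underlying doubles by hand rather than relying on c(), whose result depends on the R version:

```r
# Assumed definitions (not shown in the text above)
date    <- as.Date("1970-01-02")
dttm_ct <- as.POSIXct("1970-01-01 01:00", tz = "UTC")

# Underlying doubles
unclass(date)          # 1 (day)
as.numeric(dttm_ct)    # 3600 (seconds)

# 3600 read as days rather than seconds
secs_as_days <- as.Date(as.numeric(dttm_ct), origin = "1970-01-01")
format(secs_as_days)   # "1979-11-10"

# 1 read as seconds rather than days
days_as_secs <- as.POSIXct(unclass(date), origin = "1970-01-01", tz = "UTC")
format(days_as_secs, "%Y-%m-%d %H:%M:%S")   # "1970-01-01 00:00:01"

# Result depends on the R version, so no output is stated here
c(date, dttm_ct)
```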

    Some of these problems may be avoided via explicit conversion of the classes:
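
    As a sketch of such explicit conversion, again assuming a date one day after the Epoch and a date-time one hour after it, both inputs are first brought to a common class and only then combined:

```r
# Assumed definitions (not shown in the text above)
date    <- as.Date("1970-01-02")
dttm_ct <- as.POSIXct("1970-01-01 01:00", tz = "UTC")

# Convert the date-time to a date, then combine two dates
as_dates <- c(as.Date(dttm_ct, tz = "UTC"), date)
format(as_dates)        # "1970-01-01" "1970-01-02"

# Convert the date to a date-time, then combine two date-times
as_dttms <- c(dttm_ct, as.POSIXct(date))
as.numeric(as_dttms)    # seconds since the Epoch: 3600 86400
```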

    Let’s look at unlist(), which operates on list input.

    We see that internally dates and date-times are stored as doubles. Unfortunately, this is all we are left with once unlist() has stripped the attributes of the list.
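
    A minimal sketch of this, assuming a date one day after the Epoch and a date-time one hour after it:

```r
# Assumed definitions (not shown in the text above)
date    <- as.Date("1970-01-02")
dttm_ct <- as.POSIXct("1970-01-01 01:00", tz = "UTC")

u <- unlist(list(date, dttm_ct))
u               # 1 3600: days and seconds side by side
class(u)        # "numeric"
attributes(u)   # NULL: class and time zone are gone
```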

    To summarise: c() coerces types and errors may occur because of inappropriate method dispatch. unlist() strips attributes.

3.5 Data frames and tibbles

  1. Q: Can you have a data frame with 0 rows? What about 0 columns?

    A: Yes, you can create these data frames easily and in many ways. Even both dimensions can be 0. E.g. you might subset the respective dimension with either 0, NULL or a valid 0-length atomic (logical(0), character(0), integer(0), double(0)). Negative integer sequences would also work. The following example uses a zero:
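
    For instance, using the built-in mtcars data set (32 rows, 11 columns) as an arbitrary example:

```r
dim(mtcars[0, ])    # 0 11: no rows
dim(mtcars[, 0])    # 32 0: no columns
dim(mtcars[0, 0])   # 0 0: neither
```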

    Empty data frames can also be created directly (without subsetting):
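
    A few possibilities, sketched with arbitrary names:

```r
df_empty <- data.frame()                  # 0 rows, 0 columns
df_norow <- data.frame(x = integer(0))    # 0 rows, 1 column
df_nocol <- data.frame(row.names = 1:3)   # 3 rows, 0 columns

dim(df_empty)
dim(df_norow)
dim(df_nocol)
```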

  2. Q: What happens if you attempt to set rownames that are not unique?

    A: Matrices can have duplicated row names, so this does not cause problems.

    Data frames, however, require unique row names, and you get different results depending on how you attempt to set them. If you use row.names() directly, you get an error:

    If you use subsetting, [ automatically deduplicates:
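
    The three cases can be sketched as follows. The example objects are arbitrary, and the error is caught with tryCatch() so that the snippet runs to completion:

```r
# Matrix: duplicated row names are accepted
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "a"), NULL))
rownames(m)    # "a" "a"

# Data frame: setting duplicates directly fails
df <- data.frame(x = 1:3)
res <- tryCatch(
  suppressWarnings({
    row.names(df) <- c("x", "y", "y")
    "ok"
  }),
  error = function(e) "error"
)
res            # "error"

# Subsetting: repeated rows receive unique row names
rownames(df[c(1, 1, 1), , drop = FALSE])   # "1" "1.1" "1.2"
```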

  3. Q: If df is a data frame, what can you say about t(df), and t(t(df))? Perform some experiments, making sure to try different column types.

    A: Both return matrices:

    Their dimensions follow the usual transposition rules:

    Because the output is a matrix, every column is coerced to the same type by as.matrix(), as described below.
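
    A small experiment with a made-up data frame that mixes an integer and a character column:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

is.matrix(df)         # FALSE
is.matrix(t(df))      # TRUE
is.matrix(t(t(df)))   # TRUE

dim(df)               # 3 2
dim(t(df))            # 2 3
dim(t(t(df)))         # 3 2

typeof(t(df))         # "character": the integers were coerced
```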

  4. Q: What does as.matrix() do when applied to a data frame with columns of different types? How does it differ from data.matrix()?

    A: From ?as.matrix:

    The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g., all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give a integer matrix, etc.

    Let's transform a dummy data frame into a character matrix. Note that format() is applied to the non-character columns, which gives surprising results: TRUE is transformed to " TRUE" (starting with a space!).
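
    The dummy data frame is not shown above. The sketch below uses a made-up one with a character, a logical, an integer and a double column; the exact padding of the printed strings is best inspected in your own session:

```r
# Made-up data frame with mixed column types
df_coltypes <- data.frame(
  a = c("a", "b"),
  b = c(TRUE, FALSE),
  c = c(1L, 0L),
  d = c(1.5, 2),
  stringsAsFactors = FALSE
)

m_chr <- as.matrix(df_coltypes)
typeof(m_chr)   # "character"
m_chr           # inspect how the non-character columns were formatted
```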

    From ?data.matrix:

    Return the matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Factors and ordered factors are replaced by their internal codes.

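    data.matrix() returns a numeric matrix. In older versions of R, character columns are replaced by missing values; more recent versions first convert them to factors and then to their integer codes: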
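
    A sketch with a made-up data frame. The logical, factor and double columns behave the same across R versions; the character column is shown separately and without a stated result, since its treatment depends on the version:

```r
# Made-up data frame without character columns
df_num <- data.frame(
  lgl = c(TRUE, FALSE),
  fct = factor(c("x", "y")),
  dbl = c(1.5, 2)
)

m_num <- data.matrix(df_num)
is.numeric(m_num)   # TRUE
m_num[, "lgl"]      # 1 0
m_num[, "fct"]      # 1 2: the internal factor codes

# Character column: the result depends on the R version
data.matrix(data.frame(chr = c("a", "b"), stringsAsFactors = FALSE))
```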