22 Data structures

22.1 Vectors

  1. Q: What are the six types of atomic vector? How does a list differ from an atomic vector?
    A: The six types are logical, integer, double, character, complex and raw. All elements of an atomic vector share one type, whereas the elements of a list don’t have to be of the same type and can themselves be lists.
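
    As a quick check, the sketch below calls typeof() on one value of each atomic type and on the elements of a mixed list. The values and the name x are illustrative only.

    ```r
    # One example value for each of the six atomic types
    typeof(TRUE)        # "logical"
    typeof(1L)          # "integer"
    typeof(1.5)         # "double"
    typeof("a")         # "character"
    typeof(1i)          # "complex"
    typeof(as.raw(1))   # "raw"

    # A list can mix types and can even contain other lists
    x <- list(1L, "a", TRUE, list(2.5))
    sapply(x, typeof)   # "integer" "character" "logical" "list"
    ```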

  2. Q: What makes is.vector() and is.numeric() fundamentally different to is.list() and is.character()?
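    A: The first two tests don’t check for a specific type. is.vector() tests whether an object is a vector (atomic or list) with no attributes other than names, and is.numeric() tests whether an object behaves like a number, which covers both integer and double. is.list() and is.character() each test for exactly one type.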
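
    A minimal sketch of this behaviour follows. The object x and the attribute name "unit" are made up for illustration.

    ```r
    x <- c(a = 1, b = 2)
    is.vector(x)            # TRUE: names are the only attribute
    attr(x, "unit") <- "cm"
    is.vector(x)            # FALSE: an additional attribute is present
    is.vector(list(1))      # TRUE: lists are vectors too

    is.numeric(1L)          # TRUE
    is.numeric(1.5)         # TRUE
    is.numeric(factor(1))   # FALSE, although a factor is stored as integer
    ```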

  3. Q: Test your knowledge of vector coercion rules by predicting the output of the following uses of c():
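
    The calls this question refers to are not reproduced here. The calls below are assumed examples (not necessarily the original ones) that show the coercion rules in action, using typeof() to inspect each result.

    ```r
    # Assumed example calls; the original exercise code is not shown above
    typeof(c(1, FALSE))       # "double":    FALSE becomes 0
    typeof(c("a", 1))         # "character": 1 becomes "1"
    typeof(c(list(1), "a"))   # "list":      "a" is wrapped as a list element
    typeof(c(TRUE, 1L))       # "integer":   TRUE becomes 1L
    ```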

  4. Q: Why do you need to use unlist() to convert a list to an atomic vector? Why doesn’t as.vector() work?
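    A: A list is already a vector, so as.vector() returns it unchanged. unlist() is needed to get rid of (flatten) the nested structure and produce an atomic vector.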
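
    The difference can be sketched with a small nested list (the name x is illustrative):

    ```r
    x <- list(1, list(2, 3))

    is.list(as.vector(x))   # TRUE: nothing was converted
    unlist(x)               # 1 2 3
    is.atomic(unlist(x))    # TRUE: the nesting has been flattened
    ```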

  5. Q: Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?
    A: These operators are functions that coerce their arguments to a common type: character, double and character, respectively. In the last case, 2 becomes "2", and "one" sorts after "2" in ASCII (and in typical locale collation orders), so the comparison is FALSE.
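
    The three comparisons can be run directly. The comments describe the coercion that is assumed to happen in each case.

    ```r
    1 == "1"      # TRUE:  1 is coerced to "1"
    -1 < FALSE    # TRUE:  FALSE is coerced to 0
    "one" < 2     # FALSE: 2 is coerced to "2", and digits sort before letters
    ```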

  6. Q: Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)
    A: It is a practical choice. When NA is combined with other atomic types in c(), it is coerced, like TRUE and FALSE, to integer (NA_integer_), double (NA_real_), complex (NA_complex_) or character (NA_character_). Recall that R has a coercion hierarchy that goes logical -> integer -> double -> complex -> character. If NA were, for example, a character, including NA in a set of integers or logicals would coerce them to character, which would be undesirable. Because logical sits at the bottom of the hierarchy, including an NA in a vector (which happens often) never changes the type of the other elements.
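
    A short sketch of how NA adapts to the other elements, including the case from the hint:

    ```r
    typeof(NA)                        # "logical"
    typeof(c(NA, 1L))                 # "integer"
    typeof(c(NA, 1.5))                # "double"
    typeof(c(NA, "a"))                # "character"
    typeof(c(FALSE, NA_character_))   # "character": FALSE becomes "FALSE"
    ```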

22.2 Attributes

  1. Q: An early draft used this code to illustrate structure():

    But when you print that object you don’t see the comment attribute. Why? Is the attribute missing, or is there something else special about it? (Hint: try using help.)

    A: From the help of comment (?comment):

    Contrary to other attributes, the comment is not printed (by print or print.default).

    Also from the help of attributes (?attributes):

    Note that some attributes (namely class, comment, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set.
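
    The code from the draft is not reproduced here. The sketch below assumes a call of the form structure(1:5, comment = "my attribute") to show that the attribute is stored but not printed.

    ```r
    # Assumed reconstruction of the kind of call the question refers to
    x <- structure(1:5, comment = "my attribute")

    x               # prints 1 2 3 4 5 without the comment
    attributes(x)   # the comment attribute is still there
    comment(x)      # "my attribute"
    ```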

  2. Q: What happens to a factor when you modify its levels?

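    A: The underlying integer codes stay the same and only the labels attached to them change. If the levels are reversed, for example, both the entries of the factor and its levels appear reversed.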
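
    A minimal sketch, assuming a small factor f1 built from three letters and a reversal of its levels:

    ```r
    f1 <- factor(letters[1:3])
    codes_before <- as.integer(f1)   # 1 2 3

    levels(f1) <- rev(levels(f1))    # relabel the levels in place

    as.character(f1)                 # "c" "b" "a"
    levels(f1)                       # "c" "b" "a"
    as.integer(f1)                   # 1 2 3: the codes are unchanged
    ```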

  3. Q: What does this code do? How do f2 and f3 differ from f1?

    A: Unlike f1, f2 and f3 each change only one thing. They change either the order of the factor’s elements or the order of its levels, but not both at the same time.
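
    The code for this question is not shown above. The definitions below are an assumed reconstruction in which f2 reverses the elements and f3 reverses the levels.

    ```r
    # Assumed definitions; the original exercise code is not shown above
    f2 <- rev(factor(letters[1:3]))
    f3 <- factor(letters[1:3], levels = rev(letters[1:3]))

    as.character(f2)   # "c" "b" "a": elements reversed
    levels(f2)         # "a" "b" "c": levels untouched

    as.character(f3)   # "a" "b" "c": elements untouched
    levels(f3)         # "c" "b" "a": levels reversed
    ```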

22.3 Matrices and arrays

  1. Q: What does dim() return when applied to a vector?
    A: NULL
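
    For example, with an illustrative vector x:

    ```r
    x <- 1:5
    dim(x)            # NULL: a plain vector has no dim attribute

    dim(x) <- c(1, 5) # setting dim turns x into a 1 x 5 matrix
    dim(x)            # 1 5
    ```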

  2. Q: If is.matrix(x) is TRUE, what will is.array(x) return?
    A: TRUE, as also documented in ?array:

    A two-dimensional array is the same thing as a matrix.
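
    The converse does not hold, as this small sketch with an illustrative matrix m and three-dimensional array a suggests:

    ```r
    m <- matrix(1:6, nrow = 2)
    is.matrix(m)   # TRUE
    is.array(m)    # TRUE

    a <- array(1:8, c(2, 2, 2))
    is.matrix(a)   # FALSE: three dimensions
    is.array(a)    # TRUE
    ```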

  3. Q: How would you describe the following three objects? What makes them different to 1:5?

    A: They are arrays (class array), so each has a dim attribute, whereas 1:5 is a plain vector for which dim() returns NULL.
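
    The three objects are not shown above. The sketch below assumes they are three-dimensional arrays that hold 1:5 along different dimensions.

    ```r
    # Assumed definitions; the original exercise code is not shown above
    x1 <- array(1:5, c(1, 1, 5))
    x2 <- array(1:5, c(1, 5, 1))
    x3 <- array(1:5, c(5, 1, 1))

    dim(x1)    # 1 1 5
    dim(x2)    # 1 5 1
    dim(x3)    # 5 1 1
    dim(1:5)   # NULL
    ```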

22.4 Data frames

  1. Q: What attributes does a data frame possess?
    A: names, row.names and class.
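
    This can be checked on any small data frame. The data frame df below is illustrative.

    ```r
    df <- data.frame(x = 1:2, y = c("a", "b"))

    sort(names(attributes(df)))   # "class" "names" "row.names"
    class(df)                     # "data.frame"
    ```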

  2. Q: What does as.matrix() do when applied to a data frame with columns of different types?
    A: From ?as.matrix:

    The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g., all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give a integer matrix, etc.
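
    Two small cases illustrate the documented behaviour. The data frames df_num and df_mix are made up for this sketch.

    ```r
    # Logical and integer columns: the usual coercion hierarchy applies
    df_num <- data.frame(a = c(TRUE, FALSE), b = 1:2)
    typeof(as.matrix(df_num))   # "integer"

    # One non-numeric column is enough to get a character matrix
    df_mix <- data.frame(a = 1:2, b = c("x", "y"))
    typeof(as.matrix(df_mix))   # "character"
    ```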

  3. Q: Can you have a data frame with 0 rows? What about 0 columns?
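    A: Yes, you can create both easily, for example by subsetting an existing data frame with 0 as the row or column index. Both dimensions can also be 0 at the same time.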
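
    One way to build such data frames is to subset with 0. The data frame df is illustrative.

    ```r
    df <- data.frame(x = 1:3, y = c("a", "b", "c"))

    dim(df[0, ])        # 0 2: no rows, both columns kept
    dim(df[, 0])        # 3 0: all rows, no columns
    dim(df[0, 0])       # 0 0
    dim(data.frame())   # 0 0: an empty data frame from scratch
    ```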