3.1 Atomic vectors
Q: How do you create scalars of type raw and complex? (See ?raw and ?complex.)
A: In R, scalars are represented as vectors of length one. For the raw and complex types, these can be created via raw() and complex().
Raw vectors can easily be created from numeric or character values.
For complex numbers real and imaginary parts may be provided directly.
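As a minimal sketch of the constructors mentioned above (all of them base R functions), one might write:

```r
# Length-one vectors of type raw and complex
raw(1)      # a single zero byte, printed as 00
complex(1)  # 0+0i

# Raw values from a number or from a character string
as.raw(42)      # 2a (hexadecimal for 42)
charToRaw("A")  # 41 (hexadecimal byte value of "A")

# A complex scalar from its real and imaginary parts
complex(length.out = 1, real = 1, imaginary = 1)  # 1+1i
```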
Q: Test your knowledge of vector coercion rules by predicting the output of the following uses of c().
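The calls the question refers to are not reproduced here. As an illustrative sketch (the specific inputs below are assumptions chosen for this example), c() coerces its arguments to the most flexible type present:

```r
# Inputs are illustrative assumptions, not necessarily the original exercise
typeof(c(1, FALSE))   # "double": FALSE becomes 0
typeof(c("a", 1))     # "character": 1 becomes "1"
typeof(c(TRUE, 1L))   # "integer": TRUE becomes 1L
```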
Q: Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?
A: These comparisons are carried out by operator functions, which coerce their arguments to a common type. In the examples above these types will be character, double and character: 1 will be coerced to "1", FALSE is represented as 0, and 2 is coerced to "2" (numerals precede letters in lexicographic order, although this may depend on the locale).
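The three comparisons can be checked directly. This is a small sketch; the last result relies on digits sorting before letters, which holds in common locales:

```r
1 == "1"     # TRUE: 1 is coerced to "1"
-1 < FALSE   # TRUE: FALSE is coerced to 0
"one" < 2    # FALSE: 2 is coerced to "2", which sorts before "one"
```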
Q: Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)
A: The presence of missing values shouldn’t affect the type of an object. Recall that there is a type hierarchy for coercion: character >> double >> integer >> logical. When combining NAs with other atomic types, the NAs will be coerced to integer (NA_integer_), double (NA_real_) or character (NA_character_), and not the other way round. If NA were a character and added to a set of other values, all of these would be coerced to character as well.
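A short sketch of this behaviour: the logical NA adopts the type of whatever it is combined with, whereas a character NA forces everything to character.

```r
typeof(NA)                       # "logical"
typeof(c(NA, 1L))                # "integer"
typeof(c(NA, 1.5))               # "double"
typeof(c(NA, "a"))               # "character"

# A typed NA drags the other values along instead
typeof(c(FALSE, NA_character_))  # "character"
```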
Q: Precisely what do is.atomic(), is.numeric(), and is.vector() test for?
A: The documentation states that:
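is.atomic() tests if an object has one of these types: logical, integer, double, complex, character or raw.
is.numeric() tests if an object has integer or double type and is not of class "factor", "Date", "POSIXt" or "difftime".
is.vector() tests if an object has no attributes, except for names, and if its mode() is atomic ("logical", "numeric", "complex", "character" or "raw"), "list" or "expression".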
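The following sketch exercises the three predicates on a few hand-picked objects (the inputs are illustrative choices):

```r
is.atomic(1:3)            # TRUE
is.atomic(list(1))        # FALSE

is.numeric(1L)            # TRUE
is.numeric(1.5)           # TRUE
is.numeric(factor("a"))   # FALSE, although factors are built on integers
is.numeric(Sys.Date())    # FALSE, although dates are built on doubles

x <- c(a = 1, b = 2)
is.vector(x)              # TRUE: names are allowed
attr(x, "unit") <- "cm"
is.vector(x)              # FALSE: any other attribute disqualifies
is.vector(list())         # TRUE: lists are vectors too
```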
Q: How is setNames() implemented? How is unname() implemented? Read the source code.
A: setNames() is implemented as:
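The source can be read by printing the function. The sketch below also writes the same logic out under a hypothetical name, set_names_sketch(), which is an assumption introduced for this illustration and may differ from the installed source in formatting:

```r
# Print the function to read its source
stats::setNames

# The same logic written out (hypothetical name, for illustration only)
set_names_sketch <- function(object = nm, nm) {
  names(object) <- nm
  object
}

set_names_sketch(1:3, c("a", "b", "c"))
```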
Because the data argument comes first, setNames() also works well with the magrittr pipe operator. When no first argument is given, the result is a named vector whose values are the names themselves.
setNames() only affects the names attribute and ignores other, more specific name-related attributes such as dimnames (for matrices and arrays).
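A brief usage sketch (the example objects are assumptions made for illustration):

```r
# Typical use: attach names to a vector
setNames(1:3, c("a", "b", "c"))

# Without a first argument, the names double as the values
self_named <- setNames(nm = c("a", "b", "c"))
self_named

# dimnames of a matrix are left untouched
m  <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
m2 <- setNames(m, c("w", "x", "y", "z"))
dimnames(m2)
```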
unname() is implemented in the following way:
unname() removes existing names and dimnames attributes. By default (force = FALSE), the dimnames attribute (names and row names) won’t be affected for data frames.
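A sketch of the logic, written under the hypothetical name unname_sketch() (an assumption for this illustration; print base::unname to compare with the installed source), followed by two small checks on a vector and a matrix:

```r
# Hypothetical re-statement of the logic, for illustration only
unname_sketch <- function(obj, force = FALSE) {
  # Drop the names attribute if there is one
  if (!is.null(names(obj))) {
    names(obj) <- NULL
  }
  # Drop dimnames, but leave data frames alone unless force = TRUE
  if (!is.null(dimnames(obj)) && (force || !is.data.frame(obj))) {
    dimnames(obj) <- NULL
  }
  obj
}

x <- c(a = 1, b = 2)
names(unname(x))      # NULL

m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
dimnames(unname(m))   # NULL
```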
Q: What does dim() return when applied to a 1d vector? When might you use NROW() or NCOL()?
A: dim() returns NULL when applied to a 1d vector.
One may want to use NROW() or NCOL() to handle atomic vectors, lists and NULL values in the same way as one-column matrices or data frames. For these objects nrow() and ncol() return NULL. This may occur in interactive data analysis, while subsetting data frames.
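The difference can be seen in a small sketch (the example objects are illustrative assumptions):

```r
x <- 1:10
dim(x)    # NULL
nrow(x)   # NULL
ncol(x)   # NULL
NROW(x)   # 10: treated like a one-column matrix
NCOL(x)   # 1

NROW(NULL)  # 0
NCOL(NULL)  # 1

# Subsetting a single column silently drops to a vector
df  <- data.frame(a = 1:3, b = 4:6)
col <- df[, "a"]
nrow(col)   # NULL
NROW(col)   # 3
```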
Q: How would you describe the following three objects? What makes them different to 1:5?
A: These objects have the class array instead of being plain vectors. Their dimensions are stored in the dim attribute.
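The three objects are not shown here. As a sketch, assume they are arrays built from 1:5 with the single non-trivial dimension in different positions (x1, x2 and x3 below are assumptions for illustration):

```r
# Assumed definitions, for illustration only
x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))

dim(1:5)      # NULL: a plain vector has no dim attribute
dim(x1)       # 1 1 5
dim(x2)       # 1 5 1
dim(x3)       # 5 1 1
class(x1)     # "array"
class(1:5)    # "integer"
```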
Q: An early draft used this code to illustrate structure():
But when you print that object you don’t see the comment attribute. Why? Is the attribute missing, or is there something else special about it? (Hint: try using help.)
A: The documentation states (see ?comment):
Contrary to other attributes, the comment is not printed (by print or print.default).
Also, from ?attributes:
Note that some attributes (namely class, comment, dim, dimnames, names, row.names and tsp) are treated specially and have restrictions on the values which can be set.
We can retrieve comment attributes by calling them explicitly:
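A minimal sketch, assuming an object created with structure() and a comment attribute (the attribute text is an illustrative choice):

```r
x <- structure(1:5, comment = "my attribute")

x                     # prints the values only, without the comment
attributes(x)         # the comment is still stored
attr(x, "comment")    # "my attribute"
comment(x)            # "my attribute"
```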
3.3 S3 atomic vectors
Q: What sort of object does table() return? What is its type? What attributes does it have? How does the dimensionality change as you tabulate more variables?
A: table() returns a contingency table of its input variables, which has the class "table". Internally it is represented as an array (implicit class) of integers (type) with the attributes dim (dimension of the underlying array) and dimnames (one name for each input column). The dimensions correspond to the number of unique values (factor levels) in each input variable, so each additional variable adds one dimension.
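A sketch using the built-in mtcars data set (the choice of columns is an assumption for illustration):

```r
x <- table(mtcars[c("vs", "cyl", "am")])

typeof(x)              # "integer"
class(x)               # "table"
names(attributes(x))   # includes "dim", "dimnames" and "class"
dim(x)                 # 2 3 2: one dimension per variable

# Fewer variables give fewer dimensions
dim(table(mtcars$vs))  # 2
```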
Q: What happens to a factor when you modify its levels?
A: The underlying integer values stay the same, but the levels are changed, so both the printed elements of the factor and its levels appear reversed:
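The code from the question is not reproduced here. As a sketch, assume a factor built from letters whose levels are then reversed in place:

```r
# Assumed setup, for illustration only
f1 <- factor(letters)
levels(f1) <- rev(levels(f1))

head(as.character(f1), 3)  # "z" "y" "x": the labels are reversed
head(levels(f1), 3)        # "z" "y" "x": so are the levels
head(as.integer(f1), 3)    # 1 2 3: the integer codes are unchanged
```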
Q: What does this code do? How do f2 and f3 differ from f1?
A: For f2 and f3 either the order of the factor elements or its levels are being reversed. For f1 both transformations are occurring.
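The definitions of f2 and f3 are not shown here. A sketch under the assumption that f2 reverses the elements and f3 reverses only the levels:

```r
# Assumed definitions, for illustration only
f1 <- factor(letters)
levels(f1) <- rev(levels(f1))                  # elements and levels reversed
f2 <- rev(factor(letters))                     # only the elements reversed
f3 <- factor(letters, levels = rev(letters))   # only the levels reversed

head(as.character(f2), 3)  # "z" "y" "x"
head(levels(f2), 3)        # "a" "b" "c"
head(as.character(f3), 3)  # "a" "b" "c"
head(levels(f3), 3)        # "z" "y" "x"
```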
Q: List all the ways that a list differs from an atomic vector.
A: To summarise:
- Atomic vectors are always homogeneous (all elements must be of the same type). Lists may be heterogeneous (the elements can be of different types).
- Atomic vectors point to one address in memory, while lists contain a separate reference for each element.
- Subsetting with out-of-bounds values or NAs leads to NAs for atomics and NULL values for lists.
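Two of these differences, homogeneity and subsetting behaviour, can be sketched as follows (the example values are illustrative):

```r
# Atomic vectors coerce to one type; lists keep each element's type
typeof(c(1, "a"))             # "character"
sapply(list(1, "a"), typeof)  # "double" "character"

# Out-of-bounds and NA subsetting
v <- 1:3
l <- as.list(1:3)
v[4]             # NA
l[4]             # a list holding one NULL
v[NA_integer_]   # NA
l[NA_integer_]   # a list holding one NULL
```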
Q: Why do you need to use unlist() to convert a list to an atomic vector? Why doesn’t as.vector() work?
A: A list is also a vector, though not an atomic one!
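A short sketch of the point: as.vector() has nothing to do because a list already is a vector, whereas unlist() flattens it into an atomic vector.

```r
l <- list(1, 2, 3)

is.vector(l)                # TRUE: a list is a vector
identical(as.vector(l), l)  # TRUE: as.vector() leaves it unchanged
is.atomic(as.vector(l))     # FALSE

unlist(l)                   # 1 2 3
is.atomic(unlist(l))        # TRUE
```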
Q: Compare and contrast c() and unlist() when combining a date and date-time into a single vector.
A: Date and date-time objects are built upon doubles. Dates are represented as days, while date-time objects (POSIXct) represent seconds (both counted from the reference date 1970-01-01, also known as “The Epoch”).
When combining these objects, method dispatch leads to surprising output:
The generic function c() dispatches based on the class of its first argument. When c.Date() is called, dttm_ct is converted to a date, but the 3600 seconds are mistaken for 3600 days! When c.POSIXct() is called on date, one day counts as only one second, as illustrated by the following line:
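The objects date and dttm_ct are not defined in the text above. The sketch below assumes a date one day after the Epoch and a date-time one hour after it. No printed output is given for the c() calls because it depends on the R version: the surprising results described above are what older versions produce, and newer versions may convert the arguments before combining them.

```r
# Assumed definitions, for illustration only
date    <- as.Date("1970-01-02")
dttm_ct <- as.POSIXct("1970-01-01 01:00", tz = "UTC")

# The underlying doubles
as.numeric(date)     # 1 (day since the Epoch)
as.numeric(dttm_ct)  # 3600 (seconds since the Epoch)

# Dispatch depends on the first argument
c(date, dttm_ct)     # uses c.Date()
c(dttm_ct, date)     # uses c.POSIXct()
```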
Some of these problems may be avoided via explicit conversion of the classes:
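A sketch of explicit conversion, again assuming a date one day after the Epoch and a date-time one hour after it (both definitions are assumptions for illustration):

```r
# Assumed definitions, for illustration only
date    <- as.Date("1970-01-02")
dttm_ct <- as.POSIXct("1970-01-01 01:00", tz = "UTC")

# Convert to a common class before combining
both_dates <- c(date, as.Date(dttm_ct, tz = "UTC"))
both_times <- c(as.POSIXct(date), dttm_ct)

as.numeric(both_dates)  # 1 0 (days)
as.numeric(both_times)  # 86400 3600 (seconds)
```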
Let’s look at unlist(), which operates on list input.
We see that internally dates and date-times are stored as doubles. Unfortunately this is all we are left with when unlist() strips the attributes of the list.
To summarise: c() coerces types and errors may occur because of inappropriate method dispatch, whereas unlist() strips attributes.
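The effect of unlist() can be sketched as follows, assuming the same illustrative date and date-time as above (one day and one hour after the Epoch):

```r
# Assumed definitions, for illustration only
date    <- as.Date("1970-01-02")
dttm_ct <- as.POSIXct("1970-01-01 01:00", tz = "UTC")

flat <- unlist(list(date, dttm_ct))
flat              # 1 3600: days and seconds mixed in one double vector
class(flat)       # "numeric": the Date and POSIXct classes are gone
attributes(flat)  # NULL
```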
3.5 Data frames and tibbles
Q: Can you have a data frame with 0 rows? What about 0 columns?
A: Yes, you can create these data frames easily and in many ways. Both dimensions can even be 0. For example, you might subset the respective dimension with either NULL or a valid 0-length atomic (logical(0), character(0), integer(0), double(0)). Negative integer sequences would also work. The following example uses the recycling rules for logical subsetting:
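A sketch using the built-in iris data set (the choice of data set is an assumption for illustration):

```r
# Logical subsetting: FALSE is recycled over the whole dimension
dim(iris[FALSE, ])       # 0 5
dim(iris[, FALSE])       # 150 0
dim(iris[FALSE, FALSE])  # 0 0

# NULL, 0-length atomics and negative sequences work as well
dim(iris[NULL, ])                   # 0 5
dim(iris[integer(0), ])             # 0 5
dim(iris[-seq_len(nrow(iris)), ])   # 0 5
```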
Empty data frames can also be created directly (without subsetting):
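For instance, as a small sketch with base R's data.frame() constructor:

```r
dim(data.frame())                    # 0 0
dim(data.frame(x = numeric(0)))      # 0 1: a column without rows
dim(data.frame(row.names = 1:3))     # 3 0: rows without columns
```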
Q: What happens if you attempt to set rownames that are not unique?
A: For matrices this will work without any problems. For data frames it is not possible, and what happens depends on the approach. When using the row.names<- replacement function, no further arguments can be set and the underlying .rowNamesDF<- will throw an error (and an additional warning):
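A sketch of both cases. The error is caught with tryCatch() so that the example runs to completion; the example objects are assumptions for illustration:

```r
# Matrices accept duplicated row names
m <- matrix(1:4, nrow = 2)
rownames(m) <- c("a", "a")
rownames(m)

# Data frames do not: the assignment fails and df is left unchanged
df <- data.frame(x = 1:3)
result <- tryCatch(
  suppressWarnings({
    row.names(df) <- c("a", "a", "b")
    "no error"
  }),
  error = function(e) conditionMessage(e)
)
result          # the error message
rownames(df)    # still the automatic row names
```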
However, by calling .rowNamesDF<- directly one can set the make.names argument to NA or TRUE. When set to NA, any non-unique row name will cause the new row names to become the automatic ones (seq_len(nrow(x))). With make.names = TRUE, row names will automatically be converted into unique ones via make.names(value, unique = TRUE). The same behaviour occurs when a matrix with non-unique row names is converted into a data frame.
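A sketch of calling the replacement function directly, assuming an R version that provides .rowNamesDF<- in base (the example data frames are assumptions for illustration):

```r
df_true <- data.frame(x = 1:3)
.rowNamesDF(df_true, make.names = TRUE) <- c("a", "a", "b")
rownames(df_true)   # "a" "a.1" "b": made unique via make.names()

df_na <- data.frame(x = 1:3)
.rowNamesDF(df_na, make.names = NA) <- c("a", "a", "b")
rownames(df_na)     # falls back to automatic row names
```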
Q: If df is a data frame, what can you say about t(df) and t(t(df))? Perform some experiments, making sure to try different column types.
A: Both will return matrices with dimensions following the usual transposition rules. Because t() uses as.matrix.data.frame() for preprocessing before applying t.default(), and the elements of a matrix need to be of the same type, all elements will be coerced in the usual order (logical << integer << double << character). Factors, dates and datetimes are treated as characters during coercion.
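A small experiment along these lines (the example data frames are assumptions for illustration):

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

is.matrix(t(df))   # TRUE
dim(t(df))         # 2 3
dim(t(t(df)))      # 3 2: a matrix, not a data frame
typeof(t(df))      # "character": the integer column was coerced

# Without a character column, the common type is lower in the hierarchy
df_num <- data.frame(a = 1:2, b = c(TRUE, FALSE))
typeof(t(df_num))  # "integer"
```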
Q: What does as.matrix() do when applied to a data frame with columns of different types? How does it differ from data.matrix()?
A: The documentation (see ?as.matrix) states:
The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns. Otherwise the usual coercion hierarchy (logical < integer < double < complex) will be used, e.g., all-logical data frames will be coerced to a logical matrix, mixed logical-integer will give a integer matrix, etc.
Let’s transform a dummy data frame into a character matrix. Note that format() is applied to the non-character columns, which may complicate conversion back to the previous type. (For example, TRUE is transformed to " TRUE", starting with a space.)
df_coltypes <- data.frame(
  a = c("a", "b"),
  b = c(TRUE, FALSE),
  c = c("TRUE", "FALSE"),
  d = c(1L, 0L),
  e = c(1.5, 2),
  f = c("one" = 1, "two" = 2),
  g = factor(c("f1", "f2")),
  stringsAsFactors = FALSE
)

as.matrix(df_coltypes)
#>     a   b       c       d   e     f   g
#> one "a" " TRUE" "TRUE"  "1" "1.5" "1" "f1"
#> two "b" "FALSE" "FALSE" "0" "2.0" "2" "f2"
From ?data.matrix:
Return the matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Factors and ordered factors are replaced by their internal codes.
data.matrix() returns a numeric matrix, where characters are replaced by missing values:
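A sketch contrasting the two functions on a small data frame (an assumption for illustration). Character columns are deliberately left out, because their treatment by data.matrix() may differ between R versions:

```r
df <- data.frame(
  b = c(TRUE, FALSE),
  d = c(1L, 0L),
  e = c(1.5, 2),
  g = factor(c("f1", "f2"))
)

m_chr <- as.matrix(df)    # the factor column forces a character matrix
m_num <- data.matrix(df)  # everything is converted to numbers

typeof(m_chr)             # "character"
typeof(m_num)             # "double"
as.numeric(m_num[, "g"])  # 1 2: the factor's internal codes
as.numeric(m_num[, "b"])  # 1 0: logicals become numbers
```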