8 Best practices
8.1 Code readability
Code readability refers to how easily a programmer can understand and interpret a piece of code. It is an essential aspect of software development because readable code is easier to maintain, faster to debug, more collaborative, and less prone to errors. Code readability can be broadly categorized into two main aspects: code style and code transparency. Code style pertains primarily to the format and presentation of the code. Code transparency delves deeper into the logic and structure of the code. It refers to how straightforwardly the functionality, logic, or operations are conveyed.
8.1.1 Example
Consider the following two examples:
Poor code readability:
a = sum(dat[dat[,1]>5 & dat[,2]<10,3])
b = mean(dat[dat[,1]>5 & dat[,2]<10,3])
print(paste('Sum =',a,', Mean =',b))
Good code readability:
# Library for data manipulation
library(dplyr)
# Analysis parameters
MIN_AGE <- 5
MAX_SCORE <- 10
# Extract summary performance metrics for subset of observations
summary_data <- dat |>
  filter(age >= MIN_AGE & score <= MAX_SCORE) |>
  summarize(
    sum_performance = sum(performance_metric),
    mean_performance = mean(performance_metric)
  )
# Print the computed results
cat(sprintf("Sum of Performance: %f | Mean of Performance: %f",
            summary_data$sum_performance,
            summary_data$mean_performance))
In this section, we will discuss how to write code like that in the second snippet above.
8.1.2 Code style
Code should adhere to the tidyverse style guide, which includes guidance on the following aspects of code:
- Naming conventions
- Spacing and indentation
- Commenting
You can use the styler package to automatically conform your spacing and indentation to the tidyverse style guide.
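As a minimal sketch of that workflow, the snippet below passes one badly spaced line of code to styler::style_text(), which returns the restyled code. The input string is a made-up example, and the call is guarded in case styler is not installed.

```r
# Hypothetical, badly formatted line of code (illustrative only)
messy_code <- "a=mean(dat[dat[,1]>5,3])"

if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() applies the tidyverse style guide by default
  styled_code <- styler::style_text(messy_code)
  print(styled_code)
} else {
  message("Install styler with install.packages('styler') to run this example.")
}
```

For a script saved on disk, styler::style_file() restyles the file in place, so it is worth committing or backing up the file first.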
8.2 Code transparency
To write transparent code, follow these guidelines:
- Use tidyverse paradigms as much as possible (e.g. dplyr summaries instead of apply operations).
- Use names, rather than indices, for subsetting (e.g. results["mse", "lasso"] rather than results[2, 4]).
- Use named arguments in function calls, especially with more than one argument (e.g. rbinom(n = 3, size = 1, prob = 0.5) rather than rbinom(3, 1, 0.5)).
- Put logically related chunks of code together into code blocks, with a comment describing the thrust of that code block.
- Name constants in your scripts. “Magic numbers” are unexplained numbers in your scripts:
# Bad practice: Magic number
if (x > 30) ...
# Good practice: Using a named constant
MAX_AGE <- 30
if (x > MAX_AGE) ...
Descriptive constant names provide clarity. Furthermore, especially if these constants are used in multiple places throughout your script, updating them becomes as simple as changing one line of code. It is also advisable to put all such constants together, near the top of the script.
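The subsetting and named-argument guidelines can be sketched in a few lines of base R. The results matrix, its row and column names, and its values are hypothetical and exist only to make the example runnable.

```r
# Hypothetical results matrix: rows are metrics, columns are methods
results <- matrix(
  c(0.52, 0.48, 0.41, 0.39),
  nrow = 2,
  dimnames = list(c("mse", "mae"), c("ridge", "lasso"))
)

# Subsetting by name states which entry is wanted
lasso_mse <- results["mse", "lasso"]

# Subsetting by position returns the same value but hides the intent
lasso_mse_by_index <- results[1, 2]

# Named arguments make clear what each number means
set.seed(1)
coin_flips <- rbinom(n = 3, size = 1, prob = 0.5)
```

Name-based subsetting also keeps working if rows or columns are later reordered, whereas positional indices would silently pick up a different entry.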
8.3 Modularity
Avoid repetitive code. Repetition not only lengthens your script but also increases the chance for mistakes.
# Bad practice: Repetitive code
data$age[data$age < 0] <- NA
data$score[data$score < 0] <- NA

# Good practice: Create a function
replace_negatives_with_NA <- function(variable) {
  variable[variable < 0] <- NA
  return(variable)
}
data$age <- replace_negatives_with_NA(data$age)
data$score <- replace_negatives_with_NA(data$score)
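Once the cleaning step is a function, it can be applied to many columns without writing one line per column. The sketch below is one possible way to do this with lapply(); the data frame and the height column are hypothetical.

```r
# Helper from the example above
replace_negatives_with_NA <- function(variable) {
  variable[variable < 0] <- NA
  return(variable)
}

# Hypothetical data frame in which negative values stand in for missing data
data <- data.frame(
  age = c(25, -1, 40),
  score = c(88, 92, -9),
  height = c(-1, 170, 165)
)

# Apply the helper to every listed column in one step
columns_to_clean <- c("age", "score", "height")
data[columns_to_clean] <- lapply(data[columns_to_clean], replace_negatives_with_NA)
```

Adding another column to the cleaning step then means extending columns_to_clean rather than copying a line of code.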
8.4 Code speed
When optimizing the execution speed of your code, it’s essential to strike a balance between code readability and efficiency. As Donald Knuth famously stated, ‘premature optimization is the root of all evil,’ so focus on writing clean and functional code first. Once your code works correctly, you can then consider optimizing the most computationally intensive parts if necessary.
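One way to find the computationally intensive parts is to time each stage of a working script before changing anything. The sketch below uses base R's system.time() on a hypothetical three-stage analysis; the stages, sizes, and variable names are assumptions chosen only for illustration.

```r
set.seed(42)

# Stage 1: simulate a hypothetical dataset
simulate_time <- system.time({
  predictors <- matrix(rnorm(5000 * 20), nrow = 5000)
  response <- rnorm(5000)
})

# Stage 2: fit a linear model
fit_time <- system.time({
  model_fit <- lm(response ~ predictors)
})

# Stage 3: summarize the fit
summary_time <- system.time({
  r_squared <- summary(model_fit)$r.squared
})

# Elapsed seconds per stage; the slowest stage is the first candidate
# for optimization, if any optimization is needed at all
stage_times <- c(
  simulate = simulate_time[["elapsed"]],
  fit = fit_time[["elapsed"]],
  summarize = summary_time[["elapsed"]]
)
print(stage_times)
```

Timings vary between machines and runs, so they are best read as a rough ranking of stages rather than as exact measurements.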
8.4.1 Vectorization
R is a vectorized language, which means that operations can be performed on entire vectors rather than looping over individual elements. For example, instead of using a loop to square each element of a vector, you can simply square the vector directly:
numbers <- c(1, 2, 3, 4, 5)

# Non-vectorized operation using a loop
squared_numbers <- vector("numeric", length(numbers))
for (i in seq_along(numbers)) {
  squared_numbers[i] <- numbers[i]^2
}
# Example of vectorized operation
squared_numbers <- numbers^2
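Vectorization is not limited to arithmetic: comparisons, ifelse(), and sum() also operate on whole vectors. The following sketch labels a set of scores with and without a loop; the scores and the PASSING_SCORE threshold are made-up values.

```r
# Hypothetical exam scores and passing threshold (illustrative values)
scores <- c(45, 72, 88, 59, 91)
PASSING_SCORE <- 60

# Non-vectorized: test each score inside a loop
labels_loop <- character(length(scores))
for (i in seq_along(scores)) {
  if (scores[i] >= PASSING_SCORE) {
    labels_loop[i] <- "pass"
  } else {
    labels_loop[i] <- "fail"
  }
}

# Vectorized: one comparison over the whole vector
labels_vectorized <- ifelse(scores >= PASSING_SCORE, "pass", "fail")

# Logical vectors can be summed directly to count matches
n_passing <- sum(scores >= PASSING_SCORE)
```

Both versions produce the same labels, but the vectorized one states the rule in a single line.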
8.4.2 Factoring code out of loops
Often, parts of the code inside a loop don’t depend on the loop variable and can be taken outside the loop, leading to efficiency gains. For example, if you’re repeatedly computing something within a loop that doesn’t change, compute it once outside the loop.
# Inefficient loop: Compute mean of the entire dataset in each iteration
bootstrap_means <- numeric(n_bootstrap)
for (i in seq_len(n_bootstrap)) {
  bootstrap_sample <- sample(data_points, size = length(data_points), replace = TRUE)
  mu_data <- mean(data_points) # This is unnecessary in the loop
  bootstrap_means[i] <- mean(bootstrap_sample) - mu_data
}
# Optimized loop: Factor out the mean computation of the dataset
bootstrap_means_optimized <- numeric(n_bootstrap)
mu_data <- mean(data_points)
for (i in seq_len(n_bootstrap)) {
  bootstrap_sample <- sample(data_points, size = length(data_points), replace = TRUE)
  bootstrap_means_optimized[i] <- mean(bootstrap_sample) - mu_data
}
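The loops above assume that data_points and n_bootstrap already exist. A self-contained sketch of the optimized version, with hypothetical simulated data and an arbitrary number of bootstrap replicates, might look like this:

```r
set.seed(123)

# Hypothetical inputs (not specified in the text above)
data_points <- rnorm(100, mean = 10, sd = 2)
n_bootstrap <- 1000

# Loop-invariant quantities, computed once outside the loop
n_points <- length(data_points)
mu_data <- mean(data_points)

# Preallocate the result vector, then fill it in
bootstrap_means <- numeric(n_bootstrap)
for (i in seq_len(n_bootstrap)) {
  bootstrap_sample <- sample(data_points, size = n_points, replace = TRUE)
  bootstrap_means[i] <- mean(bootstrap_sample) - mu_data
}

# Spread of the centered bootstrap means
bootstrap_se <- sd(bootstrap_means)
```

Here length(data_points) is also moved out of the loop, since it does not depend on the loop variable either.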