When it comes to naming things, we've all got our own preferences. Here, boringly, are my preferences for naming things in R (they're partly based on the Google and Wickham style guides, but I also found this interesting). You can call it a style guide if you like, but I've very little by way of style: to me it's just a bunch of conventions that I try to stick to.
I use a different naming format for functions (underscore-separated) than for non-functions (dot-separated). I really don't care that dot-separation would confuse Java or Python, because these are my conventions for R.
An exception to the function_name underscore convention is when I'm writing functions of a particular type: I put the type as a dot-separated prefix. For example, if I'm writing a function that returns a function, I call it 'builder.more_specific_name'; if it's a function that reads from or writes to files, I call it 'io.what_does_it_do'; and so on.
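A minimal sketch of the prefix convention in use. The names builder.power_fn, io.write_numbers and io.read_numbers are made up for illustration; they aren't from any package.

```r
# builder.*: a function that returns a function (hypothetical example)
builder.power_fn <- function(exponent) {
  force(exponent)  # fix the exponent now, not when the result is first called
  function(x) x^exponent
}

# io.*: functions whose job is to read from or write to files (hypothetical)
io.write_numbers <- function(numbers, path) {
  writeLines(as.character(numbers), path)
  invisible(path)
}

io.read_numbers <- function(path) {
  as.numeric(readLines(path))
}

square_it <- builder.power_fn(2)
square_it(1:3)             # 1 4 9

tmp.path <- tempfile(fileext = ".txt")
io.write_numbers(c(1.5, 2, 3), tmp.path)
io.read_numbers(tmp.path)  # 1.5 2.0 3.0
```

Because the prefix says what kind of thing a function is, a builder's output (square_it here) goes back to the plain function_name form.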
I try to push volatile, side-effecting or state-dependent shizz (IO, changes to options, printing, plotting, setting/using RNG streams; but not stop/message/warning and the like) out to the sides. All variables used by a function are either passed in as arguments or defined within it, so that, modulo stop() and friends, most functions are pure.
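To illustrate what "pure" means here, a made-up pair of functions (count_big and count_big_impure are hypothetical). The first quietly depends on a global variable and on the RNG state; the second takes everything it needs as arguments and leaves random-number generation to the calling code.

```r
# Not pure: the result depends on a global variable and on the RNG state
cutoff <- 0.5
count_big_impure <- function(n) sum(runif(n) > cutoff)

# Pure: every input arrives as an argument, so the same inputs always give
# the same output
count_big <- function(values, cutoff) sum(values > cutoff)

# The RNG is used out at the edge, in the calling code, not inside count_big()
set.seed(1)
random.values <- runif(10)
count_big(random.values, cutoff = 0.5)

count_big(c(0.1, 0.6, 0.9), cutoff = 0.5)  # 2
```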
Use at most 80 characters per line.
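If you want to check this without a linter, here's a rough sketch (find_long_lines is a made-up helper, not a base R function). Note that nchar() counts a tab as a single character, so indentation with tabs will be under-counted.

```r
# Hypothetical helper: return the numbers of the lines in a file that are
# wider than max.width characters
find_long_lines <- function(path, max.width = 80) {
  line.widths <- nchar(readLines(path), type = "chars")
  which(line.widths > max.width)
}

tmp.script <- tempfile(fileext = ".R")
writeLines(c("x <- 1", strrep("y", 81), strrep("z", 80)), tmp.script)
find_long_lines(tmp.script)  # 2
```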
Wherever possible, use <- instead of =.
woop <- TRUE    # good
woops = FALSE   # bad
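One place where the two genuinely behave differently is inside a function call, where = names an argument and <- performs an assignment in the calling environment. A small illustration (vals is just an illustrative name):

```r
# `=` here names median()'s argument x; it doesn't create an object called x
median(x = c(3, 1, 2))      # 2

# `<-` assigns vals in the calling environment, then passes it to median()
median(vals <- c(3, 1, 2))  # 2
exists("vals")              # TRUE
```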
Never use <<-
... and never recommend a course that teaches how to use <<- to anyone
... and never, ever, mention <<- on your blog.
Similarly, never use attach or ->.
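A sketch of the alternatives I'd reach for instead (scores.df and add_score are made-up names): refer to columns explicitly rather than attach()ing a data frame, and have functions return values rather than writing to variables outside themselves.

```r
scores.df <- data.frame(score = c(4, 8, 6))

# Instead of attach(scores.df) followed by mean(score), be explicit:
mean(scores.df$score)         # 6
with(scores.df, mean(score))  # 6

# Instead of having a function modify an object outside itself, return the
# new value and assign it with <- at the call site
add_score <- function(scores, new.score) c(scores, new.score)
all.scores <- add_score(scores.df$score, 10)
all.scores                    # 4 8 6 10
```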
Try not to overwrite the names of existing objects (you can check using exists("possible.name")):
j5.df <- data.frame(abc = 123)
df <- data.frame(abc = 1:3) # see stats::df
keep <- which(j5.df$abc == 1) # keep is not currently defined
drop <- which(df$abc == 2) # but drop is: see base::drop
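To check several candidate names at once, a small sketch (find_taken_names is a hypothetical helper). It calls exists() against the global environment, which also searches the attached packages; utils::find() then tells you which package a clashing name comes from.

```r
# Hypothetical helper: which candidate names already refer to an object
# visible from envir (by default the global environment and, through it,
# the attached packages)?
find_taken_names <- function(candidates, envir = globalenv()) {
  is.taken <- vapply(
    candidates,
    function(name) exists(name, envir = envir),
    logical(1)
  )
  candidates[is.taken]
}

find_taken_names(c("df", "drop", "surely.not.a.name.xyz"))  # "df" "drop"
utils::find("df")  # where it lives, e.g. "package:stats"
```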
Use two spaces to indent:
my_func <- function(arg1){
  # some code
}
If there's more than one argument to a function you are defining, put each arg on a separate line and offset these by an extra 2 spaces:
my_fancy_function <- function(
    arg1,
    another.arg = NULL
){
  if(missing(arg1)){
    stop("ARGGGGG!")
  }
  # the body goes here, using arg1 and another.arg; e.g.:
  list(arg1 = arg1, another.arg = another.arg)
}
Use the magrittr pipe to connect several functions together (and write the functions on separate lines):
library(magrittr)
# abc.def, function1, some.other.arg and some_other_fn are placeholders
my.results <- abc.def %>%
  function1(some.other.arg) %>%
  some_other_fn
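A small runnable version, assuming the magrittr package is installed (root.total is just an illustrative name). The pipe passes its left-hand side as the first argument of the next function, so the piped and nested forms below compute the same thing; the piped one reads top to bottom rather than inside out.

```r
library(magrittr)  # provides %>%

# Piped: one step per line, read top to bottom
root.total <- c(4, 9, 16) %>%
  sqrt %>%
  sum

# The same computation as a nested call, read inside out
root.total.nested <- sum(sqrt(c(4, 9, 16)))

root.total == root.total.nested  # TRUE; both are 9
```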
All the rest, rather briefly:
job.specific.package.name
ClassName ## and also ClassName == ConstructorName
object.name
function_name
<function_type>.function_name
<function_type>.function_name.<name_of_pipeline>
.non_exported_function
pipeline.name_of_pipeline
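Here's how several of those forms look side by side in a single sketch; every name in it is made up for illustration.

```r
# object.name
raw.counts <- c(3, 1, 2)

# ClassName, with a constructor of the same name
Counts <- function(values) {
  structure(list(values = values), class = "Counts")
}

# function_name
total_count <- function(counts) sum(counts$values)

# <function_type>.function_name: here a builder that returns a function
builder.scale_counts <- function(multiplier) {
  force(multiplier)
  function(counts) Counts(counts$values * multiplier)
}

# .non_exported_function: an internal helper, not meant for package users
.check_non_negative <- function(values) {
  stopifnot(all(values >= 0))
  invisible(values)
}

my.counts <- Counts(raw.counts)
.check_non_negative(my.counts$values)
double_counts <- builder.scale_counts(2)
total_count(double_counts(my.counts))  # 12
```

The builder's argument is called multiplier rather than factor, so as not to mask base::factor.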
Re the last thing: my pipelines aren't just strings of functions; they're strings of functions that log intermediate results (summary tables, summary statistics, plots, etc.; this is all done functionally, so you can't log base-R graphics or any other side-effect-dependent stuff) and return both a final processed dataset and the specified intermediate results. I'm sure I'll write about that some other time (when it's less hacked together).
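The actual pipeline code isn't shown here, so this is only a minimal sketch of the general shape under my own assumptions (pipeline.tidy_counts and its steps are hypothetical). Each intermediate result is stored in a list instead of being printed, and the function returns that list alongside the processed data.

```r
# A made-up pipeline: returns the processed data together with a named list
# of intermediate results, and prints or plots nothing itself
pipeline.tidy_counts <- function(raw.df) {
  intermediate.results <- list()

  complete.df <- raw.df[!is.na(raw.df$count), , drop = FALSE]
  intermediate.results$rows.dropped <- nrow(raw.df) - nrow(complete.df)

  complete.df$prop <- complete.df$count / sum(complete.df$count)
  intermediate.results$count.summary <- summary(complete.df$count)

  list(data = complete.df, intermediate.results = intermediate.results)
}

pipeline.output <- pipeline.tidy_counts(
  data.frame(count = c(2, NA, 6))
)
pipeline.output$intermediate.results$rows.dropped  # 1
pipeline.output$data$prop                          # 0.25 0.75
```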
I noted that formatR can identify and fix some examples of poor coding style in R; however, its output didn't seem particularly pretty to me:
- There was no way to specify that lines should be at most 80 characters wide (its line-width option, width.cutoff, splits lines once they reach at least the given width);
- It packed function calls and function definitions into as dense a space as possible.
lintr can also identify examples of poor coding style in R. Some of my choices don't fit its default checks, though, notably dot-separation in variable/function names; but I still like to have two forms of punctuation available. I'm going to write a specific lintr script to call from my git pre-commit hook.
Both are available via conda.