Friday 9 June 2017

R formatting conventions

[I changed my conventions; disregard this, future Russ]

When it comes to naming things, we've all got our own preferences. Here, boringly, are my preferences for naming things in R (they're partly based on Google and Wickham; but I also found this interesting). You can call it a style guide, if you like, but I've very little by way of style. To me it's just a bunch of conventions that I try to stick to.

I use a different naming format for functions (underscore-separated) than for non-functions (dot-separated). I really don't care that dot-separation would confuse Java or Python, because these are my conventions for R.

An exception to the function_name underscore-convention is when I'm making functions that are of a particular type: I put the type as a dot-separated prefix. E.g., if I'm writing a function that returns a function, I call it 'builder.more_specific_name'; if it's a function that reads or writes to files, I call it 'io.what_does_it_do'; etc.
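As a rough illustration of the prefix convention (the function and argument names below are made up for this example, not taken from any real package):

```r
# A 'builder' returns a function; after the dot, the name is underscore-separated
builder.power_fn <- function(exponent){
  force(exponent)  # fix the value now rather than at first call
  function(x){
    x ^ exponent
    }
  }

# An 'io' function reads from or writes to files
io.write_table <- function(
    my.df,
    out.file
  ){
  write.csv(my.df, file = out.file, row.names = FALSE)
  invisible(out.file)
  }

square.fn <- builder.power_fn(2)
square.fn(4)  # 16
```

The prefix says what kind of thing the function is before you've read any of its body.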

I try to push volatile side-effecting or state-dependent shizz (IO, changes to options, printing, plotting, setting/using RNG streams; but not stop/message/warning and the like) out to the sides. All variables used by a function are passed in as arguments or defined therein, so that modulo stop (etc), most functions are pure.
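A minimal sketch of what that split might look like (summarise_column and io.write_summary are hypothetical names for this example): the calculation takes everything it needs as arguments and only returns a value, while the file-writing sits in a separate function out at the edge.

```r
# Pure (modulo stop): uses only its arguments, returns a value, touches nothing
summarise_column <- function(
    my.df,
    col.name
  ){
  if(!col.name %in% colnames(my.df)){
    stop("Unknown column: ", col.name)
    }
  col.values <- my.df[[col.name]]
  data.frame(
    column = col.name,
    mean = mean(col.values),
    sd = sd(col.values)
    )
  }

# Side-effecting: the write to disk is kept out of the calculation
io.write_summary <- function(
    summary.df,
    out.file
  ){
  write.csv(summary.df, file = out.file, row.names = FALSE)
  invisible(out.file)
  }

input.df <- data.frame(abc = c(1, 2, 3))
abc.summary <- summarise_column(input.df, "abc")
io.write_summary(abc.summary, tempfile(fileext = ".csv"))
```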

Use at most 80 characters per line

Wherever possible, use <- instead of =.
woop <- TRUE  # purple means good
woops = FALSE  # red means bad

Never use <<- 
... and never recommend a course that teaches how to use <<- to anyone
... and never, ever, mention <<- on your blog.
Similarly: attach, ->

Try not to overwrite the names of existing objects (you can check using exists("possible.name")):

j5.df <- data.frame(abc = 123)
df <- data.frame(abc = 1:3)   # see stats::df
keep <- which(j5.df$abc == 1) # keep is not currently defined
drop <- which(df$abc == 2)    # but drop is: see base::drop
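If you want to check several candidate names in one go, something like the following would do it. This is a sketch: name_is_taken is a hypothetical helper, and the result depends on which packages are attached in your session.

```r
# Report, for each candidate name, whether an object of that name is already
# visible from the global environment (including attached packages)
name_is_taken <- function(candidate.names){
  vapply(
    candidate.names,
    function(candidate.name) exists(candidate.name, envir = globalenv()),
    logical(1)
    )
  }

taken <- name_is_taken(c("df", "drop", "surely.unused.name.xyz"))
taken
```

With the default packages attached, "df" and "drop" come back TRUE because of stats::df and base::drop.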

Use 2 spaces to indent:

my_func <- function(arg1){
  # some code
  }

If there's more than one argument to a function you are defining, put each arg on a separate line and offset these by an extra 2 spaces:

my_fancy_function <- function(
    arg1,
    another.arg = NULL
  ){
  if(missing(arg1)){
    stop("ARGGGGG!")
    }
  body("goes", "here", arg1, another.arg)
  }

Use the magrittr pipe to connect several functions together (and write the functions on separate lines):
my.results <- abc.def %>%
  function1(some.other.arg) %>%
  some_other_fn

All the rest, rather briefly:

job.specific.package.name

ClassName ## and also ClassName == ConstructorName

object.name

function_name

<function_type>.function_name

<function_type>.function_name.<name_of_pipeline>

.non_exported_function

pipeline.name_of_pipeline

Re the last thing, my pipelines aren't just strings of functions, they're strings of functions that log intermediate results (summary tables, summary statistics, plots etc; this is all done functionally, so you can't log base-R graphics or any other side-effect dependent stuff) and return both a final-processed dataset and the specified intermediate results. I'm sure I'll write about that some other time (when it's less hacked together).
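Until then, here is a very rough sketch of the general idea, which is not my actual implementation: run_logged_steps, pipeline.clean_abc and the step names are all invented for this example. Each step is a function from dataset to dataset, each logger is a function applied to the intermediate dataset after the step of the same name, and the pipeline returns both the final dataset and the logged values.

```r
# Run the steps in order; where a logger shares a step's name, apply it to the
# intermediate result of that step and keep what it returns
run_logged_steps <- function(
    input.df,
    steps,
    loggers = list()
  ){
  current.df <- input.df
  logs <- list()
  for(step.name in names(steps)){
    current.df <- steps[[step.name]](current.df)
    if(!is.null(loggers[[step.name]])){
      logs[[step.name]] <- loggers[[step.name]](current.df)
      }
    }
  list(result = current.df, logs = logs)
  }

pipeline.clean_abc <- function(input.df){
  run_logged_steps(
    input.df,
    steps = list(
      drop_missing = function(my.df) my.df[!is.na(my.df$abc), , drop = FALSE],
      rescale = function(my.df) transform(my.df, abc = abc / max(abc))
      ),
    loggers = list(
      drop_missing = function(my.df) nrow(my.df)
      )
    )
  }

clean.out <- pipeline.clean_abc(data.frame(abc = c(2, NA, 4)))
clean.out$result$abc         # 0.5 1.0
clean.out$logs$drop_missing  # 2
```

Because the loggers only return values, nothing here depends on side effects, which is also why base-R graphics can't be logged this way.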

Neat, related tools: formatR (thanks to Lovelace), lintr, ...

I noted that formatR is able to identify and fix some examples of poor coding style in R. However, its output didn't seem particularly pretty to me:
- There was no way to specify that line widths should be at most 80 characters (its width.cutoff argument splits lines only after at least the given width);
- It wrapped function calls / function definitions into as dense a space as possible.

lintr is also able to identify examples of poor coding style in R. Some of my choices don't fit its default linters though (notably dot-separation in variable/function names), but I still like to have two forms of punctuation available. I'm going to write up a specific lintr script to call from my git pre-commit hook.

Both are available via conda.
