Benefits of using integer values for constants rather than numeric values (e.g. 1L vs 1) in R
In R source code, most (but not all) functions use integer values for constants:

colnames <- function(x, do.NULL = TRUE, prefix = "col")
{
    if(is.data.frame(x) && do.NULL)
        return(names(x))
    dn <- dimnames(x)
    if(!is.null(dn[[2L]]))
        dn[[2L]]
    else {
        nc <- NCOL(x)
        if(do.NULL) NULL
        else if(nc > 0L) paste0(prefix, seq_len(nc))
        else character()
    }
}

The R Language Definition says:

In most cases, the difference between an integer and a numeric value will be unimportant as R will do the right thing when using the numbers. There are, however, times when we would like to explicitly create an integer value for a constant.

  • In which cases is it necessary to force integer values for constants instead of simply using numeric values? Examples where e.g. 1 would fail but 1L would not are welcome.
  • Conversely, in which cases is using integer values unnecessary (e.g. interactive use vs. programming, indexing with constants, etc.)?

The question is about good practice and the rationale, not about e.g. the "L" notation itself, the difference between integer class and numeric class, or comparing numbers.

Spurn answered 24/9, 2021 at 18:19 Comment(5)
Thank you for the links. However, they do not answer my question. I know the difference between integer class and numeric class. The question is about "good practice" and the rationale for forcing integer constants instead of simply using numeric constants. I edited the question to clarify. - Spurn
It'd be difficult to anticipate every potential edge case where a floating point error could arise, so it can be simpler from a programming perspective to use integers explicitly when the elements in question are necessarily integers (e.g. with indexes or column numbers). Being explicit in the function that a value in that context should always be an integer can help "fail early" before it could potentially cause a harder-to-understand problem downstream, which is a design principle of "defensive programming." bitsandbugs.io/2018/07/27/defensive-programming-in-r - Maudemaudie
Thank you for your comment and the link. It may be useful to post your comment as an answer. I suspected that was the reason. Does this mean it is only significant in programming and not in interactive use? In addition, I am curious whether this is mostly theoretical or whether there are cases where it matters in practice. Examples are welcome! - Spurn
@Spurn For interest's sake it might be worth looking into dynamic typing and static typing; while not strictly an R example, this may be useful: docs.oracle.com/cd/E57471_01/bigData.100/extensions_bdd/src/…. In general you should try to explicitly define or coerce an object to a certain type before returning it. This may not only reduce the risk of errors, as Jon Spring points out, but may also lead to performance improvements. For instance, an integer vector/scalar occupies less memory than a float or double vector/scalar. - Null
Related: Does the documentation or language definition for R remark on the intended usage of R's integers? - Luzluzader

These are some of the use cases in which I explicitly use the L suffix when declaring constants. Of course these are not strictly "canonical" (or the only ones), but they may give you an idea of the rationale behind the practice. For each case I added a "necessary" flag; you will see that the necessary ones arise only when you interface with other languages (like C).

  • Logical type conversion (not necessary)

Instead of the classic as.integer, I add 0L to a logical vector to make it integer. Of course you could just add 0, but the result would then be a double, which requires more memory (typically 8 bytes per element instead of 4) and a conversion.
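As a minimal sketch of the point above (the vector `flags` is made up for illustration), the result type and the memory footprint can be checked directly:

```r
# Illustrative data only: a small logical vector with a missing value
flags <- c(TRUE, FALSE, NA, TRUE)

as_int <- flags + 0L   # logical + integer constant gives an integer vector
as_dbl <- flags + 0    # logical + double constant gives a double vector

typeof(as_int)   # "integer"
typeof(as_dbl)   # "double"

# Adding 0L gives the same result as explicit coercion
identical(as_int, as.integer(flags))   # TRUE

# On long vectors the storage difference becomes visible
object.size(integer(1e6)) < object.size(numeric(1e6))   # TRUE
```

Whether `+ 0L` or `as.integer()` reads better is a matter of taste; both yield the same integer vector here.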

  • Manipulating the result of a function that returns integer (not necessary)

Say, for instance, that you want to retrieve the elements of a vector that come right after an NA. You could use:

which(is.na(vec)) + 1L

Since which returns an integer vector (except for long vectors, where it returns a double), adding 1L preserves the type and avoids an implicit conversion. Nothing will break if you omit the L; it is just a small optimization. The same applies to match, for instance: if you want to post-process the result of such a function, it's a good habit to preserve the type where possible.
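A short sketch of this case, using a made-up vector `vec`, shows that the L only affects the storage type of the index, not the elements retrieved:

```r
# Illustrative data only
vec <- c(3, NA, 7, NA, 9)

idx_int <- which(is.na(vec)) + 1L   # stays integer
idx_dbl <- which(is.na(vec)) + 1    # silently promoted to double

typeof(idx_int)   # "integer"
typeof(idx_dbl)   # "double"

# Both index vectors select the same elements
vec[idx_int]      # 7 9
vec[idx_dbl]      # 7 9
```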

  • Interfacing C (necessary)

From ?integer:

Integer vectors exist so that data can be passed to C or Fortran code which expects them, and so that (small) integer data can be represented exactly and compactly.

C is much stricter regarding data types. This implies that, if you pass a vector to a C function, you cannot rely on C to do the conversions. Say that you want to replace the elements after an NA with some value, say 42. You find the positions of the NA values at the R level (as we did before with which) and then pass the original vector and the vector of indices to C. The C function will look like:

SEXP replaceAfterNA (SEXP X, SEXP IND) {
   ...
   int *ind = INTEGER(IND);
   ...
   for (i=0; i<l; i++) {
        //make here the replacement
   }
}

and then, on the R side:

...
ind <- which(is.na(x)) + 1L
.Call("replaceAfterNA", x, ind)
...

If you omit the L in the first line above, you will receive an error like:

INTEGER() cannot be applied to double vectors

since C is expecting an integer type.
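One way to guard against this on the R side is to check or coerce the storage type just before the call. The helper below is a hypothetical sketch (the name `as_c_index` is not part of R or of the answer's code) and it assumes the indices are small enough to fit in an integer:

```r
# Hypothetical helper: make sure an index vector is stored as integer
# before it is handed to compiled code via .Call()
as_c_index <- function(ind) {
  if (is.integer(ind)) return(ind)
  # Refuse anything that is not a whole number rather than truncating silently
  if (!is.numeric(ind) || any(ind != trunc(ind), na.rm = TRUE))
    stop("indices must be whole numbers")
  as.integer(ind)
}

x <- c(1, NA, 3)
ind_dbl <- which(is.na(x)) + 1    # double, because the L was omitted
ind <- as_c_index(ind_dbl)        # integer again, safe for INTEGER() in C

typeof(ind_dbl)   # "double"
typeof(ind)       # "integer"
```

Writing the constant as 1L in the first place avoids the need for such a helper; the check is mainly useful when the index comes from code you do not control.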

  • Interfacing Java (necessary)

Same as before. If you use the rJava package and want R to call your own custom Java classes and methods, you have to be sure that an integer is passed when the Java method requires an integer. Not adding a specific example here, but it should be clear why you may want to use the L suffix in constants in these cases.

Addendum

The previous cases were about when you may want to use L. Even if I guess it is much less common, it might be useful to add a case in which you don't want the L. This may arise if there is a danger of integer overflow. The *, + and - operators preserve the type if both operands are integers. For example:

#this overflows
31381938L*3231L
#[1] NA
#Warning message:
#In 31381938L * 3231L : NAs produced by integer overflow

#this does not
31381938L*3231
#[1] 1.01395e+11

So, if you are doing operations on an integer variable that might overflow, it's important to cast it to double to avoid any risk. Adding a constant without the L to that variable, or subtracting one from it, is as good an occasion as any to make the cast.
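The same effect can be seen with addition at the upper limit of the integer range. This is a small sketch using the built-in constant .Machine$integer.max; the variable names are illustrative:

```r
big <- .Machine$integer.max   # largest value an R integer can hold

# integer + integer stays integer, so this overflows to NA (with a warning)
overflowed <- suppressWarnings(big + 1L)

# integer + double constant promotes the result to double, so no overflow
promoted <- big + 1

typeof(overflowed)   # "integer"
is.na(overflowed)    # TRUE
typeof(promoted)     # "double"
promoted > big       # TRUE
```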

Daron answered 27/9, 2021 at 8:41 Comment(6)
Thank you for this informative answer! I already upvoted your answer, but I am waiting until the deadline to validate it and award the bounty to keep the question attractive. For the first use case, I prefer explicit coercion via as.integer rather than + 0L, but this is worth noting. Your second use case has direct implications for my use! (Not the third and fourth ones, but they are worth noting too!) - Spurn
I made an addendum to include also a case in which it is not correct to use the L suffix. - Daron
which can return a double. - Steve
For reference, from the which documentation: "If arr.ind == FALSE (the default), an integer vector, or a double vector if x is a long vector." - Spurn
And from the match documentation, the function returns: "An integer vector giving the position in table of the first match if there is a match, otherwise nomatch." (...) "Note that it is coerced to integer." - Spurn
@Daron You may wish to add some relevant text from ?integer: Details: Integer vectors exist so that data can be passed to C or Fortran code which expects them, and so that (small) integer data can be represented exactly and compactly. - Luzluzader
