I have a binary file containing numeric values encoded as signed or unsigned integers of different lengths (mostly 2 or 4 bytes). To process this data, I read the desired section of the file as a raw vector with readBin() and then try to convert it to decimal. The issue is that R's built-in functions have restrictions I do not fully understand (such as no long unsigned ints); please see the example below.
How can I read custom-length unsigned ints from raw data? Is there a more appropriate and elegant approach than the one shown below?
library(dplyr)  # only needed for the %>% pipe
###############################################################################
# create an example raw vector of 24 bytes
set.seed(1)
raw <- sample(0:0xff, 24, TRUE) %>% as.raw %>% print
###############################################################################
# approach with readBin() - not working
# read 2-byte unsigned integers left-to-right, not an issue
readBin(raw, size = 2, n = length(raw) / 2, integer(), endian = 'big', signed = FALSE)
# read 4-byte signed integers left-to-right, it's ok
readBin(raw, size = 4, n = length(raw) / 4, integer(), endian = 'big', signed = TRUE)
# first issue: readBin() can't read 4-byte unsigned integers
# (signed = FALSE is only used for sizes 1 and 2)
readBin(raw, size = 4, n = length(raw) / 4, integer(), endian = 'big', signed = FALSE)
# second issue: readBin() can't read custom-size integers (e.g. 3 bytes)
readBin(raw, size = 3, n = length(raw) / 3, integer(), endian = 'big')
###############################################################################
# approach with rawToBits() and packBits() - does not work either
# packBits() also treats an integer as signed
raw[1:4] %>% rawToBits %>% packBits('integer')
# and expects the input length to be a multiple of 32 bits, so 2 bytes fail
raw[1:2] %>% rawToBits %>% packBits('integer')
###############################################################################
# manual approach - working
# please note this requires reversing order of raw vector,
# as rawToBits() places the most significant bit to the right
# this approach correctly converts the 32-bit unsigned int to decimal
# but would be difficult to vectorize for multiple ints
# (I guess summing must be done in loops)
raw[4:1] %>% rawToBits %>% as.logical %>% which %>% {2^(. - 1)} %>% sum
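The manual approach above might be vectorized without explicit loops by working on whole bytes instead of bits. The following is only a minimal sketch under some assumptions: raw_to_uint() is a hypothetical helper (not a base R function), the data are big-endian, the vector length is a multiple of the integer size, and the result is returned as double, which stays exact only up to 2^53 (so sizes up to 6 bytes).

```r
# Hypothetical helper (not part of base R): interpret a raw vector as
# unsigned big-endian integers of `size` bytes each.
raw_to_uint <- function(x, size) {
  stopifnot(is.raw(x), length(x) %% size == 0, size >= 1, size <= 6)
  # one column per integer, one row per byte (most significant byte first)
  bytes <- matrix(as.integer(x), nrow = size)
  # byte weights 256^(size - 1), ..., 256^0
  weights <- 256^((size - 1):0)
  # weighted sum of each column; the result is double, not integer,
  # because values above 2^31 - 1 do not fit into R's integer type
  as.vector(weights %*% bytes)
}

x <- as.raw(c(0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x01, 0x00))
raw_to_uint(x, 4)       # 4294967295 256
raw_to_uint(x[1:6], 3)  # 16777215 16711680
```

For little-endian data, the weights could presumably be reversed (256^(0:(size - 1))); I have not checked this against a real file.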