BitSet memory usage in Scala
Asked Answered
A

1

5

I would like to know what is the memory usage of BitSet in Scala.For example, if I do:

  var bitArray:BitSet=new BitSet(10)
  bitArray.add(0)
  bitArray.add(2)
  bitArray.add(4)
  bitArray.add(6)
  bitArray.add(8)

How does that compare with and array containing the even numbers 0,2,4,6,8?

What about writing a number in binary:

  var bitArray:BitSet=new BitSet(32)
  bitArray.add(5)
  bitArray.add(3)
  bitArray.add(2)
  bitArray.add(1)
  bitArray.add(0)

How does that compare to the number 47?

I'm asking here of memory usage. But as a more open question, if you know, what are the advantages/disadvantages or uses of BitSet (WR to other common data types).

Thanks,

Amphibiotic answered 29/6, 2010 at 12:57 Comment(5)
possible duplicate of Boolean[] vs. BitSet: Which is more efficient?Intricate
Perhaps you should give us a higher-level statement of the problem you're trying to solve instead of three variant questions about very low-level data structure properties.Ibbie
Thanks Thomas, that post gave me more insight about BitSet. I still want to know if I can gain space by representing other structures by a BitSet. I guess everithing will be clearer if someone can clarify how BitSet is implemented. Thanks,Amphibiotic
Thomas: The context here is Scala, which provides its own BitSet, though an Array[Boolean] is the same thing as Boolean[] in Java. Closely related, but not exact duplicate.Cuffs
@Daniel It's to similar to be interesting. If it would support compressed bitsets ... but both implementations have the same idea you can read the Java or Scala lib codes it is the same.Intricate
P
16

You can look at the implementation of BitSet in Scala 2.8 here: scala.collection.mutable.BitSet.

It is implemented based on an array of Longs. The size of the array depends only on the highest number stored in it. Divide the highest number stored in it by 64, rounding up, and you have the size of the array. Each element in the array consumes 8 bytes.

That means that dividing by 8 the greatest number stored in it, roughly yields the size in bytes of the BitSet. "Roughly" because of virtual machine memory management overheads, because the pointer to the array also needs some memory and because the array itself has some overhead.

The order of insertion or the actual number of elements stored in the BitSet have no influence on the size of the memory allocated.

For the two examples you give, only one Long-element is required to store the numbers, using 8 bytes of memory, as in both examples the highest number is less than 64.

An array of Ints, storing any five numbers, would consume 5 * 4 bytes = 20 bytes plus overhead. For storing n numbers you need roughly n * 4 bytes.

So you are comparing (highestNumberStored / 8) bytes against (countOfNumbersStored * 4) bytes.

Pullman answered 29/6, 2010 at 16:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.