Why is BigDecimal.equals specified to compare both value and scale individually?

This is not a question about how to compare two BigDecimal objects - I know that you can use compareTo instead of equals to do that, since equals is documented as:

Unlike compareTo, this method considers two BigDecimal objects equal only if they are equal in value and scale (thus 2.0 is not equal to 2.00 when compared by this method).

The question is: why has equals been specified in this seemingly counter-intuitive manner? That is, why is it important to be able to distinguish between 2.0 and 2.00?

It seems likely that there must be a reason for this, since the Comparable documentation, which specifies the compareTo method, states:

It is strongly recommended (though not required) that natural orderings be consistent with equals

I imagine there must be a good reason for ignoring this recommendation.

Akan answered 31/12, 2012 at 13:16 Comment(1)
It is worth noting that new BigDecimal("2.0").compareTo(new BigDecimal("2.00")) == 0Dehaven

Because in some situations, an indication of precision (i.e. the margin of error) may be important.

For example, if you're storing measurements made by two physical sensors, perhaps one is 10x more precise than the other. It may be important to represent this fact.
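For a concrete sketch of how BigDecimal captures this (jshell-style; the sensor framing is just this answer's example), the scale records how many fractional digits were measured, and equals preserves that distinction while compareTo discards it:

var coarse = new BigDecimal("2.0");   // reading from the less precise sensor
var fine = new BigDecimal("2.00");    // reading from the 10x more precise sensor

coarse.scale()
==> 1

fine.scale()
==> 2

coarse.compareTo(fine)
==> 0

coarse.equals(fine)
==> false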

Anhwei answered 31/12, 2012 at 13:20 Comment(7)
I guess I hadn't thought of using BigDecimal to capture the amount of precision (rather than just as a type which allows arbitrary amounts of precision). Viewed in that way, it makes perfect sense; however, I then have to let go of thinking of the object as a numerical type - it does not behave as one as far as equals is concerned.Akan
In my experience the situations in which you want equals() to capture that semantic difference in precision are far rarer than the intuitive case. On top of that, the intuitive case would mean BigDecimal's compareTo() would be consistent with equals(). In my opinion, Sun made a mistake here.Elba
@bowmore, that would be my guess too, but experiences vary. Purists could argue they should have provided two classes - one class not suitable for sorting (no compareTo) that captures precision as a visible part of the object; and a second class implementing Comparable, with compareTo consistent with equals, that treats scale & value as a whole. However, providing both could seem rather bloated / unpragmatic and create rather than defuse confusion - Sun allowed both functionalities by providing inconsistent compareTo and equals (and surprised many of us along the way).Akan
@Akan an implementation featuring a method like, say, boolean equalsWithPrecision(BigDecimal other) would have allowed both functionalities, and been consistent.Elba
It also seems to break Set and Map usages.Rambunctious
@GeoffreyDeSmet: Whether such usages are "broken" depends on the intended purpose of the set. If one is creating a set for the purpose of allowing references to equivalent-but-distinct instances to be replaced with references to a single instance, the behavior of equals is perfect; I would consider definitions of equals which were inconsistent with usage somewhat dangerous.Semitone
I agree with this idea, but IMHO a class called "Measure" with two numbers, a measured value and an error bar, would have been better, because in most cases your instrumental error is not necessarily ±1 in the last digit.Trover

The general rule for equals is that two equal values should be substitutable for one another. That is, if performing a computation using one value gives some result, substituting an equals value into the same computation should give a result that equals the first result. This applies to objects that are values, such as String, Integer, BigDecimal, etc.

Now consider BigDecimal values 2.0 and 2.00. We know they are numerically equal, and that compareTo on them returns 0. But equals returns false. Why?

Here's an example where they are not substitutable:

// jshell session (java.math.* is imported by default)
var a = new BigDecimal("2.0");
var b = new BigDecimal("2.00");
var three = new BigDecimal(3);

a.divide(three, RoundingMode.HALF_UP)   // quotient is rounded to a's scale of 1
==> 0.7

b.divide(three, RoundingMode.HALF_UP)   // quotient is rounded to b's scale of 2
==> 0.67

The results are clearly unequal, so the value of a is not substitutable for b. Therefore, a.equals(b) should be false.

Lasagne answered 12/2, 2021 at 23:49 Comment(4)
you make it sound sooo easy with this example. awesome!Tidewater
@Tidewater The example was soooo good that we decided to put it into the javadoc: github.com/openjdk/jdk/commit/a1181852 (it should appear in JDK 17 build 13).Lasagne
…and this leads to the conclusion that we should be careful when mixing order and equality, as otherwise, we get bugs like the behavior of Stream.of("0.1", "0.10", "0.1") .map(BigDecimal::new) .sorted().distinct() .forEach(System.out::println);Burp
@Burp Correct. JDK-8223933.Lasagne

A point which has not yet been considered in any of the other answers is that equals is required to be consistent with hashCode, and the cost of a hashCode implementation which was required to yield the same value for 123.0 as for 123.00 (but still do a reasonable job of distinguishing different values) would be much greater than that of one which was not so required. Under the present semantics, hashCode requires a multiply-by-31 and an add for each 32 bits of stored value. If hashCode were required to be consistent among values with different precision, it would either have to compute the normalized form of any value (expensive) or else, at minimum, do something like compute the base-999999999 digital root of the value and multiply that, mod 999999999, by a factor based upon the precision. The inner loop of such a method would be:

temp = (temp + (mag[i] & LONG_MASK) * scale_factor[i]) % 999999999;

replacing a multiply-by-31 with a 64-bit modulus operation, which is much more expensive. If one wants a hash table which regards numerically-equivalent BigDecimal values as equivalent, and most keys sought in the table will be found, the efficient way to achieve that is to use a hash table which stores value wrappers, rather than storing values directly. To find a value in the table, start by looking for the value itself. If none is found, normalize the value and look for that. If nothing is found, create an empty wrapper and store an entry under both the original and normalized forms of the number.

Looking for something which isn't in the table and hasn't been searched for previously would require an expensive normalization step, but looking for something that has been searched for before would be much faster. By contrast, if hashCode needed to return equivalent values for numbers which, because of differing precision, are stored totally differently, all hash table operations would become much slower.
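A simplified sketch of that two-key lookup idea (NumericKeyedCache and its methods are hypothetical, not part of any library): each value is stored under both its original form and its stripTrailingZeros() normalization, so repeated lookups take the cheap exact-scale path.

import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

final class NumericKeyedCache<V> {
    private final Map<BigDecimal, V> entries = new HashMap<>();

    void put(BigDecimal key, V value) {
        entries.put(key, value);                         // original spelling, e.g. 2.00
        entries.put(key.stripTrailingZeros(), value);    // normalized spelling, e.g. 2
    }

    V get(BigDecimal key) {
        V hit = entries.get(key);                        // fast path: exact scale match
        if (hit == null) {
            hit = entries.get(key.stripTrailingZeros()); // slow path: normalize once
            if (hit != null) {
                entries.put(key, hit);                   // remember this spelling for next time
            }
        }
        return hit;
    }
}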

Semitone answered 25/7, 2014 at 23:25 Comment(6)
Interesting observation. Correctness trumps performance, so you first have to decide what you consider to be the "correct" behaviour of a BigDecimal class (i.e. whether scale/precision should be considered for equality) before you start considering performance. We've no idea if this particular argument swung it. Your arguments are equally applicable to equals too, of course.Akan
@bacar: There are two equivalence-related questions which can sensibly be asked of any object (IMHO, the virtual methods of Object should have provided for both): "May X and Y be safely regarded as equivalent, even if references are freely shared with outside code", and "May X and Y be safely regarded as equivalent by their owner, if it maintains exclusive control over X, Y, and all constituent mutable state?" I would suggest that the only types which should define equals in a fashion which doesn't match either of the above would be those whose instances are not expected to be...Semitone
...exposed to the outside world. For example, if one needs to use a hashed set of strings which are compared in case-insensitive fashion, one could define a CaseInsensitiveStringWrapper type whose equals and hashCode operate on uppercase versions of the wrapped string. Although the wrapper would have an "unusual" meaning for equals, it would not be exposed to outside code. Since BigDecimal is intended for use by outside code, it should only report instances as equal if all reasonable outside code would consider them equivalent.Semitone
@bacar: Personally, I think the situation with the equals and compareTo methods of BigDecimal is great: code which wants things to be compared based upon value can use compareTo, and code which wants to compare based upon equivalence can use equals. Note that precision doesn't just affect output; I believe at least one way of performing division uses the precision of the dividend to control the precision to which the result is rounded, such that 10.0/3 would yield 3.3, while 10.000/3 would yield 3.333. Substituting 10.0 for 10.000 would thus not be safe.Semitone
Division might have been specified to behave differently if equality had been specified differently. I think your CaseInsensitiveStringWrapper raises a very interesting point though - it is easy to implement a 'fuzzier' equivalence on top of a stricter one, whereas it may be harder, impossible, or simply surprising to implement a strict one in terms of a fuzzier one. Either way, the principle of least surprise is violated for one set of users or another.Akan
@bacar: I would suggest that if users are taught that they should expect to use methods other than equals when they want to test loose equality, then nobody need be surprised.Semitone

In math, 10.0 equals 10.00. In physics, 10.0 m and 10.00 m are arguably different (different precision), and when talking about objects in OOP, I would definitely say that they are not equal.

It's also easy to think of unexpected functionality if equals ignored the scale (for instance: if a.equals(b), wouldn't you expect a.add(c).equals(b.add(c)) for any c?).
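For what it's worth, scale really does propagate through arithmetic (the result scale of add is the maximum of the operand scales), so the two spellings remain observably different after an addition (jshell-style sketch; c is just an arbitrary addend):

var a = new BigDecimal("2.0");
var b = new BigDecimal("2.00");
var c = new BigDecimal("0.1");

a.add(c)   // result scale is max(1, 1)
==> 2.1

b.add(c)   // result scale is max(2, 1)
==> 2.10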

Microwatt answered 31/12, 2012 at 13:21 Comment(2)
Yes, I would expect that, but I don't understand your point; I'm not suggesting it ignore the scale; I'm suggesting it consider the value and the scale as a whole, as compareTo does.Akan
OK. I understand that sometimes users may want to consider precision, but I still don't get what your point is about unexpected functionality. If they'd chosen to let 2.0 equal 2.00, I'm not sure how your example of adding 0.1 causes problems.Akan

If numbers get rounded, the scale shows the precision of the calculation - in other words:

  • 10.0 could mean that the exact number was between 9.95 and 10.05
  • 10.00 could mean that the exact number was between 9.995 and 10.005

The scale, in short, is linked to arithmetic precision.
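To make the intervals above concrete (a jshell-style sketch with arbitrary sample values): two readings that agree when rounded to scale 1 can disagree at scale 2, so the scale tells you how much the rounding may be hiding:

var x = new BigDecimal("9.96");
var y = new BigDecimal("10.04");

x.setScale(1, RoundingMode.HALF_UP)
==> 10.0

y.setScale(1, RoundingMode.HALF_UP)
==> 10.0

x.setScale(2, RoundingMode.HALF_UP)
==> 9.96

y.setScale(2, RoundingMode.HALF_UP)
==> 10.04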

Coralyn answered 31/12, 2012 at 13:22 Comment(0)

The compareTo method knows that trailing zeros do not affect the numeric value represented by a BigDecimal, which is the only aspect compareTo cares about. By contrast, the equals method generally has no way of knowing what aspects of an object someone cares about, and should thus only return true if two objects are equivalent in every way that a programmer might be interested in. If x.equals(y) is true, it would be rather surprising for x.toString().equals(y.toString()) to yield false.

Another issue which is perhaps even more significant is that BigDecimal essentially combines a BigInteger and a scaling factor, such that if two numbers represent the same value but have different numbers of trailing zeroes, one will hold a BigInteger whose value is some power of ten times the other. If equality requires that the mantissa and scale both match, then the hashCode() for BigDecimal can build on the hash code of BigInteger. If it were possible for two values to be considered "equal" even though they contain different BigInteger values, however, that would complicate things significantly. A BigDecimal type which used its own backing storage, rather than a BigInteger, could be implemented in a variety of ways to allow numbers to be quickly hashed in such a way that values representing the same number would compare equal (as a simple example, a version which packed nine decimal digits into each long value and always required that the decimal point sit between groups of nine could compute the hash code in a way that ignores trailing groups whose value is zero), but a BigDecimal that encapsulates a BigInteger can't do that.
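That unscaled-value-plus-scale layout is directly observable through BigDecimal's own accessors (jshell-style sketch): numerically equal values can hold backing BigIntegers that differ by a power of ten:

var a = new BigDecimal("2.0");
var b = new BigDecimal("2.00");

a.unscaledValue()   // the backing BigInteger
==> 20

b.unscaledValue()
==> 200

a.scale()
==> 1

b.scale()
==> 2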

Semitone answered 21/1, 2013 at 20:20 Comment(8)
"the equals method generally has no way of knowing what aspects of an object someone cares about" - I vehemently disagree with this statement. Classes define (sometimes implicitly) a contract for their externally visible behaviour, which includes equals. Classes often exist specifically to hide (by encapsulation) details that users do not care about.Akan
Also - I don't think that in general you should have an expectation that equals be consistent with toString. Classes are at liberty to define toString pretty much however they see fit. Consider an example from the JDK, Set<String> s1 = new LinkedHashSet<String>(); s1.add("foo"); s1.add("bar"); Set<String> s2 = new LinkedHashSet<String>(); s2.add("bar"); s2.add("foo"); s1 and s2 have different string representations but compare equal.Akan
@bacar: Perhaps I'm over-extending .Net principles to Java. The hashed collections in .Net allow one to specify methods for equality comparison and hashing, thus effectively telling the collection what aspects of the object it should be interested in. If one had a collection type that maintained its elements in sequence, but offered SequenceEquals, GetSequenceHashCode, ContentEquals, and GetContentHashCode methods, one could then store such a type into a hashed collection using reference equality, sequence equality, or order-independent content equality.Semitone
I disagree with this statement, too. I've found, in my own experience when overriding the equals() method in custom objects, it's better to define equivalence on a small scale (aka, as few object attributes as possible) rather than on a big scale. The fewer attributes that contribute to equivalence, the better. Databases work in this same principle.Electuary
@ryvantage: One wouldn't generally expect to use objects with many fields as dictionary keys for purposes of looking up "other" information, but especially when dealing with hierarchical collections there may be a number of circumstances where one ends up with many copies of the same information; if one can efficiently identify references to distinct but equivalent objects, replacing references to all but the oldest copy with references to the oldest copy may save memory and improve performance; to do that, one must compare all fields.Semitone
Well, for me, I use Objects in my applications that are modeled exactly like they are on the database, using HashSet to store a lot of them, and using methods like add() and contains(), it looks for equivalence, so, at first, when I overrode equals() it compared every field of the object, but if for some reason a new element got added that was a little different, the HashSet would retain them both, which was no bueno. I ended up defining equality (and hashvalue) based exclusively on the id (primary key) from the database.Electuary
So, in my sense, if two objects have the same id, they represent the same instance of the object, even if their fields aren't equal. This was the only way to get them to behave in any kind Set I used.Electuary
Are the objects mutable or immutable, and what is their relation to any persistent store? If the objects are tied to rows in a database, I would suggest that multiple distinct objects attached to the same row shouldn't exist in the first place. Otherwise, I'm not quite clear why you're using a Set rather than a Map? I would think the natural way to store things would be as a Map whose "key" object encapsulates those parts of the data which are relevant to equality.Semitone

I imagine there must be a good reason for ignoring this recommendation.

Maybe not. I propose the simple explanation that the designers of BigDecimal just made a bad design choice.

  1. A good design optimises for the common use case. The majority of the time (>95%), people want to compare two quantities based on mathematical equality. For the minority of the time where you really do care about the two numbers being equal in both scale and value, there could have been an additional method for that purpose.
  2. It goes against people's expectations, and creates a trap that's very easy to fall into (see the sketch after this list). A good API obeys the "principle of least surprise".
  3. It breaks the usual Java convention that Comparable is consistent with equality.
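A sketch of that trap in practice (jshell-style; a HashSet is just one example of code that silently relies on equals):

var amounts = new HashSet<BigDecimal>();
amounts.add(new BigDecimal("2.0"));

amounts.contains(new BigDecimal("2.00"))
==> false

amounts.contains(new BigDecimal("2.0"))
==> true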

Interestingly, Scala's BigDecimal class (which is implemented using Java's BigDecimal under the hood) has made the opposite choice:

BigDecimal("2.0") == BigDecimal("2.00")     // true
Fogbound answered 11/7, 2014 at 8:24 Comment(4)
A fundamental requirement of equals is that two objects with unequal hash codes must compare unequal, and the design of BigDecimal is such that numbers with different precision are stored very differently. Thus, having equals regard values with different precision as equivalent would greatly impair the performance of hash tables, even those in which all values were stored with equivalent precision.Semitone
@Semitone Good observation. However, I'd argue that BigDecimal-keyed Maps (and Sets) are so rare a use-case that it's not sufficient justification for a scale-sensitive equals.Fogbound
Use of such types as map keys may not be terribly common, but it's probably not terribly rare either. Among other things, code which ends up computing similar values frequently may sometimes benefit enormously from caching frequently-computed values. For that to work efficiently, it's imperative that the hash function be good and fast.Semitone
@Semitone 1) It's safe to say BigDecimal keys are much rarer than people getting bitten by its unintuitive definition of equality; 2) if a scale-insensitive hash is a performance bottleneck, you're likely in a setting where using BigDecimal itself is too slow (e.g. you might switch to longs for monetary calculations).Fogbound
