Am I crazy? Switching an established product from HSQLDB to Apache Derby
Asked Answered
R

8

25

I have an established software product that uses HSQLDB as its internal settings database. Customer projects are stored in this database. Over the years, HSQLDB has served us reasonably well, but it has some stability/corruption issues that we've had to code circles around, and even then, we can't seem to protect ourselves from them completely.

I'm considering changing internal databases. Doing this would be fairly painful from a development perspective, but corrupted databases (and lost data) are not fun to explain to customers.

So my question is: Does anyone have enough experience to weigh in on the long-term stability of Apache Derby? I found a post via Google complaining that Derby was unstable, but it was from 2006 so I'd entertain the idea that it has been improved in the last 4 years. Or, is there another pure Java embedded (in-process) database that I could use (commercial or open-source). Performance isn't very important to me. Stability is king. Data integrity across power loss, good BLOB support, and hot-backups are all a must.

Please don't suggest something that isn't a SQL-based relational database. I'm trying to retrofit an existing product, not start from scratch, thanks.

Ribbon answered 29/6, 2010 at 20:53 Comment(1)
"Am I crazy" is a very essential question nowadays. :)Baillargeon
U
24

For each database engine there is a certain risk of corruption. I am the main author of the H2 database, and I also got reports about broken databases. Testing can reduce the probability of bugs, but unfortunately it's almost impossible to guarantee some software is 'bug free'.

As for the three Java database HSQLDB, Apache Derby, and H2, I can't really say which one is the most stable. I can only speak about H2. I think for most operations, H2 is now stable. There are many test cases that specially test for databases getting corrupt. This includes automated tests on power loss (using a christmas light timer). With power failure tests I found out stability also depends on the file system: sometimes I got 'CRC error' messages meaning the operating system can't read the file (it was Windows). In that case, there is not much you can do.

For mission critical data, in any case I wouldn't rely on the software being stable. It's very important to create backups regularly, and test them. Some databases have multiple way to create backups. H2 for example has an online backup feature, and a feature to write a SQL script file. An alternative is to use replication or clustering. H2 supports a simple cluster mode, I believe Derby supports replication.

Uzziel answered 3/7, 2010 at 17:14 Comment(1)
I tried the in-memory mode for all the three database(they all support it) but H2 is really faster than hsqldb, and hsqldb faster than derby. But if h2 can support some kind of mode, I can use both memory table and some kind of cached table in hsqldb in a mixed(hydrid) mode, I will choose h2. Otherwise, perhaps hsqldb,h2 both are okay. Because it's really an attractive feature when data becomes large +1Landre
S
16

I ran Derby 24/7 as the internal database supporting a build automation and test management system for 4 years. It was used by a worldwide team, and never crashed, lost data, or corrupted my records. The only reason we stopped using it is because our company was bought by another and a higher-level decision was handed down. Derby is solid, reliable, and well worth your consideration.

Shifflett answered 29/6, 2010 at 22:5 Comment(1)
Based on my experience also, Apache Derby is fine for production use. I am running Apache Derby 24/7 for the past 4 years in production; no crashes, no lost data, no corruptions. The database for user accounts, for example, now contains over 1 Million records and is about 4GB in size. No performance issues.Sadomasochism
W
10

This search shows 215 posts in HSQLDB Users mailing list containing the string "corrupt". http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.java.hsqldb.user&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.java.hsqldb.user---A

This search shows 264 posts in Derby Users mailing list containing the same string. http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.apache.db.derby.user&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.apache.db.derby.user---A

This one shows 1003 posts in Derby Dev mailing list with the same string http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.apache.db.derby.devel&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.apache.db.derby.devel---A

A look at some of the posts shows possible or real cases of database corruption happen despite all the best efforts of database developers.

HSQLDB has had its own share of database corruption issues but has improved over the years. In the latest versions precautions and fixes have been introduced to prevent all the issues that were reported in the last few years.

The new lob storage feature however, turned out to have a logic bug that results in the lobs being "forgotten" after an update. This is being fixed right now, with more extensive tests to support the fix.

Users like CarlG have helped a lot over the years in the bug fixing efforts of both Derby and HSQLDB.

Fred Toussi, HSQLDB Project

Whiten answered 3/7, 2010 at 11:48 Comment(5)
Thanks for the perspective, Fred, very illuminating.Ribbon
I'm an indipendent developer. Please mind the ratio: what is the total of the messages? I did follow the links and gathered the message count/on/total. At the time of writing in hsqldb.user there are 254/on/6150 (4.1%) messages talking about 'corrupt', in db.derby.user 341/on/14790 (2,3%), in db.derby.devel 1805/on/102163 (1,7%)Reannareap
@Reannareap db.derby.devel contains a huge amount of auto-generated commit and bug tracker etc. messages. Ratios do not mean much in such contexts.Whiten
@Whiten I disagree: for the same reason the word 'corrupt' is repeated for each message/commit that contains that very word in the issue description. Your current answer, lead me to think that derby has more corruption issue reports than hsqldb; this is the opposite that emerges from my analysis.Reannareap
@Reannareap I am saying ratios do not make sense. The point of the answer is there are comparable numbers of posts about corruption.Whiten
Y
9

Does anyone have enough experience to weigh in on the long-term stability of Apache Derby? (...)

Derby, ex IBM Cloudscape (and now also distributed by Sun as JavaDB) is an ACID-compliant database that can stand a lot of concurrent users, running embedded or in server mode, and is know to be robust and production ready. It is not as fast as HSQLDB (Derby uses durable operations), but it's robust. Still, you should run your own tests against it.

See also

Yukikoyukio answered 29/6, 2010 at 22:19 Comment(1)
The blog has moved to blogs.oracle.com/FrancoisOrsini now that Oracle owns Sun.Wesleywesleyan
C
7

I have been using Apache Derby since 2009 in many of my projects, some of them with 24/7 operation and many millions of rows.

Never ever had a single event of data corruption. Rock solid and fast.

I keep choosing it as my RDBMS of choice, unless a good reason not to pops out.

Consistency answered 29/6, 2010 at 20:53 Comment(0)
N
4

With regard to HSQLDB, one thing that it doesn't have as a project that SQLite has is the documentation of a robust testing suite and online documentation of assiduous ACID compliance.

I don't mean to take anything away from HSQLDB. It's meant to serve as an alternative to MySQL not to fopen() as SQLite is intended. One can say that the scope of HSQLDB (all the Java RDBMS's really) is much more ambiitious. Fredt and his group have accomplished an extraordinary achievement with HSQLDB. Even so, doing the Google search "Is HSQLDB ACID compliant" doesn't leave an early adopter feeling as confident as one feels after reading about the testing harnesses on the SQLite website.

At http://sqlite.org/transactional.html

"SQLite is Transactional

A transactional database is one in which all changes and queries appear to be Atomic, Consistent, Isolated, and Durable (ACID). SQLite implements serializable transactions that are atomic, consistent, isolated, and durable, even if the transaction is interrupted by a program crash, an operating system crash, or a power failure to the computer.

We here restate and amplify the previous sentence for emphasis: All changes within a single transaction in SQLite either occur completely or not at all, even if the act of writing the change out to the disk is interrupted by

  • a program crash,
  • an operating system crash, or
  • a power failure.

The claim of the previous paragraph is extensively checked in the SQLite regression test suite using a special test harness that simulates the effects on a database file of operating system crashes and power failures."

At http://sqlite.org/testing.html

"1.0 Introduction

The reliability and robustness of SQLite is achieved in part by thorough and careful testing.

As of version 3.7.14, the SQLite library consists of approximately 81.3 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 1124 times as much test code and test scripts - 91421.1 KSLOC.

1.1 Executive Summary

Three independently developed test harnesses 100% branch test coverage in an as-deployed configuration Millions and millions of test cases Out-of-memory tests I/O error tests Crash and power loss tests Fuzz tests Boundary value tests Disabled optimization tests Regression tests Malformed database tests Extensive use of assert() and run-time checks Valgrind analysis Signed-integer overflow checks"

Nap answered 29/6, 2010 at 20:53 Comment(0)
T
4

Try looking into H2. It was created by the guy who originally made HSQLDB but built from scratch so doesn't use any HSQLDB code. Not sure how its stability compares to HSQL since I haven't used HSQL in ages and I'm only using H2 for short-lived databases currently. I personally found H2 to be easier to get going than Derby but maybe that's because H2 has a cheat sheet web page.

It might be possible to re-code to use an abstraction layer and then run tests to compare H2 and Derby with the issues you have found.

On the project management side of the fence, does your roadmap have a major version coming up? That might be a rather appropriate time to rip out the guts this way and I wouldn't say you were crazy cause it could potentially remove lots of hard to manage work arounds. If you wanted to make the change where it could affect live systems without plenty of warning and backups in place then you may be crazy.

Thielen answered 29/6, 2010 at 21:35 Comment(0)
M
1

Give SQLite a try if you're looking for something self contained (no server involved). This is what backs android's db api, and is highly stable.

Monetmoneta answered 29/6, 2010 at 20:53 Comment(2)
I'd love to, but it isn't Java. Anyone had any experience with the Java wrappers for SQLite they'd care to share? See this for background: https://mcmap.net/q/98231/-java-and-sqlite-closedRibbon
Just for anyone that want to use sqlite with java: this works now.Thermal

© 2022 - 2024 — McMap. All rights reserved.