Best datatype to store hexidecimal and hex characters in the database
Asked Answered
S

2

9

I'm using the Ethereum api. I want to store the information from the api into a mysql table.

The address data looks like:

0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be
0x1d80982502f3bb75654df13aa32bbd5ac9cab7d6
0xaf13bbdbe1ff53c2df7109a53c217320d2d76ee2
...

I've been using just varchar column for these characters. Are there a better data type? I'm thinking maybe varbinary but I don't know if there's any advantages. The disadvantage is that the sql code will be messier as I will have to HEX() and UNHEX().

Scratchy answered 7/11, 2017 at 19:42 Comment(0)
F
8

For me no specific type exists, you need to choose between text and binary :

CHAR(40) : (charset is not important here)

  • pros : simplicity
  • cons : you need more space disk (~+100%)
  • cons : you risk to store invalid non hexa data

BINARY(20) :

  • pros : reduce space disk
  • pros : you can't store invalid data
  • cons: you need to convert (if you need see hexa value)

For me data consistency is the most important point : I prefer binary option.

  • unhex('FF') == unhex('ff') but 'FF' <> 'ff'
  • trying to store unhex('zz') raises an error, with text no error raised)

Moreover Hex/unHex are very simple function

Hint : You can store your data in BINARY column and create an hexa view to see easily hexa value.

Here is an example to store data in binary zone http://rextester.com/SEV11235

Faitour answered 7/11, 2017 at 20:3 Comment(7)
thanks. do you think it's matter what encoding/collation I should use to save space?Scratchy
Hexa contains only [0-9][a-f] characters : encoding/collation is not important (ie UTF8 use several bytes to specific non ASCII characters "é € etc...")Faitour
So I should not use UTF8 and maybe use latin1?Scratchy
UTF8 or Latin1 will produce same data with Hexa character (UTF8 was designed for backward compatibility with ASCII <=127 .)Faitour
Use UTF8, it always safe.Faitour
en.wikipedia.org/wiki/UTF-8 : It was designed for backward compatibility with ASCII. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes.Faitour
Do not use utf8 or latin1 encodings. You should be decoding the hex value, not treating the address like free text. Your address will be exactly 20 bytes long if you do it correctly. In python 3.5, for example, it would look like bytes.fromhex('3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be').Amblyoscope
D
2

Advantage CHAR: simplicity.

Advantage BINARY: you save 22 bytes per row.

In my opinion, the complexity increase isn't worth the space saving unless you're looking at a minimum of million-row databases. Even then I probably wouldn't bother.

In either case, if you care that much you can save a byte or two (and possibly improve your DB structure optimization) by using a fixed-length column rather than variable.

Downpour answered 7/11, 2017 at 19:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.