Anonymizing IPv6 addresses

Asked 23/5, 2011 at 14:22 Answered 12/10, 2019 at 17:20

As required by law in several countries, we anonymize our users' IP addresses in our log files. For IPv4 we usually just anonymize the last two bytes, e.g. instead of 255.255.255.255 we log 255.255.*.*

What algorithm would you recommend to anonymize IPv6 addresses?

Baten asked 23/5, 2011 at 14:22

At the very least you want to strip off the EUI-64, i.e. the last 64 bits of the address (the interface identifier). More realistically, you want to strip quite a lot more to be really private, since the remaining part will still identify a single subnet (i.e. possibly one house).

IPv6 global addressing is very hierarchical. From RFC 2374 (since obsoleted by RFC 3587, which dropped the TLA/NLA structure):

 | 3|  13 | 8 |   24   |   16   |          64 bits               |
 +--+-----+---+--------+--------+--------------------------------+
 |FP| TLA |RES|  NLA   |  SLA   |         Interface ID           |
 |  | ID  |   |  ID    |  ID    |                                |
 +--+-----+---+--------+--------+--------------------------------+
 <--Public Topology--->   Site
                       <-------->
                        Topology
                                 <------Interface Identifier----->

The question becomes: how private is private enough? Strip 64 bits and you've identified a LAN subnet, not a user. Strip another 16 on top of that and you've identified a small organisation, i.e. a customer of an ISP, e.g. a company or branch office with several subnets. Strip the next 24 off and you've basically identified only an ISP or a really big organisation.

You can implement this with a bitmask exactly as you would for an IPv4 address. At that point, though, the question is a legal one rather than a technical one: how much do you need to strip to comply with the specific legislation?
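A minimal Python sketch of that bitmask approach, using only the standard `ipaddress` module. The function name `truncate` is hypothetical and the example addresses come from the documentation ranges; it zeroes the dropped bits rather than writing `*` as in the question's IPv4 log format. The same mask arithmetic handles both address families; only the total width (32 or 128 bits) differs.

```python
import ipaddress


def truncate(addr: str, keep_bits: int) -> str:
    """Keep the first `keep_bits` bits of an IPv4 or IPv6 address, zero the rest."""
    ip = ipaddress.ip_address(addr)
    total = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    mask = ((1 << keep_bits) - 1) << (total - keep_bits)
    # Rebuild with the same class so a small IPv6 value is not mistaken for IPv4.
    return str(type(ip)(int(ip) & mask))


print(truncate("198.51.100.23", 16))  # 198.51.0.0 (the question's IPv4 policy)
v6 = "2001:db8:85a3:1234:5678:8a2e:370:7334"
print(truncate(v6, 64))  # 2001:db8:85a3:1234::  (interface ID stripped)
print(truncate(v6, 48))  # 2001:db8:85a3::       (SLA ID stripped too)
print(truncate(v6, 24))  # 2001:d00::            (NLA ID stripped too)
```

Which `keep_bits` value is acceptable is exactly the legal question above; the code only makes the chosen policy easy to apply consistently.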

Guienne answered 23/5, 2011 at 14:35

Thanks, @awoodland, that's the answer I had been hoping for. So I guess a safe approach is to strip the NLA, SLA and Interface IDs, i.e. keep only the first 24 bits. One could even strip the Reserved bits, as they are zero anyway (thanks for the link to the RFC), so we'd keep two bytes for IPv4 as well as for IPv6. – Baten 23/5, 2011 at 14:54

If you keep only 16 bits of a v6 address, what you have left is almost useless; for example, look at the first 16 bits of the addresses of production v6 sites listed in this directory: sixy.ch – Guienne 23/5, 2011 at 14:58

Sounds reasonable. Hm. Maybe a better approach is to keep the first byte of each of the sections. I guess we should discuss internally why we want to keep some of the bits anyway. Thanks for your help! – Baten 23/5, 2011 at 15:11

@tec: Remember, instead of throwing away the data you can always hash it (plus a seed which you throw away after you are done). This prevents anyone from finding the source but, if done carefully, allows relationships to be preserved (e.g. knowing that these two addresses came from the same /64, or that these two may have come from the same /48 company, or...). You could, for example, hash the interface ID together with the public+site+seed bits, hash the SLA together with the public+seed bits, and hash the NLA together with the RES+TLA+FP+seed bits, etc. Also make sure the seed cannot be deduced because the result space is too small. – Fellowman 24/5, 2011 at 4:6
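A rough Python sketch of that idea, assuming HMAC-SHA256 as the keyed hash (the "seed" becomes a random key) and the RFC 2374 segment boundaries from the diagram above. Each segment is replaced by a hash of every address bit up to the end of that segment, so addresses that share a /24, /48 or /64 still share the same pseudonymized prefix. The names `SEGMENTS` and `pseudonymize_v6` are hypothetical; this is an illustration, not an audited scheme.

```python
import hashlib
import hmac
import ipaddress
import secrets

# (start, end) bit positions of each segment in the RFC 2374 layout:
# FP+TLA+RES, NLA ID, SLA ID, Interface ID.
SEGMENTS = [(0, 24), (24, 48), (48, 64), (64, 128)]


def pseudonymize_v6(addr: str, key: bytes) -> ipaddress.IPv6Address:
    """Replace each segment with a keyed hash of all address bits up to its end.

    Inputs sharing their first k segments produce outputs sharing their
    first k segments, so /24, /48 and /64 relationships are preserved.
    """
    value = int(ipaddress.IPv6Address(addr))
    out = 0
    for start, end in SEGMENTS:
        width = end - start
        prefix = value >> (128 - end)  # the first `end` bits of the address
        msg = bytes([end]) + prefix.to_bytes(16, "big")
        digest = hmac.new(key, msg, hashlib.sha256).digest()
        chunk = int.from_bytes(digest, "big") >> (256 - width)  # top `width` bits
        # Place the chunk at bits start..end-1, counting from the most significant bit.
        out |= chunk << (128 - end)
    return ipaddress.IPv6Address(out)


# One random key per batch of logs; discard it once the batch is processed.
key = secrets.token_bytes(32)
a = pseudonymize_v6("2001:db8:85a3:1::8a2e:370:7334", key)
b = pseudonymize_v6("2001:db8:85a3:1::1", key)
print(a)
print(b)  # shares its first 64 bits with `a`, differs in the interface ID
```

This version replaces the whole address, including the top 24 bits; if the ISP-level prefix may stay visible, the first segment can simply be copied through instead of hashed.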

I "hash" IP addresses by setting each of the last few 16-bit groups to its remainder after division by 16: ip[3] = ip[3] % 16; ... – Cynosure 7/5, 2015 at 20:44
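A small Python sketch of that scheme, assuming it applies to every 16-bit group from index 3 onward (the comment elides the rest with `...`). Note that `% 16` keeps only the lowest 4 bits (the last hex digit) of each group, so it is a truncation rather than a hash. `mod16_groups` is a hypothetical name.

```python
import ipaddress


def mod16_groups(addr: str, first: int = 3) -> str:
    """Replace every 16-bit group from index `first` on by its value mod 16."""
    groups = [int(g, 16) for g in ipaddress.IPv6Address(addr).exploded.split(":")]
    kept, reduced = groups[:first], [g % 16 for g in groups[first:]]
    return str(ipaddress.IPv6Address(":".join(f"{g:x}" for g in kept + reduced)))


print(mod16_groups("2001:db8:85a3::8a2e:370:7334"))  # 2001:db8:85a3::e:0:4
```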

WRT "how much to strip", Google Analytics anonymizes IP addresses by zeroing the last octet of an IPv4 address and the last 80 bits (SLA ID + Interface ID) of an IPv6 address, per "IP Anonymization (or IP masking) in Analytics" (accessed 2020-11-12). – Neils 12/11, 2020 at 17:58
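The same policy can be sketched with the standard library's `ipaddress.ip_network(..., strict=False)`, which zeroes the host bits for a given prefix length. It mirrors the prefix lengths described in the comment (/24 for IPv4, /48 for IPv6); it is not Google's code, and `KEEP_PREFIX` and `ga_style_mask` are hypothetical names.

```python
import ipaddress

# Prefix length kept per IP version: /24 drops the last IPv4 octet,
# /48 drops the last 80 bits (SLA ID + Interface ID) of an IPv6 address.
KEEP_PREFIX = {4: 24, 6: 48}


def ga_style_mask(addr: str) -> str:
    ip = ipaddress.ip_address(addr)
    net = ipaddress.ip_network(f"{ip}/{KEEP_PREFIX[ip.version]}", strict=False)
    return str(net.network_address)


print(ga_style_mask("198.51.100.23"))                          # 198.51.100.0
print(ga_style_mask("2001:db8:85a3:1234:5678:8a2e:370:7334"))  # 2001:db8:85a3::
```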

To anonymize public IPv6 addresses you could take the first 2 groups (32 bits) and replace the remaining part (96 bits) with a CRC-16. Some examples (where abc1 and abc2 are CRC-16 values):

2001:0db8:85a3:0000:0000:8a2e:0370:7334 -> 2001:0db8-abc1
2a02:200:7::123 -> 2a02:200-abc2

Such shortening still allows an anonymized entry to be matched (with some probability, of course) against the non-anonymized IPv6 addresses in full logs that have a shorter retention time, which is useful when investigating problems or security incidents.

If necessary, the CRC-16 could be replaced with a CRC-12 to increase the level of anonymization.
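A sketch of the CRC scheme in Python. The answer does not say which CRC-16 variant it uses, so this assumes `binascii.crc_hqx` (the CRC-CCITT polynomial 0x1021, initial value 0). Python's standard library has no CRC-12, so the `bits=12` option here simply keeps the low 12 bits of the CRC-16; that shortens the tag to 12 bits but is not a true CRC-12. The first two groups are written zero-padded (`2a02:0200` rather than `2a02:200`) so every tag has the same width. `crc_tag` is a hypothetical name.

```python
import binascii
import ipaddress


def crc_tag(addr: str, bits: int = 16) -> str:
    """Keep the first two groups (32 bits) and replace the other 96 bits by a checksum."""
    ip = ipaddress.IPv6Address(addr)
    crc = binascii.crc_hqx(ip.packed[4:], 0)  # CRC-16 over the last 12 bytes
    crc &= (1 << bits) - 1                     # bits=12 keeps only the low 12 bits
    head = ip.exploded[:9]                     # e.g. "2001:0db8"
    return f"{head}-{crc:0{(bits + 3) // 4}x}"


print(crc_tag("2001:0db8:85a3:0000:0000:8a2e:0370:7334"))
print(crc_tag("2a02:200:7::123", bits=12))
```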

Coeducation answered 12/10, 2019 at 17:20

Nice idea, but is that good enough? You could build a rainbow table for that CRC-16. – Sacrosanct 14/10, 2019 at 6:24

The CRC-16 is computed over 96 bits of data, so in a rainbow table each CRC-16 value would map to 2^80 possible IPv6 addresses. That should be enough for anonymizing ;-) – Coeducation 19/10, 2019 at 9:27

Because CRC is not cryptographically secure, you would still be able to recover 16 bits of information easily. If you can guess most of the address, you would be able to figure out the rest. – Dariodariole 2/11, 2023 at 13:36
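A sketch of that concern under a hypothetical, attacker-friendly assumption: the attacker already knows the first 112 bits of the address, e.g. a known /64 whose hosts are numbered from ::1 upward. It again assumes `binascii.crc_hqx` as the CRC-16. Because a CRC is linear, the 16-bit tag is then a one-to-one function of the last 16 address bits, so in this case it reveals them exactly. `PREFIX`, `tag` and `lookup` are illustrative names.

```python
import binascii
import ipaddress

# Hypothetical: the attacker knows the host sits in this /64 and that the
# remaining groups are numbered from ::1 upward, i.e. the first 112 bits are known.
PREFIX = int(ipaddress.IPv6Address("2001:db8:85a3:1::"))


def tag(value: int) -> int:
    """CRC-16 over the last 96 bits, as in the CRC scheme described above."""
    return binascii.crc_hqx(ipaddress.IPv6Address(value).packed[4:], 0)


# CRC is linear over GF(2), so with the first 112 bits fixed, the tag is a
# one-to-one function of the last 16 bits: all 65,536 tags are distinct.
lookup = {tag(PREFIX | low): low for low in range(1 << 16)}
print(len(lookup))  # 65536

secret = PREFIX | 0x1A2B
print(hex(lookup[tag(secret)]))  # 0x1a2b: the host is recovered exactly
```

The tag never carries more than 16 bits, as the next comment says; the real question is how many of the remaining bits an attacker can guess.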

Generally, when taking any 16-bit function of 96-bit data, we throw away at least 80 bits of information, and there is no way to recover them from those 16 bits. The function doesn't need to be cryptographically secure in our case. – Coeducation 5/11, 2023 at 3:49
