With reference to: Is rename() atomic?
I'm asking something similar, but not quite the same, because what I want to know is is it safe to rely on the atomicty of rename()
when using NFS?
Here's a scenario I'm dealing with - I have an 'index' file that must always be present.
So:
- Client creates a new file
- Client renames new file over 'old' index file.
Separate client:
- Reads index file
- refers to on disk structure based on the index.
This is making the assumption that rename()
being atomic means - there will always be an 'index' file (although, it might be an out of date version, because caching and timing)
However the problem I'm hitting is this - that this is happening on NFS - and working - but several of my NFS clients are occasionally reporting "ENOENT" - no such file or directory. (e.g. in hundreds operations happening at 5m intervals, we get this error every couple of days).
So what I'm hoping is whether someone can enlighten me - should it actually be impossible to get 'ENOENT' in this scenario?
The reason I'm asking is this entry in RFC 3530:
The RENAME operation must be atomic to the client.
I'm wondering if that means just the client issuing the rename, and not the client viewing the directory? (I'm ok with a cached/out of date directory structure, but the point of this operation is that this file will always be 'present' in some form)
Sequence of operations (from the client performing the write operation) is:
21401 14:58:11 open("fleeg.ext", O_RDWR|O_CREAT|O_EXCL, 0666) = -1 EEXIST (File exists) <0.000443>
21401 14:58:11 open("fleeg.ext", O_RDWR) = 3 <0.000547>
21401 14:58:11 fstat(3, {st_mode=S_IFREG|0600, st_size=572, ...}) = 0 <0.000012>
21401 14:58:11 fadvise64(3, 0, 572, POSIX_FADV_RANDOM) = 0 <0.000008>
21401 14:58:11 fcntl(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=1, len=1}) = 0 <0.001994>
21401 14:58:11 open("fleeg.ext.i", O_RDWR|O_CREAT, 0666) = 4 <0.000538>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000008>
21401 14:58:11 fadvise64(4, 0, 42, POSIX_FADV_RANDOM) = 0 <0.000006>
21401 14:58:11 close(4) = 0 <0.000011>
21401 14:58:11 fstat(3, {st_mode=S_IFREG|0600, st_size=572, ...}) = 0 <0.000007>
21401 14:58:11 open("fleeg.ext.i", O_RDONLY) = 4 <0.000577>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000007>
21401 14:58:11 fadvise64(4, 0, 42, POSIX_FADV_RANDOM) = 0 <0.000006>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000007>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000007>
21401 14:58:11 read(4, "\3PAX\1\0\0O}\270\370\206\20\225\24\22\t\2\0\203RD\0\0\0\0\17\r\0\2\0\n"..., 42) = 42 <0.000552>
21401 14:58:11 close(4) = 0 <0.000013>
21401 14:58:11 fcntl(3, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=466, len=68}) = 0 <0.001418>
21401 14:58:11 pread(3, "\21@\203\244I\240\333\272\252d\316\261\3770\361#\222\200\313\224&J\253\5\354\217-\256LA\345\253"..., 38, 534) = 38 <0.000010>
21401 14:58:11 pread(3, "\21@\203\244I\240\333\272\252d\316\261\3770\361#\222\200\313\224&J\253\5\354\217-\256LA\345\253"..., 38, 534) = 38 <0.000010>
21401 14:58:11 pread(3, "\21\"\30\361\241\223\271\256\317\302\363\262F\276]\260\241-x\227b\377\205\356\252\236\211\37\17.\216\364"..., 68, 466) = 68 <0.000010>
21401 14:58:11 pread(3, "\21\302d\344\327O\207C]M\10xxM\377\2340\0319\206k\201N\372\332\265R\242\313S\24H"..., 62, 300) = 62 <0.000011>
21401 14:58:11 pread(3, "\21\362cv'\37\204]\377q\362N\302/\212\255\255\370\200\236\350\2237>7i`\346\271Cy\370"..., 104, 362) = 104 <0.000010>
21401 14:58:11 pwrite(3, "\21\302\3174\252\273.\17\v\247\313\324\267C\222P\303\n~\341F\24oh/\300a\315\n\321\31\256"..., 127, 572) = 127 <0.000012>
21401 14:58:11 pwrite(3, "\21\212Q\325\371\223\235\256\245\247\\WT$\4\227\375[\\\3263\222\0305\0\34\2049A;2U"..., 68, 699) = 68 <0.000009>
21401 14:58:11 pwrite(3, "\21\262\20Kc(!.\350\367i\253hkl~\254\335H\250.d\0036\r\342\v\242\7\255\214\31"..., 38, 767) = 38 <0.000009>
21401 14:58:11 fsync(3) = 0 <0.001007>
21401 14:58:11 fstat(3, {st_mode=S_IFREG|0600, st_size=805, ...}) = 0 <0.000009>
21401 14:58:11 open("fleeg.ext.i.tmp", O_RDWR|O_CREAT|O_TRUNC, 0666) = 4 <0.001813>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 <0.000007>
21401 14:58:11 fadvise64(4, 0, 0, POSIX_FADV_RANDOM) = 0 <0.000007>
21401 14:58:11 write(4, "\3PAX\1\0\0qT2\225\226\20\225\24\22\t\2\0\205;D\0\0\0\0\17\r\0\2\0\n"..., 42) = 42 <0.000012>
21401 14:58:11 stat("fleeg.ext.i", {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000011>
21401 14:58:11 fchmod(4, 0100600) = 0 <0.002517>
21401 14:58:11 fstat(4, {st_mode=S_IFREG|0600, st_size=42, ...}) = 0 <0.000008>
21401 14:58:11 close(4) = 0 <0.000011>
21401 14:58:11 rename("fleeg.ext.i.tmp", "fleeg.pax.i") = 0 <0.001201>
21401 14:58:11 close(3) = 0 <0.000795>
21401 14:58:11 munmap(0x7f1475cce000, 4198400) = 0 <0.000177>
21401 14:58:11 munmap(0x7f14760cf000, 4198400) = 0 <0.000173>
21401 14:58:11 futex(0x7f147cbcb908, FUTEX_WAKE_PRIVATE, 2147483647) = 0 <0.000010>
21401 14:58:11 exit_group(0) = ?
21401 14:58:11 +++ exited with 0 +++
NB - Paths and files renamed in the above for consistency. fleeg.ext
is the data file, and fleeg.ext.i
is the index. During this process - the fleeg.ext.i
file is being overwritten (by the .tmp
file), which is why the belief is that there should always be a file at that path (either the old one, or the new that's just overwritten it).
On the reading client the PCAP looks like LOOKUP
NFS call is what's failing:
124 1.375777 10.10.41.35 -> 10.10.41.9 NFS 226 LOOKUP fleeg.ext.i V3 LOOKUP Call, DH: 0x6fbbff3a/fleeg.ext.i
125 1.375951 10.10.41.9 -> 10.10.41.35 NFS 186 5347 LOOKUP 0775 Directory V3 LOOKUP Reply (Call In 124) Error: NFS3ERR_NOENT
126 1.375975 10.10.41.35 -> 10.10.41.9 NFS 226 LOOKUP fleeg.ext.i V3 LOOKUP Call, DH: 0x6fbbff3a/fleeg.ext.i
127 1.376142 10.10.41.9 -> 10.10.41.35 NFS 186 5347 LOOKUP 0775 Directory V3 LOOKUP Reply (Call In 126) Error: NFS3ERR_NOENT
strace
it appears probably not - the 'new' file is created by theopen
call, and thenrename
ed whilst open. I'll run another trace to confirm. – MonodicENOENT
, is itopen
orread
? – Queenqueenaopen
- although I'm less certain, as I don't have an strace of a failure (it doesn't happen frequently enough for me to have caught it), just application layer logging. – Monodic