watchman makes fsnotify spuriously detect file changes
Asked Answered
E

3

0

I'm using watchman for Git's core.fsmonitor setting. I'm running a different tool that's using fsnotify to detect file changes & run builds. Something watchman is doing is making fsnotify think that files are changing when they're not (the fsnotify tool is running builds constantly). How can I discover what exactly is happening, so I can adapt the tool to ignore those changes?

Expressage answered 19/9, 2019 at 7:53 Comment(0)
W
1

Each time a watchman query is executed it is subject to query synchronization; this is necessary in order to be sure that all file changes prior to the start of the query have been read from the kernel change notification queue.

Watchman will write a cookie file with a random name and wait to observe this file in the kernel notification stream in order to perform that synchronization.

This page has more details on synchronization: https://facebook.github.io/watchman/docs/cookies.html

It sounds to me like the fsnotify component in your integration would benefit from adding a simple filter; for example, only match files with extensions that look like plausible source files before initiating a build.

Wooley answered 20/9, 2019 at 17:25 Comment(1)
Thank you, that does help. Whitelisting files would be too complex (the repo in question is enormous and diverse), but maybe there's a pattern I can use to blacklist the cookie file?Expressage
T
1

The watchman integration for fsmonitor was racy, which has been corrected to be more conservative, with Git 2.25 (Q1 2020).

That might help with your case.

See commit dd0b61f (04 Nov 2019) by Kevin Willford (kewillford).
(Merged by Junio C Hamano -- gitster -- in commit fc7b26c, 01 Dec 2019)

fsmonitor: fix watchman integration

Helped-by: Derrick Stolee
Signed-off-by: Kevin Willford

When running Git commands quickly -- such as in a shell script or the test suite -- the Git commands frequently complete and start again during the same second.

The example fsmonitor hooks to integrate with Watchman truncate the nanosecond times to seconds. In principle, this is fine, as Watchman claims to use inclusive comparisons. The result should only be an over-representation of the changed paths since the last Git command.

However, Watchman's own documentation claims "Using a timestamp is prone to race conditions in understanding the complete state of the file tree".
All of their documented examples use a "clockspec" that looks like 'c:123:234'.
Git should eventually learn how to store this type of string to provide a stronger integration, but that will be a more invasive change.

When using GIT_TEST_FSMONITOR="$(pwd)/t7519/fsmonitor-watchman", scripts such as t7519-wtstatus.sh fail due to these race conditions.
In fact, running any test script with GIT_TEST_FSMONITOR pointing at t/t7519/fsmonitor-wathcman will cause failures in the test_commit function.
The 'git add "$indir$file"' command fails due to not enough time between the creation of '$file' and the 'git add' command.

For now, subtract one second from the timestamp we pass to Watchman.
This will make our window large enough to avoid these race conditions. Increasing the window causes tests like t7519-wtstatus.sh to pass.

When the integration was introduced in def437671 ("fsmonitor: add a sample integration script for Watchman", 2018-09-22, Git v2.16.0-rc0 -- merge listed in batch #5), the query included an expression that would ignore files created and deleted in that window.
The performance reason for this change was to ignore temporary files created by a build between Git commands.
However, this causes failures in script scenarios where Git is creating or deleting files quickly.

When using GIT_TEST_FSMONITOR as before, t2203-add-intent.sh fails due to this add-and-delete race condition.

By removing the "expression" from the Watchman query, we remove this race condition. It will lead to some performance degradation in the case of users creating and deleting temporary files inside their working directory between Git commands. However, that is a cost we need to pay to be correct.


With Git 2.26 (Q1 2020), a new version of fsmonitor-watchman hook has been introduced, to avoid races.

See commit dfaed02, commit e4e1e83 (23 Jan 2020), and commit 8da2c57, commit 56c6910 (07 Jan 2020) by Kevin Willford (kewillford).
(Merged by Junio C Hamano -- gitster -- in commit c9a33e5, 14 Feb 2020)

fsmonitor: change last update timestamp on the index_state to opaque token

Signed-off-by: Kevin Willford

Some file system monitors might not use or take a timestamp for processing and in the case of watchman could have race conditions with using a timestamp. Watchman uses something called a clockid that is used for race free queries to it. The clockid for watchman is simply a string.

Change the fsmonitor_last_update from being a uint64_t to a char pointer so that any arbitrary data can be stored in it and passed back to the fsmonitor.

And:

fsmonitor: handle version 2 of the hooks that will use opaque token

Signed-off-by: Kevin Willford

Some file monitors like watchman will use something other than a timestamp to keep better track of what changes happen in between calls to query the fsmonitor. The clockid in watchman is a string. Now that the index is storing an opaque token for the last update the code needs to be updated to pass that opaque token to a verion 2 of the fsmonitor hook.

Because there are repos that already have version 1 of the hook and we want them to continue to work when git is updated, we need to handle both version 1 and version 2 of the hook. In order to do that a config value is being added core.fsmonitorHookVersion to force what version of the hook should be used.

When this is not set it will default to -1 and then the code will attempt to call version 2 of the hook first. If that fails it will fallback to trying version 1.

The new config setting is:

core.fsmonitorHookVersion:

Sets the version of hook that is to be used when calling fsmonitor.
There are currently versions 1 and 2.
When this is not set, version 2 will be tried first and if it fails then version 1 will be tried.

  • Version 1 uses a timestamp as input to determine which files have changes since that time but some monitors like watchman have race conditions when used with a timestamp.
  • Version 2 uses an opaque string so that the monitor can return something that can be used to determine what files have changed without race conditions.

With Git 2.30 (Q1 2021), the index format documentation is updated regarding the file system monitor extension:

See commit 5885367 (14 Dec 2020) by Jeff Hostetler (Jeff-Hostetler).
(Merged by Junio C Hamano -- gitster -- in commit 7bceb83, 17 Dec 2020)

index-format.txt: document v2 format of file system monitor extension

Signed-off-by: Jeff Hostetler

Update the documentation of the file system monitor extension to describe version 2.

The format was extended to support opaque tokens in: 56c6910028 fsmonitor: change last update timestamp on the index_state to opaque token

technical/index-format now includes in its man page:

  • 32-bit version number: the current supported versions are 1 and 2.
  • (Version 1) 64-bit time: the extension data reflects all changes through the given

technical/index-format now includes in its man page:

  • (Version 2) A null terminated string: an opaque token defined by the file system monitor application.
    The extension data reflects all changes relative to that token.

Before Git 2.31.1 (Q1 2021), the data structure used by fsmonitor interface was not properly duplicated during an in-core merge, leading to use-after-free etc.

See commit 4abc578, commit 3dfd305 (17 Mar 2021) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit 1dd4e74, 19 Mar 2021)

fsmonitor: fix memory corruption in some corner cases

Signed-off-by: Johannes Schindelin

In 56c6910 ("fsmonitor: change last update timestamp on the index_state to opaque token", 2020-01-07, Git v2.26.0-rc0 -- merge listed in batch #5), we forgot to adjust the part of unpack_trees() that copies the FSMonitor "last-update" information that we copy from the source index to the result index since 679f2f9 ("unpack-trees: skip stat on fsmonitor-valid files", 2019-11-20, Git v2.25.0-rc0 -- merge listed in batch #3).

Since the "last-update" information is no longer a 64-bit number, but a free-form string that has been allocated, we need to duplicate it rather than just copying it.

This is important because there are cases when unpack_trees() will perform a oneway merge that implicitly calls refresh_fsmonitor() (which will allocate that "last-update" token).
This happens after that token was copied into the result index.
However, we then call check_updates() on that index, which will also call refresh_fsmonitor(), accessing the "last-update" string, which by now would be released already.

In the instance that lead to this patch, this caused a segmentation fault during a lengthy, complicated rebase involving the todo command reset that (crucially) had to updated many files.
Unfortunately, it seems very hard to trigger that crash, therefore this patch is not accompanied by a regression test.

Turdine answered 8/12, 2019 at 0:15 Comment(0)
T
0

The fsmonitor mentioned here is based on an Inter-process communication called Simple-IPC.

Git 2.32 (Q2 2021) details its introduction, which will then be used to build services like fsmonitor on top.

See commit 36a7eb6, commit 7cd5dbc (22 Mar 2021), and commit 9fd1902, commit 77e522c, commit 55144cc, commit 4f98ce5, commit 59c7b88, commit 066d523, commit 7455e05 (15 Mar 2021) by Jeff Hostetler (Jeff-Hostetler).
See commit 8c2efa5, commit c4ba579, commit 3a63c6a (15 Mar 2021) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit 861794b, 02 Apr 2021)

simple-ipc: add Unix domain socket implementation

Signed-off-by: Jeff Hostetler

Create Unix domain socket based implementation of "simple-ipc".

A set of ipc_client routines implement a client library to connect to an ipc_server over a Unix domain socket, send a simple request, and receive a single response.
Clients use blocking IO on the socket.

A set of ipc_server routines implement a thread pool to listen for and concurrently service client connections.

The server creates a new Unix domain socket at a known location.
If a socket already exists with that name, the server tries to determine if another server is already listening on the socket or if the socket is dead.
If socket is busy, the server exits with an error rather than stealing the socket.
If the socket is dead, the server creates a new one and starts up.

If while running, the server detects that its socket has been stolen by another server, it automatically exits.

simple-ipc: design documentation for new IPC mechanism

Signed-off-by: Johannes Schindelin
Signed-off-by: Jeff Hostetler

Brief design documentation for new IPC mechanism allowing foreground Git client to talk with an existing daemon process at a known location using a named pipe or unix domain socket.

technical/api-simple-ipc now includes in its man page:

Simple-IPC API

The Simple-IPC API is a collection of ipc_ prefixed library routines and a basic communication protocol that allow an IPC-client process to send an application-specific IPC-request message to an IPC-server process and receive an application-specific IPC-response message.

Communication occurs over a named pipe on Windows and a Unix domain socket on other platforms.

IPC-clients and IPC-servers rendezvous at a previously agreed-to application-specific pathname (which is outside the scope of this design) that is local to the computer system.

The IPC-server routines within the server application process create a thread pool to listen for connections and receive request messages from multiple concurrent IPC-clients.
When received, these messages are dispatched up to the server application callbacks for handling. IPC-server routines then incrementally relay responses back to the IPC-client.

The IPC-client routines within a client application process connect to the IPC-server and send a request message and wait for a response. When received, the response is returned back the caller.

For example, the fsmonitor--daemon feature will be built as a server application on top of the IPC-server library routines. It will have threads watching for file system events and a thread pool waiting for client connections.
Clients, such as git status will request a list of file system events since a point in time and the server will respond with a list of changed files and directories.
The formats of the request and response are application-specific; the IPC-client and IPC-server routines treat them as opaque byte streams.

Comparison with sub-process model

The Simple-IPC mechanism differs from the existing sub-process.c model (Documentation/technical/long-running-process-protocol.txt) and used by applications like Git-LFS. In the LFS-style sub-process model the helper is started by the foreground process, communication happens via a pair of file descriptors bound to the stdin/stdout of the sub-process, the sub-process only serves the current foreground process, and the sub-process exits when the foreground process terminates.

In the Simple-IPC model the server is a very long-running service. It can service many clients at the same time and has a private socket or named pipe connection to each active client. It might be started (on-demand) by the current client process or it might have been started by a previous client or by the OS at boot time. The server process is not associated with a terminal and it persists after clients terminate. Clients do not have access to the stdin/stdout of the server process and therefore must communicate over sockets or named pipes.

Server startup and shutdown

How an application server based upon IPC-server is started is also outside the scope of the Simple-IPC design and is a property of the application using it. For example, the server might be started or restarted during routine maintenance operations, or it might be started as a system service during the system boot-up sequence, or it might be started on-demand by a foreground Git command when needed.

Similarly, server shutdown is a property of the application using the simple-ipc routines. For example, the server might decide to shutdown when idle or only upon explicit request.

Simple-IPC protocol

The Simple-IPC protocol consists of a single request message from the client and an optional response message from the server. Both the client and server messages are unlimited in length and are terminated with a flush packet.

The pkt-line routines (Documentation/technical/protocol-common.txt) are used to simplify buffer management during message generation, transmission, and reception. A flush packet is used to mark the end of the message. This allows the sender to incrementally generate and transmit the message. It allows the receiver to incrementally receive the message in chunks and to know when they have received the entire message.

The actual byte format of the client request and server response messages are application specific. The IPC layer transmits and receives them as opaque byte buffers without any concern for the content within. It is the job of the calling application layer to understand the contents of the request and response messages.

Summary

Conceptually, the Simple-IPC protocol is similar to an HTTP REST request. Clients connect, make an application-specific and stateless request, receive an application-specific response, and disconnect. It is a one round trip facility for querying the server. The Simple-IPC routines hide the socket, named pipe, and thread pool details and allow the application layer to focus on the application at hand.


Warning: the "simple-ipc" did not compile without pthreads support, but the build procedure was not properly account for it: that has been fixed with Git 2.32 (Q2 2021).

See commit 6aac70a (20 May 2021) by Jeff Hostetler (Jeff-Hostetler).
(Merged by Junio C Hamano -- gitster -- in commit 6aae0e2, 22 May 2021)

simple-ipc: correct ifdefs when NO_PTHREADS is defined

Reported-by: Randall S. Becker
Helped-by: Jeff King
Signed-off-by: Jeff Hostetler

Simple IPC always requires threads (in addition to various platform-specific IPC support).
Fix the ifdefs in the Makefile to define SUPPORTS_SIMPLE_IPC when appropriate.

Previously, the Unix version of the code would only verify that Unix domain sockets were available.

This problem was reported here.

Turdine answered 3/4, 2021 at 3:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.