APC User-Cache suitable for high load environments?

2

2

We are trying to deploy the APC user cache in a high-load environment as a local second-tier cache on each server, in front of our central caching service (redis). It caches configuration and database queries whose results rarely change. We basically looked at what Facebook did (years ago):

http://www.slideshare.net/guoqing75/4069180-caching-performance-lessons-from-facebook
http://www.slideshare.net/shire/php-tek-2007-apc-facebook
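
The two-tier lookup described above can be sketched roughly as follows. This is only an illustration, written in Python for brevity rather than PHP: `LocalTTLCache` stands in for the APC user cache, a plain dict stands in for redis, and `cached_fetch`, `load_from_db` and `local_ttl` are hypothetical names, not part of any real API.

```python
import time


class LocalTTLCache:
    """Per-server first tier: maps key -> (expiry time, value)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be tested

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry has outlived its TTL: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (self._clock() + ttl, value)


def cached_fetch(key, local, central, load_from_db, local_ttl=60):
    """Look in the local tier, then the central tier, then the database."""
    value = local.get(key)
    if value is not None:
        return value
    value = central.get(key)
    if value is None:
        value = load_from_db(key)
        central[key] = value  # central tier stores without a TTL
    # Local copies always carry a TTL so stale data ages out per server.
    local.set(key, value, local_ttl)
    return value
```

Note that this sketch cannot cache a legitimate `None` result, since `None` doubles as the miss marker.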

It works pretty well for a while, but after some hours under high load APC runs into problems, and mod_php stops executing PHP altogether. Even a trivial PHP script no longer responds, while static resources are still delivered by Apache. It does not really crash; there is no segfault. We tried the latest stable and the latest beta of APC, and we tried pthreads and spin locks, every time with the same problem. We gave APC far more memory than it can ever consume: one minute before a crash we see 2% fragmentation and about 90% of the memory free. When it "crashes" we find nothing in the error logs, and only restarting Apache helps. Only with spin locks do we get a PHP error:

PHP Fatal error: Unknown: Stuck spinlock (0x7fcbae9fe068) detected in Unknown on line 0

This seems to be a kind of timeout, which does not occur with pthreads, because those don’t use timeouts.

What is happening is probably something like this: http://notmysock.org/blog/php/user-cache-timebomb.html

Some numbers: a server sees about 400 APC user-cache hits per second and about 30 inserts per second (which is a lot, I think), and one request makes about 20-100 user-cache lookups. There are about 300,000 variables in the user cache, all with a TTL (we store without a TTL only in our central redis).

Our APC-settings are:

apc.shm_segments=1 
apc.shm_size=4096M
apc.num_files_hint=1000
apc.user_entries_hint=500000
apc.max_file_size=2M
apc.stat=0

Currently we are using version 3.1.13-beta compiled with spin locks, used with an old PHP 5.2.6 (it’s a legacy app, I’ve heard that this PHP version could be a problem too?), Linux 64bit.

It's really hard to debug. We have written monitoring scripts that collect as much data as we can get every minute from APC, the system etc., but we cannot see anything unusual, even one minute before a crash.
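
For reference, a fragmentation figure like the one quoted above can be derived from the sizes of the free blocks in the shared memory segment. The sketch below is in Python and rests on assumptions: it treats free blocks below a cutoff (5 MB here, chosen for illustration) as fragments and reports their share of all free memory. `fragmentation_percent` and `small_block_limit` are hypothetical names, and APC's own status page may compute the number differently.

```python
MB = 1024 * 1024


def fragmentation_percent(free_block_sizes, small_block_limit=5 * MB):
    """Percentage of free memory that sits in blocks below the cutoff."""
    total_free = sum(free_block_sizes)
    if total_free == 0 or len(free_block_sizes) <= 1:
        # Nothing free, or one contiguous free block: not fragmented.
        return 0.0
    small = sum(size for size in free_block_sizes if size < small_block_limit)
    return 100.0 * small / total_free


# Two 1 MB slivers next to one large 98 MB free block.
print(fragmentation_percent([1 * MB, 1 * MB, 98 * MB]))  # prints 2.0
```

A low value like this only says that free memory is mostly contiguous; it says nothing about lock contention, which would be consistent with the monitoring showing nothing unusual before a hang.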

I've seen a lot of similar problems here, but so far we haven't found a solution that works for us. And when I read something like this:

http://webadvent.org/2010/share-and-enjoy-by-gopal-vijayaraghavan

I’m not sure if going with APC for a local user-cache is the best idea in high load environments. We already worked with memcached here, but APC is a lot faster. But how to get it stable?

best regards, Andreas

Importunate answered 7/11, 2013 at 22:30 Comment(3)
How do you clear the cache? Do you use ttl/gc settings in the apc config, or are you regularly calling apc_clear_cache('user')? Currently we can't rely on APC even for one day, so we had to call the function multiple times a day. Do you combine it with cache warming? - Importunate
Are you talking about the opcode/file cache or the user cache (apc_store(), apc_fetch()...) here? - Importunate
Why is it a security issue? - Importunate
4

Lesson 1: https://www.kernel.org/doc/Documentation/spinlocks.txt

The single spin-lock primitives above are by no means the only ones. They are the most safe ones, and the ones that work under all circumstances, but partly because they are safe they are also fairly slow. They are slower than they'd need to be, because they do have to disable interrupts (which is just a single instruction on a x86, but it's an expensive one - and on other architectures it can be worse).

That's written by Linus ...

Spin locks are slow; that assertion is not based on some article by Facebook that I read online, but on the actual facts of the matter.

It is also an incidental fact that spinlocks are deployed at levels higher than the kernel because of the very problems you speak of: untraceable deadlocks caused by a bad implementation.

They are used efficiently by the kernel, because that's where they were designed to be used: locking tiny, tiny, tiny sections, not sitting around and waiting for you to copy your Amazon SOAP responses into APC and back out a billion times a second.

The most suitable kind of locking available in APC (for the web, not the kernel) is definitely rwlocks. In legacy APC you have to enable rwlocks with a configure option; in APCu they are the default.

The best advice that can be given, and I already gave it, is: don't use spinlocks. If mutexes are causing your stack to deadlock, then try rwlocks.
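
To illustrate why rwlocks fit a read-heavy user cache, here is a minimal sketch of reader-writer semantics: any number of readers may hold the lock together, while a writer needs it exclusively. It is written in Python with threads purely as an illustration. It is not how APC or APCu implement their locks (those are process-shared pthread locks in C), and the `ReadWriteLock` class and its method names are hypothetical.

```python
import threading


class ReadWriteLock:
    """Minimal reader-writer lock: many concurrent readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self, timeout=None):
        with self._cond:
            # Readers wait only for an active writer, never for each other.
            ok = self._cond.wait_for(lambda: not self._writer, timeout)
            if ok:
                self._readers += 1
            return ok

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may proceed

    def acquire_write(self, timeout=None):
        with self._cond:
            # A writer needs the lock to itself: no writer and no readers.
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout
            )
            if ok:
                self._writer = True
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

With roughly 400 fetches for every 30 stores per second, as in the question, most callers would take the read path and not serialize behind each other. This simple version can starve writers under constant read load, which a production lock has to address.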

Before I continue: your main problem is that you are using a version of PHP from antiquity, which nobody even remembers how to support. In general you should look to upgrade. I'm aware of the constraints on the OP, but it would be irresponsible to neglect to mention that this is a real problem; you do not want to deploy on unsupported software. Additionally, APC is all but unmaintained and is destined to die. O+ and APCu are its replacements in modern versions of PHP.

Anyway, I digress ...

Synchronization is a headache when you are programming at the level of the kernel, with spinlocks or whatever. When you are several layers removed from the kernel, and you rely on 6 or 7 bits of complicated software underneath you synchronizing properly so that your code can synchronize properly, synchronization becomes a headache not only for the programmer but for the executor too. It can easily become the bottleneck of your shiny web application even if there are no bugs in your implementation.

Happily, this is the year 2013, and Yahoo aren't the only people able to implement user caches in PHP :)

http://pecl.php.net/package/yac

This is an extremely clever lockless cache for userland PHP. It's marked as experimental, but once you are finished, have a play with it; maybe in another 7 years we won't be thinking about synchronization issues :)

I hope you get to the bottom of it :)

Ribbentrop answered 12/11, 2013 at 22:19 Comment(4)
You mean --enable-apc-pthreadrwlocks? Which APC/APCu version is recommended for use with rwlocks? - Importunate
Is it OK to use APC 3.1.13 with rwlocks? Currently we are migrating to PHP 5.4 and will use O+ for opcode caching there. For the user cache, yac looks very promising. Do you know where I can find information about how stable yac or APCu already are? What are people with large sites using today for a local user cache? - Importunate
AFAIK YAC isn't being used in production; it's very new. APCu is APC minus the opcode cache. If you are going to run O+ then I recommend APCu ... which I maintain, by the way. But still, set up YAC somewhere and get testing with that too ... - Ribbentrop
Would you recommend switching from APC to APCu in production? - Importunate
1

Unless you are on a FreeBSD-derived operating system, it is not a good idea to use spinlocks; they are the worst kind of synchronization on the face of the earth. The only reason you must use them on FreeBSD is that the implementer neglected to include PTHREAD_PROCESS_SHARED support for mutexes and rwlocks, so you have little choice but to use the PostgreSQL-inspired spin lock in that case.

Ribbentrop answered 11/11, 2013 at 7:32 Comment(1)
OK, we've seen the benchmarks from Brian Shire at Facebook, which AFAIK ran on Linux and in which spinlocks were the fastest locking mechanism; see slideshare.net/shire/php-tek-2007-apc-facebook/17 (slide 17). With pthreads we don't get the spinlock errors in the logs, but APC makes PHP crash the same way. We can't see any difference between spinlocks and pthreads. - Importunate
