I've noticed periodic, but consistent latency spikes from my app running on app engine. At first I thought the network may be slow but app stats confirmed that not to be the case.
I've been able to reproduce the latency spikes using older and newer versions of the SDKs, currently I'm using the following:
- App engine SDK: 1.9.42
- Google cloud endpoints: 1.9.42
- Objectify: 5.1.13
- Appstats (to debug network latency)
So usage on the app is pretty low, over the last 30 days I'm generally under 0.04 requests a second:
Most work is done with one instance as well:
Most operations' latency is well under a second, but an alarming number of requests take 10-30x longer.
So I figured it must just be network latency, but every appstat of a slower operation disproved this. Datastore and the network have always been incredibly reliable. Here is the anatomy of a slow request taking over 30 seconds:
Here is the anatomy of a normal request:
At a high level my code is pretty uninteresting: it's a simple api that makes a few network calls and saves / reads data from the cloud datastore. The entire source can be found on github here. The app runs on a single auto scaling app engine instance and is warmed up.
CPU usage over the last month doesnt seem to show anything exciting either:
It's really strange to see that even for quick operations, a huge percentage of time is spent on the CPU, even though the code simply creates a few objects, persists them, and returns JSON. I'm wondering if the CPU is getting pegged on my app engine instance by another app which could be causing performance to periodically degrade.
My appengine.xml config looks like this:
<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
<application>sauce-sync</application>
<version>1</version>
<threadsafe>true</threadsafe>
<automatic-scaling>
<!-- always keep an instance up in order to keep startup time low-->
<min-idle-instances>1</min-idle-instances>
</automatic-scaling>
</appengine-web-app>
And my web.xml looks like this:
<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
<servlet>
<servlet-name>SystemServiceServlet</servlet-name>
<servlet-class>com.google.api.server.spi.SystemServiceServlet</servlet-class>
<init-param>
<param-name>services</param-name>
<param-value>com.sauce.sync.SauceSyncEndpoint</param-value>
</init-param>
</servlet>
<servlet-mapping>
<servlet-name>SystemServiceServlet</servlet-name>
<url-pattern>/_ah/spi/*</url-pattern>
</servlet-mapping>
<!--reaper-->
<servlet>
<servlet-name>reapercron</servlet-name>
<servlet-class>com.sauce.sync.reaper.ReaperCronServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>reapercron</servlet-name>
<url-pattern>/reapercron</url-pattern>
</servlet-mapping>
<servlet>
<servlet-name>reaper</servlet-name>
<servlet-class>com.sauce.sync.reaper.ReaperServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>reaper</servlet-name>
<url-pattern>/reaper</url-pattern>
</servlet-mapping>
<welcome-file-list>
<welcome-file>index.html</welcome-file>
</welcome-file-list>
<filter>
<filter-name>ObjectifyFilter</filter-name>
<filter-class>com.googlecode.objectify.ObjectifyFilter</filter-class>
</filter>
<filter-mapping>
<filter-name>ObjectifyFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
</web-app>
TLDR I'm completely stuck and I'm not sure how to debug or fix this issue and I'm starting to think this is business as usual for smaller apps on app engine.
I'm thinking about turning off the resident instance for a while hoping that my app has just been running some bunk hardware or along side an app that is consuming alot of resources. Has anyone run into similar performance issues or knows of additional ways to profile your app?
EDIT:
I've tried running on 1 resident instance, I've also tried setting concurrent requests at 2-4 per this question with no results. Logs and appstats both confirm that an inordinate amount of time is spent waiting for my code to initially run. Here is a request that takes 25 seconds before my first line of code is run, not sure what endpoints / app engine is doing in this time.
Again load is still low and this is request is on a warmed up instance.
EDIT 2:
Seems like for whatever reason app engine + endpoints doesnt play well with min-idle-instances
set. Reverting to the default app engine config fixed my problem.