Intermittent SSL errors from iOS app to AWS Elastic Beanstalk backend

My iOS app has had intermittent SSL errors when making HTTPS requests to the backend for several months.

The error description:

An SSL error has occurred and a secure connection to the server cannot be made.

The console logs when in debug mode:

2019-07-06 15:12:37.012198+0100 MyApp[37255:12499941] [BoringSSL] nw_protocol_boringssl_input_finished(1543) [C2.1:2][0x159e8e4a0] Peer disconnected during the middle of a handshake. Sending errSSLClosedNoNotify(-9816) alert
2019-07-06 15:12:37.026641+0100 MyApp[37255:12499941] TIC TCP Conn Failed [2:0x280486d00]: 3:-9816 Err(-9816)
2019-07-06 15:12:37.027759+0100 MyApp[37255:12499941] NSURLSession/NSURLConnection HTTP load failed (kCFStreamErrorDomainSSL, -9816)
2019-07-06 15:12:37.027839+0100 MyApp[37255:12499941] Task <D5AF17C0-C202-4229-BD52-690EFDB10379>.<1> HTTP load failed (error code: -1200 [3:-9816])
2019-07-06 15:12:37.028016+0100 MyApp[37255:12499941] Task <D5AF17C0-C202-4229-BD52-690EFDB10379>.<1> finished with error - code: -1200
2019-07-06 15:12:37.032759+0100 MyApp[37255:12500041] Task <D5AF17C0-C202-4229-BD52-690EFDB10379>.<1> load failed with error Error Domain=NSURLErrorDomain Code=-1200 "An SSL error has occurred and a secure connection to the server cannot be made." UserInfo={NSErrorFailingURLStringKey=https://api.example.com/v1/example/example?param=example, NSLocalizedRecoverySuggestion=Would you like to connect to the server anyway?, _kCFStreamErrorDomainKey=3, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <D5AF17C0-C202-4229-BD52-690EFDB10379>.<1>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <D5AF17C0-C202-4229-BD52-690EFDB10379>.<1>"
), NSLocalizedDescription=An SSL error has occurred and a secure connection to the server cannot be made., NSErrorFailingURLKey=https://api.example.com/v1/example/example?param=example, NSUnderlyingError=0x283ff2160 {Error Domain=kCFErrorDomainCFNetwork Code=-1200 "(null)" UserInfo={_kCFStreamPropertySSLClientCertificateState=0, _kCFNetworkCFStreamSSLErrorOriginalValue=-9816, _kCFStreamErrorDomainKey=3, _kCFStreamErrorCodeKey=-9816}}, _kCFStreamErrorCodeKey=-9816} [-1200]

The error occurs mainly on 3G/4G, not wifi, and occurs more often when the network signal is low. If it happens once it will keep happening for the next few requests, but will eventually work again shortly thereafter.

Based on the analytics, user reviews, and user bug reports: it is affecting a large percentage of users, but not 100% of them.

-

The backend is hosted on AWS Elastic Beanstalk. Served as a Docker app, using an Nginx proxy server, and multiple instances behind a load balancer.

I've tried increasing and decreasing the instance sizes and it seemed to make no difference.

I recently made an entirely new Elastic Beanstalk environment from scratch, to see if that helped. Previously it was using the Classic Load Balancer, now it is using the Application Load Balancer. Early indications are it has reduced the number of SSL errors, but they are still occurring.

The new load balancer is using this SSL policy:

ELBSecurityPolicy-FS-2018-06

Which is defined here: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/create-https-listener.html

Should it be using a different SSL policy?

-

In the app the web requests were being made using URLSession.shared.dataTask... etc. And I've also tried using the Alamofire library to see if that made a difference. It did not.

I feel like this may have something to do with Apple's App Transport Security. However, as it fails intermittently I'm at a loss as to how.

The relevant Apple docs are the bottom of this page: https://developer.apple.com/security/

If you need more information to help debug please let me know.

-

UPDATE:

So after trying many of the suggestions (thank you to everyone who contributed!) - and learning a lot more about SSL, load balancers, etc. - I have found something that has fixed the issue.

(Minor caveat: I can't be 100% certain it's completely fixed, due to the intermittent nature of the issue and my not so great tracking of it, but all available evidence suggests it is now fixed.)

The "fix" was to move the service to Google Cloud Run, which is basically serverless for Docker containers.

Crucially Google Cloud automatically handles setting up the SSL certificate, so there were zero parts for me to screw up. Another advantage is I'm now only paying for the compute time I'm actually using, so it's cheaper.

Apologies to anyone reading this looking for an actual solution to the original problem, but there are a bunch of good things to investigate in the answers and comments below.

Deforce answered 8/7, 2019 at 11:15 Comment(10)
Is the SSL certificate installed on the load balancer, or in each Docker container? - Broad
@MarkB on the load balancer listener, for port 443. Here's a screenshot: pasteboard.co/In1vv1N.png - Deforce
Are you terminating SSL at the load balancer or forwarding the HTTPS request to your server? Nginx may also be configured incorrectly, so it might be handy to see that config. - Zuniga
Did you try handling the NSURLAuthenticationChallenge? #19507707 - Brice
@A.J.Parr So I had it set to using an Nginx proxy server. I'm now trying it with none, although AWS warns when setting a proxy server that: "Specifies which proxy server to be used for client connections. Static file mappings and gzip compression will not take effect if the proxy server is set to "None"." - Deforce
@Prcela I'd expect it to fail 100% of the time if it was failing an SSL challenge? I feel like hacking around that much with the iOS code is likely to be covering up an issue on the backend, as why does this not happen for all apps otherwise? - Deforce
Please share with us whether you have made any progress, and how. - Brodie
Sorry @karem_gohar, I've been very busy with some personal and professional things the last few weeks, so fixing this issue unfortunately got sidelined. I wasn't able to fix it yet, but I will try setting up the Docker app on a different provider. I'll hopefully be able to report back the results of that next week. - Deforce
"Peer disconnected during the middle of a handshake." seems to be a purely network issue (not related to TLS; the TLS errors are then consequences of this network problem), and I do not think you can do anything about that on your side. The application should retry to connect anyway, so besides some delay what are the real consequences? - Revell
Can you confirm which cipher your certificate is using for encryption? Also, have you run a quick test somewhere like SSL Labs (ssllabs.com/ssltest/index.html)? - Malvie
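
A note on the retry suggestion in the comments above: the following is a minimal sketch of retrying with exponential backoff, written in Python for brevity rather than in the app's Swift. The name retry_with_backoff, the attempt count, and the delays are illustrative assumptions, not anything taken from the thread.

```python
import ssl
import time


def retry_with_backoff(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Only connection-level failures are retried; the delay doubles after
    each failed attempt (0.5 s, 1 s, 2 s, ... with the default base_delay).
    """
    for attempt in range(attempts):
        try:
            return operation()
        except (ssl.SSLError, ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

Retrying like this only hides the symptom, and it is only safe for requests that are idempotent. It does not explain why the handshake is being dropped in the first place.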
B
5

Disclaimer: this is not an answer to your question; I'm just thinking out loud with you.

Here are a few points I would check to help identify the root cause, assuming you have this information or can get it. Otherwise it will remain a black box unless you can co-debug with Amazon.

  • To me this looks like a certificate pinning issue.

  • Capture the traffic through a 3G modem with Wireshark, check which TLS version the request is sent with, and compare it with what AWS requires. For example, they might require 1.2 while you are sending 1.1.

  • It is critical to check the certificate string on the server side and compare it manually with the one on the client side; it might be encoded differently somewhere along the connection pipeline.

  • Since you said it fails more often on a slow connection, check the certificate pinning timeout (the server might receive only part of the certificate string, compare it with the one it has, and find a mismatch because of connection latency).

  • Make sure all instances of the Docker app behind the load balancer have exactly the same version of the certificate you are pinning.

  • Check the statistics on which iOS versions the failing connections come from, and the security checks in those specific versions.
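
The TLS version check in the list above can also be approximated without Wireshark. Below is a rough sketch, assuming Python's standard ssl module and a machine with network access to the backend; the function names are made up for illustration. It performs several handshakes in a row, because the failure is intermittent, and reports the negotiated protocol version and cipher for each.

```python
import socket
import ssl

# App Transport Security on iOS rejects anything older than TLS 1.2.
ATS_ACCEPTED_VERSIONS = {"TLSv1.2", "TLSv1.3"}


def meets_ats_minimum(version):
    """Return True if a version string from SSLSocket.version() is TLS 1.2 or newer."""
    return version in ATS_ACCEPTED_VERSIONS


def probe_handshake(host, port=443, timeout=5.0):
    """Do one TLS handshake and return (protocol_version, cipher_name).

    Raises ssl.SSLError or OSError if the TCP connection or the handshake fails.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version(), tls_sock.cipher()[0]


def probe_repeatedly(host, attempts=20):
    """Run several handshakes and return how many of them failed."""
    failures = 0
    for _ in range(attempts):
        try:
            version, cipher = probe_handshake(host)
            verdict = "version OK for ATS" if meets_ats_minimum(version) else "too old for ATS"
            print(version, cipher, verdict)
        except OSError as exc:  # ssl.SSLError is a subclass of OSError
            failures += 1
            print("handshake failed:", exc)
    return failures
```

Calling probe_repeatedly("api.example.com") with the real hostname gives a failure count. Running it from a tethered 3G/4G connection would be closer to the conditions in the question than running it on wifi. It only checks the protocol version, not the other ATS requirements such as forward secrecy.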

Brodie answered 8/7, 2019 at 11:35 Comment(5)
Thanks for the answer. iOS requires that TLS 1.2 is used, so I'm assuming iOS never tries TLS version 1.1. I've had a look at it on Charles, and there is very little info for the request, really just that the status was "Failed", because "Proxy Server Error: SSL client error: Client connection closed via error". - Deforce
For the certificate pinning aspect, I didn't think the instances of the Docker app even had a certificate on them; I assumed it was just on the load balancer. And for that I've not done anything special to pin the certificate. Is it something that happens by default? - Deforce
Also apologies if I don't understand everything, I'm an app developer out of my depth here with rather limited devops skills. - Deforce
Actually, of all these points, the one I suspect most is the timeout. - Brodie
Yes, the TLS negotiation phase is critical. Perhaps the negotiation sometimes fails because of the infrastructure. A few possible reasons: the two sides can't agree on a TLS version, or the cert stored on one of the servers is the wrong one (not updated). When it fails, is it in the negotiation phase? Which side terminates, and how does it terminate? Wireshark will show you all of this. - Dharma
P
1

Did you add the App Transport Security Settings keys to your Info.plist file?

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- In Info.plist the ATS keys must sit inside the NSAppTransportSecurity dictionary -->
    <key>NSAppTransportSecurity</key>
    <dict>
        <key>NSAllowsArbitraryLoads</key>
        <true/>
        <key>NSAllowsArbitraryLoadsForMedia</key>
        <true/>
        <key>NSAllowsArbitraryLoadsInWebContent</key>
        <true/>
        <key>NSExceptionDomains</key>
        <dict>
            <key>YOUR_SERVER_COM</key>
            <dict>
                <key>NSExceptionRequiresForwardSecrecy</key>
                <false/>
                <key>NSIncludesSubdomains</key>
                <true/>
            </dict>
            <key>facebook.com</key>
            <dict>
                <key>NSExceptionRequiresForwardSecrecy</key>
                <false/>
                <key>NSIncludesSubdomains</key>
                <true/>
            </dict>
            <key>fbcdn.net</key>
            <dict>
                <key>NSExceptionRequiresForwardSecrecy</key>
                <false/>
                <key>NSIncludesSubdomains</key>
                <true/>
            </dict>
            <key>graph.facebook.com</key>
            <dict>
                <key>NSExceptionRequiresForwardSecrecy</key>
                <false/>
                <key>NSIncludesSubdomains</key>
                <true/>
            </dict>
        </dict>
    </dict>
</dict>
</plist>
Predicative answered 12/7, 2019 at 11:13 Comment(2)
I'd assumed that, as the requests go to an HTTPS server, it was unnecessary to add this in, but I'll give it a go to see if it fixes it, thanks. - Deforce
So unfortunately setting some ATS settings in the Info.plist made no difference to the error, sigh. - Deforce
D
1

First of all, I've had all the symptoms you described. While searching for a solution I talked to every team involved: network, security, software, and so on. It is a very difficult problem to solve, so it may be useful to briefly explain how we solved it.

Tip 1: As you can see, the SSL handshake does not always fail; it only throws errors some of the time. An SSL key, or any other file used in your infrastructure, is ultimately just a sequence of bytes, and this error can occur when not all of those bytes make it across the network. That is what I found when I debugged my own case: corrupted packets were the cause.

Tip 2: The usual reason a request works for some clients and fails for others is that the server answers some requests from a cache. This is usually related to the load balancer configuration. In my case, a software engineer had replaced cookie-based authentication with a different authentication model. That model routed requests through a static in-memory object for better performance, which caused problems with the byte transfer.

My strong recommendation: on the server side, check the load balancer properties one by one and review the lifecycle management. You can even switch the load balancer's persistence method to session-based or cookie-based, whichever you need.

Disloyalty answered 29/7, 2019 at 13:6 Comment(2)
Thank you for the answer, it sounds like the load may be part of the cause. - Deforce
Sure @JonCox. If you haven't solved this problem yet, look only at the load balancer configuration; it is not related to SSL or anything else. I see an SSL/TLS 1.2 error in your logs, but I think that is because some load balancer configuration is missing after the SSL handshake. This is especially likely if you are using an F5 load balancer. - Disloyalty
S
1

I don't know much about your backend architecture (Docker, Nginx). My guess is that your backend was originally written to serve encrypted content to non-mobile browsers, was written before the migration to AWS, and does backend authentication? Now they have asked you to build the iOS app for the front end and have "lifted and shifted" the backend into Elastic Beanstalk? This is a good strategy because it's simpler to get going on the cloud and Elastic Beanstalk offers scaling.

The problem with this strategy is that when the original backend's encrypted traffic gets load balanced, and the encrypted sessions are not configured to float correctly between the scaled backend instances, it can break the user's session and you get errors.

Your hunch to create a new Elastic Beanstalk app and try the application load balancer is a good idea, but I found this in the AWS docs for configuring Load Balancing Elastic Beanstalk that might contradict that:

Unlike a Classic Load Balancer or a Network Load Balancer, an Application Load Balancer can't have transport layer (layer 4) TCP or SSL/TLS listeners. It supports only HTTP and HTTPS listeners. Additionally, it can't use backend authentication to authenticate HTTPS connections between the load balancer and backend instances.

Suggestion

To rule out the load balancing within Elastic Beanstalk, I would create a new Elastic Beanstalk environment with NO load balancing (or a non Elastic Beanstalk AWS compute stack) and see if you still get any of these errors with the clients connecting to this new environment. If there are no errors, then you can confidently tell your team that they need to consider migrating the authentication out of the backend and into AWS services.

Sarmentose answered 2/8, 2019 at 7:36 Comment(1)
Thank you for the answer, and for explaining how the load balancer can work. This backend was originally built for mobile, although no consideration was given to making it specifically tailored to mobile. It also doesn't use any authentication, just "standard" https. - Deforce
B
0

DISCLAIMER: we found a solution in our case, but I don't know if it is applicable to every type of Load Balancer.

We had the very same issue, and just found a solution after 2 months of research (and a little bit of help from AWS support).

After analyzing the packets sent to the server, we found that it closed the connection during the TLS/SSL handshake whenever the request went to the server's IPv6 address.

Plot twist: by default, VPC load balancers do not support IPv6 requests. https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-internet-facing-load-balancers.html#internet-facing-ip-addresses

As the documentation mentions, the Load Balancer DNS settings with dualstack. prefix can resolve to IPv4 OR IPv6. So the solution here is to remove this dualstack. part from the Load Balancer DNS in order to resolve ONLY to IPv4.

TL;DR: remove dualstack. prefix from your load balancer DNS configuration in your domain provider.
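
To see whether a hostname actually resolves to IPv6 as well as IPv4 before changing any DNS records, something like the following could be used. It is only a sketch built on Python's standard socket module; the function names are invented for illustration, and the result depends on the resolver of the machine it runs on.

```python
import socket


def group_by_family(addrinfo):
    """Split getaddrinfo() results into sorted lists of IPv4 and IPv6 addresses."""
    ipv4, ipv6 = set(), set()
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET:
            ipv4.add(sockaddr[0])
        elif family == socket.AF_INET6:
            ipv6.add(sockaddr[0])
    return sorted(ipv4), sorted(ipv6)


def resolve_families(host, port=443):
    """Resolve host and return (ipv4_addresses, ipv6_addresses)."""
    addrinfo = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return group_by_family(addrinfo)
```

If resolve_families("api.example.com"), called with the real API hostname, returns a non-empty IPv6 list, clients on IPv6-preferring mobile networks may be connecting over IPv6. That would be consistent with failures showing up mostly on 3G/4G.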

Browbeat answered 3/6, 2020 at 10:25 Comment(0)
