Django-Haystack using Amazon Elasticsearch hosting with IAM credentials

I am hoping to use Amazon's Elasticsearch server to power a search of longtext fields in a Django database. However, I also don't want to expose this search to those who don't have a log in and don't want to rely on security through obscurity or some IP restriction tactic (unless it would work well with an existing heroku app, where the Django app is deployed).

Haystack seems to go a long way toward this, but there doesn't seem to be an easy way to configure it to use Amazon's IAM credentials to access the Elasticsearch service. This functionality does exist in elasticsearch-py, which it uses.

https://elasticsearch-py.readthedocs.org/en/master/#running-with-aws-elasticsearch-service

from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

host = 'YOURHOST.us-east-1.es.amazonaws.com'
awsauth = AWS4Auth(YOUR_ACCESS_KEY, YOUR_SECRET_KEY, REGION, 'es')

es = Elasticsearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)
print(es.info())

Regarding using HTTP authorization, I found this under issues at https://github.com/django-haystack/django-haystack/issues/1046

from urlparse import urlparse
parsed = urlparse('https://user:pass@host:port')
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': parsed.hostname,
        'INDEX_NAME': 'haystack',
        'KWARGS': {
            'port': parsed.port,
            'http_auth': (parsed.username, parsed.password),
            'use_ssl': True,
        }
    }
}
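
The snippet above targets Python 2, where urlparse is a top-level module. On Python 3 the same function lives in urllib.parse. A minimal sketch of the same split, assuming Python 3 and a made-up URL (the host, port, and credentials are placeholders, not real values):

```python
from urllib.parse import urlparse

# Placeholder URL; substitute your own endpoint and credentials.
parsed = urlparse('https://user:pass@search.example.com:9243')

# Only the pieces Haystack needs are pulled out of the URL.
connection_kwargs = {
    'port': parsed.port,                               # parsed as an int
    'http_auth': (parsed.username, parsed.password),   # basic-auth pair
    'use_ssl': parsed.scheme == 'https',
}

print(parsed.hostname)
print(connection_kwargs)
```

parsed.hostname would then go into 'URL' and connection_kwargs into 'KWARGS', as in the settings above.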

I am wondering if there is a way to combine these two, something like the following (which, as expected, gives an error since it's more than just a user name and password):

from requests_aws4auth import AWS4Auth
awsauth = AWS4Auth([ACCESS_KEY],[SECRET_KEY],[REGION],'es')


HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': [AWSHOST],
        'INDEX_NAME': 'haystack',
        'KWARGS': {
            'port': 443,
            'http_auth': awsauth,
            'use_ssl': True,
            'verify_certs': True
        }
    },
}

The error here:

TypeError at /admin/
must be convertible to a buffer, not AWS4Auth

Request Method:     GET
Request URL:    http://127.0.0.1:8000/admin/
Django Version:     1.7.7
Exception Type:     TypeError
Exception Value:    must be convertible to a buffer, not AWS4Auth

Exception Location:     /usr/lib/python2.7/base64.py in b64encode, line 53

Any ideas on how to accomplish this?

Chaschase answered 29/1, 2016 at 17:43 Comment(2)
Are you trying to use AWS credentials to authenticate users against your private ElasticSearch implementation?Bjorn
I've created an Amazon IAM user for the app. I want only those who can access the app to be able to then use it to submit requests to the Elasticsearch server. So only one AWS credential is needed.Chaschase

You are one step from success: add connection_class to KWARGS and everything should work as expected.

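import elasticsearch
from requests_aws4auth import AWS4Auth

# Same credentials object as in the question
awsauth = AWS4Auth([ACCESS_KEY], [SECRET_KEY], [REGION], 'es')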

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': [AWSHOST],
        'INDEX_NAME': 'haystack',
        'KWARGS': {
            'port': 443,
            'http_auth': awsauth,
            'use_ssl': True,
            'verify_certs': True,
            'connection_class': elasticsearch.RequestsHttpConnection,
        }
    },
}
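
Why this helps, as far as the traceback suggests: the failure happens inside base64.b64encode, which is consistent with the default connection class treating http_auth as basic-auth credentials to be encoded into a header. An AWS4Auth object is not a string, so the encoding step fails, whereas RequestsHttpConnection hands http_auth to requests as an auth object. The sketch below only imitates that failure mode with the standard library (Python 3, so the message wording differs from the Python 2.7 traceback). FakeAuth and basic_auth_header are hypothetical stand-ins, not part of any library.

```python
import base64


class FakeAuth:
    """Hypothetical stand-in for a requests-style auth object such as AWS4Auth."""


def basic_auth_header(http_auth):
    """Build a basic-auth header value from a 'user:pass' string or (user, pass) pair.

    Anything else reaches b64encode unchanged and fails there.
    """
    if isinstance(http_auth, (tuple, list)):
        http_auth = ':'.join(http_auth)
    if isinstance(http_auth, str):
        http_auth = http_auth.encode('utf-8')
    return 'Basic ' + base64.b64encode(http_auth).decode('ascii')


# A username/password pair encodes without trouble.
print(basic_auth_header(('user', 'pass')))

# An auth object cannot be base64-encoded, so a TypeError is raised.
try:
    basic_auth_header(FakeAuth())
except TypeError as exc:
    print('TypeError:', exc)
```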
Celt answered 2/2, 2016 at 6:28 Comment(1)
Note that I haven't yet employed the features of Haystack in my app, but it did resolve the issue that I'd encountered at that stage. Thanks!Chaschase

AWS Identity and Access Management (IAM) lets you manage users and their permissions for AWS services, controlling which AWS resources those users can access.

You cannot use IAM credentials to authorize users at the application level via http_auth, as it appears you are trying to do via Haystack here. They are different authentication schemes for different services. They are not compatible.

In your security use case, you have stated the need 1) to restrict access to your application, and 2) to secure the Elasticsearch service port from open access. Both requirements can be met using the following methods:

Restrict access to your application

"I also don't want to expose this search to those who don't have a log in"

For the front-end search app, you want a server-level Basic access authentication (HTTP auth) configuration on the web server. This is where you control user login access to your app, via a standard HTTP auth username and password (again, not IAM). This secures your app at the application level.

Secure the Elasticsearch service port

"don't want to rely on security through obscurity or some IP restriction tactic (unless it would work well with an existing heroku app, where the Django app is deployed)."

IP restriction is exactly what would work here, and it is consistent with AWS security best practices. You want to use security groups and security group rules as a firewall to control traffic for your EC2 instances.

Given a Haystack configuration of:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}

you will want to implement an IP restriction at the security group and/or ACL level on that IP and port (127.0.0.1 and 9200 in this example), so that only your Django host or other authorized hosts can connect. This will secure it from any unauthorized access at the service level.

In your implementation, the URL will likely resolve to a public or private IP, depending on your network architecture.

Bjorn answered 29/1, 2016 at 22:49 Comment(2)
Thanks Rodrigo. Is there a reason why I wouldn't want to save that information in my Heroku config vars and then pass them to a view that then sends the request to Amazon? It seems like that should work for restricting access, because I can require users to be logged in to visit the search page.Chaschase
You bet, @Chaschase. I don't know Heroku 100% and am not sure how you have your config vars configured. I do know it runs on AWS. The above configuration is the standard way to do these types of secure cloud implementations. What type of HTTP server is running the search app? Have you tried http_auth there?Bjorn
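
For the config-vars idea raised in the comments, one possible shape is to keep the IAM user's keys in environment variables and build the Haystack entry from them. This is only a sketch: the variable names (AWS_ES_ACCESS_KEY, AWS_ES_SECRET_KEY, AWS_ES_REGION, AWS_ES_HOST) and both helper functions are hypothetical, and the default region is an assumption. The auth object and connection class are passed in as arguments so the helpers themselves need nothing beyond the standard library.

```python
import os


def haystack_connection(host, auth, connection_class, index_name='haystack'):
    """Build one HAYSTACK_CONNECTIONS entry for an HTTPS Elasticsearch endpoint."""
    return {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': host,
        'INDEX_NAME': index_name,
        'KWARGS': {
            'port': 443,
            'http_auth': auth,
            'use_ssl': True,
            'verify_certs': True,
            'connection_class': connection_class,
        },
    }


def aws_settings_from_env(environ=os.environ):
    """Read the (hypothetical) config var names used in this sketch."""
    return {
        'access_key': environ['AWS_ES_ACCESS_KEY'],
        'secret_key': environ['AWS_ES_SECRET_KEY'],
        'region': environ.get('AWS_ES_REGION', 'us-east-1'),  # assumed default
        'host': environ['AWS_ES_HOST'],
    }


# In settings.py, with requests_aws4auth and elasticsearch installed:
#
#   from elasticsearch import RequestsHttpConnection
#   from requests_aws4auth import AWS4Auth
#
#   cfg = aws_settings_from_env()
#   awsauth = AWS4Auth(cfg['access_key'], cfg['secret_key'], cfg['region'], 'es')
#   HAYSTACK_CONNECTIONS = {
#       'default': haystack_connection(cfg['host'], awsauth, RequestsHttpConnection),
#   }
```

With this layout the keys never appear in the repository, and only code running inside the Django app can sign requests to the Elasticsearch domain.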
