How to remotely connect to GCP ML Engine/AWS Sagemaker managed notebooks?

C

5

9

GCP has finally released managed Jupyter notebooks. I would like to be able to interact with the notebook locally by connecting to it, i.e. I use PyCharm to connect to the externally configured Jupyter notebook server by passing its URL and token parameter.

Question also applies to AWS Sagemaker notebooks.

Custommade answered 12/4, 2019 at 5:39 Comment(0)

S

3

On AWS, you can use AWS Glue to create a developer endpoint and then create the SageMaker notebook from there. A developer endpoint lets you connect to your Python or Scala Spark REPL via SSH, and it also lets you tunnel the connection and access it from any other tool, including PyCharm.

With PyCharm Professional there is even tighter integration, allowing you to SFTP files and debug remotely.

If you need to install any dependencies on the notebook, apart from doing it directly in the notebook, you can always choose New > Terminal to get a connection to that machine directly from your Jupyter environment, where you can install anything you want.

Stodgy answered 12/4, 2019 at 8:29 Comment(0)

C

5

AWS does not natively support SSH-ing into SageMaker notebook instances, but nothing really prevents you from setting up SSH yourself.

The only problem is that these instances do not get a public IP address, which means you have to either create a reverse proxy (with ngrok, for example) or connect via a bastion box.

Steps to make the ngrok solution work:

1. Download ngrok: curl https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip > ngrok.zip
2. Unzip it: unzip ngrok.zip
3. Create a free ngrok account to get permission for TCP tunnels.
4. Register your token: ./ngrok authtoken <your_token>
5. Start the tunnel: ./ngrok tcp 22 > ngrok.log & (the trailing & puts it in the background)
6. The log file will contain the URL, so you know where to connect.
7. Create the ~/.ssh/authorized_keys file (on SageMaker) and paste in your public key (likely ~/.ssh/id_rsa.pub from your computer).
8. Connect with ssh -p <port_from_ngrok_logfile> <user>@<host_from_ngrok_logfile> (the host ngrok assigns you is also in ngrok.log).
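The last step can be scripted so you do not have to read the log by hand. This is a minimal sketch, assuming ngrok.log contains a tunnel URL of the form tcp://host:port (whether it does depends on your ngrok version and how logging is set up). The function name ngrok_ssh_command is made up for illustration, and ec2-user is only an assumed default login name.

```shell
# Print the ssh command for the first TCP tunnel URL found in an ngrok log.
# Assumption: the log contains a token such as tcp://0.tcp.ngrok.io:12345.
ngrok_ssh_command() {
  local logfile="$1" user="${2:-ec2-user}"  # ec2-user is an assumed default
  local url hostport host port
  # Take the first tcp://host:port occurrence in the log.
  url=$(grep -o 'tcp://[A-Za-z0-9.-]*:[0-9][0-9]*' "$logfile" | head -n 1)
  if [ -z "$url" ]; then
    echo "no tcp:// tunnel URL found in $logfile" >&2
    return 1
  fi
  hostport=${url#tcp://}   # strip the scheme
  host=${hostport%:*}      # everything before the last colon
  port=${hostport##*:}     # everything after the last colon
  echo "ssh -p $port $user@$host"
}

# Usage:
#   ngrok_ssh_command ngrok.log
```

If the function reports that no URL was found, the log probably does not include tunnel details, and you would need to look the address up some other way (for example in the ngrok dashboard).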

If you want to automate it, I suggest using lifecycle configuration scripts.

Another good trick is wrapping downloading, unzipping, authenticating and starting ngrok into some binary in /usr/bin so you can just call it from SageMaker console if it dies.

It's a little bit too long to explain completely how to automate it with lifecycle scripts, but I've written a detailed guide on https://biasandvariance.com/sagemaker-ssh-setup/.

Correll answered 7/5, 2020 at 17:10 Comment(4)

In order to be a good answer according to the norms on this site, your answer should stand on its own. Linking to your writeup for more color is fine, but let's say your site went down tomorrow: all your post tells me is that it would take me four minutes, if I knew the steps to take. I suggest you take a moment to explain, here, what one would need to do. – Tertian 7/5, 2020 at 17:38

Very fair comment @JesseScherer. I've put more effort into it and would appreciate your comment now. – Correll 8/5, 2020 at 18:8

Hey, thanks for the improvements to your answer. Upvoted! – Tertian 8/5, 2020 at 19:38

Outdated? ./ngrok authenticate does not work, and ngrok.log contains neither a port nor a host. – Arroyo 26/5, 2022 at 11:54

H

1

There is a way to SSH into a SageMaker notebook instance without using a third-party reverse proxy like ngrok, setting up an EC2 bastion, or using AWS Systems Manager. Here is how you can do it.

Prerequisites

Use your own VPC and not the VPC managed by AWS/Sagemaker for the notebook instance
Configure an ingress rule in the security group of your notebook instance to allow SSH traffic (port 22 over TCP)

How to do it

1. Create a lifecycle configuration script that is executed when the instance starts.
2. Add the following snippet to the lifecycle script:

INSTANCE_IP=$(/sbin/ifconfig eth2 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1}')
echo "SSH into the instance using : ssh ec2-user@$INSTANCE_IP" > ~ec2-user/SageMaker/ssh-instructions.txt

3. Add your public SSH key to /home/ec2-user/.ssh/authorized_keys, either manually from the terminal of the JupyterLab UI or inside the lifecycle script above.

When your users open the Jupyter interface, they will find the ssh-instructions.txt file, which gives the host and the command to use: ssh ec2-user@<INSTANCE_IP>

If you want to SSH from a local environment, you'll probably need to connect to a VPN that routes your traffic into your VPC.
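If you add the key from the lifecycle script rather than by hand, it helps to make that step idempotent, since the script runs on every start. This is a minimal sketch of one way to do it. The function name install_pubkey and the example key are hypothetical; the target path is the authorized_keys file mentioned in this answer.

```shell
# Append a public key to an authorized_keys file unless it is already there.
# Both arguments are illustrative; on the notebook instance the file would be
# /home/ec2-user/.ssh/authorized_keys.
install_pubkey() {
  local pubkey="$1" auth_file="$2"
  local ssh_dir
  ssh_dir=$(dirname "$auth_file")
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"        # sshd rejects overly permissive directories
  touch "$auth_file"
  # -x matches the whole line, -F treats the key as a fixed string,
  # so the key is not appended again on later runs.
  if ! grep -qxF -- "$pubkey" "$auth_file"; then
    printf '%s\n' "$pubkey" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}

# Usage (placeholder key):
#   install_pubkey "ssh-ed25519 AAAA... you@laptop" /home/ec2-user/.ssh/authorized_keys
```

Lifecycle scripts run as root, so in that context you would likely also need to chown the .ssh directory and the file to ec2-user afterwards.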

Henbane answered 10/3, 2022 at 15:55 Comment(0)

T

0

GCP's AI Platform Notebooks automatically creates a persistent URL which you can use to access your notebook. Is that what you were looking for?

Tin answered 1/7, 2019 at 0:15 Comment(0)

K

-1

Try using CreatePresignedNotebookInstanceUrl to access your notebook instance through a URL.
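The same API call can be made from the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured with credentials that are allowed to call CreatePresignedNotebookInstanceUrl. The function name presigned_notebook_url and the instance name my-notebook are made up for illustration.

```shell
# Print a presigned URL for a SageMaker notebook instance via the AWS CLI.
# Assumes the AWS CLI is installed and configured; the instance name passed
# in is a placeholder for your own notebook instance.
presigned_notebook_url() {
  local name="$1"
  aws sagemaker create-presigned-notebook-instance-url \
    --notebook-instance-name "$name" \
    --query AuthorizedUrl \
    --output text
}

# Usage (hypothetical instance name):
#   presigned_notebook_url my-notebook
```

The returned URL is meant to be opened in a browser and is short-lived, so it may not behave like the URL-plus-token pair that an IDE expects from a plain Jupyter server.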

Koenig answered 29/4, 2019 at 16:26 Comment(1)

Hi, this doesn't seem to work for me with vscode using "Python: Specify local or remote jupyter server for connections". It then asks for a password. – Roma 23/3, 2020 at 1:5
