How to load a graph from networkx into PyTorch Geometric and set node features and labels?

Goal: I am trying to import a graph FROM networkx into PyTorch geometric and set labels and node features.

(This is in Python)

Question(s):

  1. How do I do this [the conversion from networkx to PyTorch geometric]? (presumably by using the from_networkx function)
  2. How do I transfer over node features and labels? (more important question)

I have seen some other/previous posts with this question but they weren't answered (correct me if I am wrong).

Attempt: (I have just used an unrealistic example below, as I cannot post anything real on here)

Let us imagine we are trying to do a graph learning task (e.g. node classification) on a group of cars (not very realistic as I said). That is, we have a group of cars, an adjacency matrix, and some features (e.g. price at the end of the year). We want to predict the node label (i.e. brand of the car).

I will be using the following adjacency matrix: (apologies, cannot use latex to format this)

A = [(0, 1, 0, 1, 1), (1, 0, 1, 1, 0), (0, 1, 0, 0, 1), (1, 1, 0, 0, 0), (1, 0, 1, 0, 0)]

Here is the code (for Google Colab environment):

# Install PyTorch Geometric first (in Colab), before importing it
!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.10.0+cpu.html

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import torch
from torch_geometric.utils.convert import to_networkx, from_networkx

# Make the networkx graph
G = nx.Graph()

# Add some cars (five nodes, one per brand)
G.add_nodes_from([
      (1, {'Brand': 'Ford'}),
      (2, {'Brand': 'Audi'}),
      (3, {'Brand': 'BMW'}),
      (4, {'Brand': 'Peugeot'}),
      (5, {'Brand': 'Lexus'}),
])

# Add some edges
G.add_edges_from([
                  (1, 2), (1, 4), (1, 5),
                  (2, 3), (2, 4),
                  (3, 2), (3, 5), 
                  (4, 1), (4, 2),
                  (5, 1), (5, 3)
])

# Convert the graph into PyTorch geometric
pyg_graph = from_networkx(G)

So this correctly converts the networkx graph to PyTorch Geometric. However, I still don't know how to properly set the labels.

The brand values for each node have been converted and are stored within:

pyg_graph.Brand

Below, I have just made some random numpy arrays of length 5 for each node (just pretend that these are realistic).

ford_prices = np.random.randint(100, size = 5)
lexus_prices = np.random.randint(100, size = 5)
audi_prices = np.random.randint(100, size = 5)
bmw_prices = np.random.randint(100, size = 5)
peugeot_prices = np.random.randint(100, size = 5)

This brings me to the main question:

  • How do I set the prices to be the node features of this graph?
  • How do I set the labels of the nodes? (and will I need to remove the labels from pyg_graph.Brand when training the network?)

Thanks in advance and happy holidays.

Sluggard answered 22/12, 2021 at 16:50 Comment(0)

The easiest way is to add all the information to the networkx graph and create it directly in the form you need. I assume you want to use a graph neural network, in which case you want something like the code below.

  1. Instead of text labels, you probably want a categorical representation, e.g. 1 stands for Ford.
  2. If you want to follow the usual convention, name your input features x and your labels/ground truth y.
  3. The data is split into train and test sets via masks. The graph still contains all the information, but only part of it is used for training. Check the PyTorch Geometric introduction for an example that uses the Cora dataset.

import networkx as nx
import numpy as np
import torch
from torch_geometric.utils.convert import from_networkx


# Make the networkx graph
G = nx.Graph()

# Add some cars (five nodes; y is the label, x is the feature)
G.add_nodes_from([
      (1, {'y': 1, 'x': 0.5}),
      (2, {'y': 2, 'x': 0.2}),
      (3, {'y': 3, 'x': 0.3}),
      (4, {'y': 4, 'x': 0.1}),
      (5, {'y': 5, 'x': 0.2}),
])

# Add some edges
G.add_edges_from([
                  (1, 2), (1, 4), (1, 5),
                  (2, 3), (2, 4),
                  (3, 2), (3, 5),
                  (4, 1), (4, 2),
                  (5, 1), (5, 3)
])

# Convert the graph into PyTorch geometric
pyg_graph = from_networkx(G)

print(pyg_graph)
# Data(edge_index=[2, 12], x=[5], y=[5])
print(pyg_graph.x)
# tensor([0.5000, 0.2000, 0.3000, 0.1000, 0.2000])
print(pyg_graph.y)
# tensor([1, 2, 3, 4, 5])
print(pyg_graph.edge_index)
# tensor([[0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4],
#         [1, 3, 4, 0, 2, 3, 1, 4, 0, 1, 0, 2]])


# Split the data 
train_ratio = 0.2
num_nodes = pyg_graph.x.shape[0]
num_train = int(num_nodes * train_ratio)
idx = [i for i in range(num_nodes)]

np.random.shuffle(idx)
train_mask = torch.full_like(pyg_graph.y, False, dtype=bool)
train_mask[idx[:num_train]] = True
test_mask = torch.full_like(pyg_graph.y, False, dtype=bool)
test_mask[idx[num_train:]] = True

print(train_mask)
# e.g. tensor([ True, False, False, False, False])  (varies with the shuffle)
print(test_mask)
# e.g. tensor([False,  True,  True,  True,  True])  (complement of train_mask)
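
The example above uses a single scalar x per node and hand-written integer labels. For the setup in the question (a brand string and a length-5 price vector per node), one possible way to prepare the attributes is sketched below. This is only a sketch: the names brands, prices, brand_to_id and node_attrs are made up for illustration, and the prices are random stand-ins. The resulting node_attrs list has the same (node, attribute dict) shape that G.add_nodes_from accepts, so passing it there and then calling from_networkx(G) should give x with one row of five features per node, although that last step is not run here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Node id -> brand, as in the question
brands = {1: "Ford", 2: "Audi", 3: "BMW", 4: "Peugeot", 5: "Lexus"}

# Stand-in data: one length-5 price vector per node (random, as in the question)
prices = {node: rng.integers(0, 100, size=5) for node in brands}

# Map each distinct brand to an integer class id, starting at 0
brand_to_id = {brand: i for i, brand in enumerate(sorted(set(brands.values())))}

# Same (node, attribute dict) shape that G.add_nodes_from accepts
node_attrs = [
    (node, {"x": prices[node].astype(float).tolist(), "y": brand_to_id[brand]})
    for node, brand in brands.items()
]

# Feature matrix and label vector in node order, for inspection
X = np.array([attrs["x"] for _, attrs in node_attrs])
y = np.array([attrs["y"] for _, attrs in node_attrs])
print(X.shape, y)
```

Class ids start at 0 here because PyTorch classification losses such as cross-entropy expect labels in the range 0 to num_classes - 1. Since the brand is now encoded in y, there is no need to keep a separate Brand attribute on the graph.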
Donnelldonnelly answered 22/12, 2021 at 18:26 Comment(3)
Many thanks for this response! This is exactly what I was looking for. You also correctly predicted my next question regarding the training mask. A quick follow-up question about that (perhaps I need to make a new question dedicated to this?): is the mask you defined suitable for semi-supervised classification tasks (i.e. those where we have access to the whole graph, but only some of the labels), or does that method completely ignore the test nodes during training? - Sluggard
[EDIT:] How do we use the mask that you created to actually extract those nodes? The documentation uses data.train_mask, but that doesn't work for this code. - Sluggard
You can either use the variables test_mask/train_mask directly or simply "add" them to your graph via pyg_graph.train_mask = train_mask. With these masks you will only use that part of the labels during training (but the features of all nodes). - Donnelldonnelly
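
To make the last comment concrete, here is a rough sketch of how such masks are typically applied: the model produces predictions for every node, and the mask only selects which rows enter the loss. Everything here is a stand-in chosen for illustration. The features and labels are random or synthetic, and the torch.nn.Linear layer is a hypothetical placeholder for a GNN, which would also take edge_index as input.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

num_nodes, num_features, num_classes = 5, 5, 5
x = torch.rand(num_nodes, num_features)   # stand-in node features
y = torch.arange(num_nodes)               # stand-in labels, 0-based class ids
train_mask = torch.tensor([True, False, False, False, False])
test_mask = ~train_mask                   # every node not used for training

# Hypothetical placeholder for a GNN; a real model would also use edge_index
model = torch.nn.Linear(num_features, num_classes)

# Forward pass over ALL nodes, loss only on the training nodes
logits = model(x)
loss = F.cross_entropy(logits[train_mask], y[train_mask])
loss.backward()

# Evaluation uses the labels of the test nodes only
with torch.no_grad():
    pred = model(x).argmax(dim=1)
    test_acc = (pred[test_mask] == y[test_mask]).float().mean().item()

print(loss.item(), test_acc)
```

In this pattern the test nodes still take part in the forward pass, so with a real GNN their features would be visible through message passing. Only their labels are held out, which is the semi-supervised setting asked about above.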
