How to create a graph neural network dataset? (pytorch geometric)

Asked 24/3, 2021 at 20:6 Answered 10/4, 2024 at 11:48

python pytorch graph-databases pytorch-geometric

How can I convert my own dataset to be usable by pytorch geometric for a graph neural network?

All the tutorials use existing datasets that have already been converted for PyTorch Geometric. For example, if I have my own point cloud dataset, how can I use it to train a graph neural network for classification? What about my own image dataset for classification?

Guadalcanal answered 24/3, 2021 at 20:6 Comment(0)

How you need to transform your data depends on what format your model expects.

Graph neural networks typically expect (a subset of):

node features
edges
edge attributes
node targets

depending on the problem. In PyTorch Geometric you can hold tensors of these values in a Data object (and attach extra attributes as needed), like so:

import torch
from torch_geometric.data import Data

data = Data(x=x, edge_index=edge_index, y=y)
data.train_idx = torch.tensor([...], dtype=torch.long)
data.test_mask = torch.tensor([...], dtype=torch.bool)

Thames answered 24/3, 2021 at 20:22 Comment(1)

Here is data.train_idx what others sometimes call 'train_mask'? & what goes in the [...]? Can this be the length of the nodes or the length of the edges list? – Stagnant 3/5, 2024 at 14:30

As mentioned in the PyTorch Geometric documentation:

Do I really need to use these dataset interfaces? No! Just as in regular PyTorch, you do not have to use datasets, e.g., when you want to create synthetic data on the fly without saving them explicitly to disk. In this case, simply pass a regular python list holding torch_geometric.data.Data objects and pass them to torch_geometric.loader.DataLoader

from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

data_list = [Data(...), ..., Data(...)]
loader = DataLoader(data_list, batch_size=32)

Barrel answered 28/10, 2021 at 23:45 Comment(0)

import os

import torch
from torch_geometric.data import Dataset, Data


class MyCustomDataset(Dataset):
    def __init__(self, root, filenames, transform=None, pre_transform=None):
        # List of raw files, in your case point clouds. It must be set before
        # super().__init__(), which triggers download() and process().
        self.filename = filenames
        super().__init__(root, transform, pre_transform)

    @property
    def raw_file_names(self):
        return self.filename

    @property
    def processed_file_names(self):
        """Files expected in processed_dir; if all are found, processing is skipped."""
        return [f"data_{i}.pt" for i in range(len(self.filename))]

    def download(self):
        pass

    def process(self):
        for idx, path in enumerate(self.raw_paths):
            self._process_one_step(idx, path)

    def _process_one_step(self, idx, path):
        # One output file per raw file, so samples do not overwrite each other.
        out_path = os.path.join(self.processed_dir, f"data_{idx}.pt")
        # Read your point cloud from `path` here and convert it to a Data object.
        # node_features, edge_index, edge_attr and label are placeholders for
        # the tensors you build from that file.
        data = Data(x=node_features,
                    edge_index=edge_index,
                    edge_attr=edge_attr,
                    y=label)  # you can add more arguments as you like
        torch.save(data, out_path)

    # PyG's Dataset expects len() and get() rather than __len__/__getitem__.
    def len(self):
        return len(self.processed_file_names)

    def get(self, idx):
        return torch.load(os.path.join(self.processed_dir, self.processed_file_names[idx]))

This will create the data in the right format. You can then use torch_geometric.loader.DataLoader to create a data loader and train your network.

Barrelhouse answered 25/9, 2021 at 20:5 Comment(1)

Can i ask, what if you don't have the data as a set of files? I asked this question here: #72572341 and I feel like the discussion here kind of helps me, but I can't fully see the link between the two. – Claraclarabella 10/6, 2022 at 10:13

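import numpy as np
import torch
from torch_geometric.data import Data

# graph_df and embedding_df are pandas DataFrames; "source" and "target" should hold integer node indices.
edge_index = torch.from_numpy(graph_df[["source", "target"]].to_numpy())
x = torch.from_numpy(np.array(embedding_df["vectors"].tolist()))

# Data expects edge_index with shape [2, num_edges], hence the transpose.
data = Data(x=x, edge_index=edge_index.t().contiguous())
data

You can create graph data from DataFrames like this.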

Undistinguished answered 11/7, 2023 at 10:16 Comment(1)

Should you start with raw data, features into x should be 'transformed' with torch_geometric.transforms.Compose? – Stagnant 17/10, 2023 at 16:38

https://pytorch-geometric.readthedocs.io/en/latest/notes/load_csv.html?highlight=encoder

Following that tutorial, you can specify a separate encoder for each feature type. In your case, that could be one encoder for your images and others for the rest of your metadata.
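To keep the idea self-contained, here is a rough sketch of the one-encoder-per-column pattern on an in-memory table. The encoder classes, the encode_nodes helper and the toy columns are illustrative assumptions written for this example; they are not part of the PyTorch Geometric API. An image column would need its own encoder that turns each image into a fixed-size vector.

```python
import pandas as pd
import torch


class IdentityEncoder:
    """Turns a numeric column into a float tensor of shape (N, 1)."""

    def __call__(self, column):
        return torch.tensor(column.to_numpy(), dtype=torch.float).view(-1, 1)


class OneHotEncoder:
    """One-hot encodes a categorical column into shape (N, num_categories)."""

    def __call__(self, column):
        categories = sorted(column.unique())
        index = {c: i for i, c in enumerate(categories)}
        out = torch.zeros(len(column), len(categories))
        for row, value in enumerate(column):
            out[row, index[value]] = 1.0
        return out


def encode_nodes(df, encoders):
    """Apply one encoder per column and concatenate the results into x."""
    return torch.cat([enc(df[col]) for col, enc in encoders.items()], dim=-1)


# Toy node table: one row per node.
df = pd.DataFrame({"age": [23, 35, 41], "kind": ["cat", "dog", "cat"]})
x = encode_nodes(df, {"age": IdentityEncoder(), "kind": OneHotEncoder()})
```

The resulting x has one row per node and can be passed as Data(x=x, ...) together with an edge_index built from your edge table.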

Rhubarb answered 10/4, 2024 at 11:48 Comment(1)

While the information in your link may solve the OP's question, answers that rely heavily on links are discouraged for these reasons. Please consider updating your answer so that it is self-contained, with the minimal amount of code and instructions required to demonstrate how it works. Thanks – Impower 11/4, 2024 at 2:51
