What type of neural network can handle variable input and output sizes?

I'm trying to use the approach described in this paper (https://arxiv.org/abs/1712.01815, the AlphaZero paper) to make the algorithm learn a new game.

There is only one problem that does not directly fit this approach: the game I am trying to learn has no fixed board size. Currently the input tensor has dimensions m*n*11, where m and n are the dimensions of the game board and can vary each time the game is played. So first of all I need a neural network that can handle such varying input sizes.

The size of the output is also a function of the board size: it is a vector with an entry for every possible move on the board, so the output vector grows as the board gets bigger.

I have read about recurrent and recursive neural networks, but they all seem to relate to NLP, and I'm not sure how to translate that to my problem.

Any ideas on NN architectures able to handle my case would be welcome.

Kerrin answered 4/4, 2018 at 16:20 Comment(2)
For varying input size there are some good answers here: stats.stackexchange.com/questions/388859/… and here: ai.stackexchange.com/questions/2008/…. For varying output size, it still remains a mystery to me.Culverin
Highly theoretical: a fully convolutional network. For example, YOLOv3 is capable of processing different image sizes (in the sense that the network will not crash).Lyonnais

What you need is a Pointer Network (https://arxiv.org/abs/1506.03134).

Here is an introductory quote from a post about them:

Pointer networks are a new neural architecture that learns pointers to positions in an input sequence. This is new because existing techniques need to have a fixed number of target classes, which isn't generally applicable— consider the Travelling Salesman Problem, in which the number of classes is equal to the number of inputs. An additional example would be sorting a variably sized sequence. - https://finbarr.ca/pointer-networks/

It's an attention-based model.

Essentially, a pointer network predicts pointers back into the input, meaning your output layer isn't actually fixed in size but variable.
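A minimal sketch of the pointer-attention step in PyTorch, assuming an encoder/decoder setup with hidden size 64 (the class name, shapes, and sizes here are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    """Scores every encoder position against the current decoder state."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.w_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, seq_len, hidden) -- seq_len may vary per example
        # dec_state:  (batch, hidden)
        scores = self.v(torch.tanh(
            self.w_enc(enc_states) + self.w_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                        # (batch, seq_len)
        return torch.softmax(scores, dim=-1)  # one probability per input position

attn = PointerAttention(hidden_dim=64)
enc = torch.randn(2, 17, 64)  # 17 input positions this time; could be any length
dec = torch.randn(2, 64)
probs = attn(enc, dec)        # shape (2, 17)
```

Because the softmax runs over the encoder positions rather than over a fixed label set, the "number of classes" automatically grows and shrinks with the input.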

One use case where I have used them is translating raw text into SQL queries:

  • Input: "HOW MANY CARS WERE SOLD IN US IN 1983"
  • Output: SELECT COUNT(Car_id) FROM Car_table WHERE (Country='US' AND Year=1983)

The issue with raw text like this is that it only makes sense with respect to a specific table (in this case a car table with a set of columns around car sales, similar to your different boards for board games). This means the question can't be the only input. So the input that actually goes into the pointer network is a combination of several parts.

Input:

  1. Query
  2. Metadata of the table (column names)
  3. Token vocabulary for all categorical columns
  4. Keywords from SQL syntax (SELECT, WHERE, etc.)

All of these are appended together.

The output layer then simply points back to specific indexes of the input. It points to Country and Year (from the column names in the metadata), to US and 1983 (from the token vocabulary of the categorical columns), and to SELECT, WHERE, etc. from the SQL-syntax component of the input.

The sequence of these indexes into the appended input is then used as the output of your computation graph and optimized against a training set, in this case the WikiSQL dataset.
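To make the pointing concrete, here is a toy illustration of decoding pointer indexes against the appended input (the token lists and indexes are made up for this example, not taken from WikiSQL):

```python
# The appended input: query tokens + column names + categorical tokens + SQL keywords.
appended_input = (
    ["how", "many", "cars", "were", "sold", "in", "us", "in", "1983"]  # query
    + ["Car_id", "Country", "Year"]                                    # column names
    + ["US", "1983"]                                                   # categorical tokens
    + ["SELECT", "COUNT", "FROM", "WHERE", "AND", "="]                 # SQL keywords
)
pointer_outputs = [14, 15, 9, 17, 10, 19, 12]  # indexes the network would predict
print([appended_input[i] for i in pointer_outputs])
# ['SELECT', 'COUNT', 'Car_id', 'WHERE', 'Country', '=', 'US']
```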

Your case is quite similar: you pass the input, the metadata of the game, and whatever needs to appear in your output, all appended together into one indexed sequence. The pointer network then simply makes selections from that input (points to them), roughly as in the sketch below.
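A purely hypothetical mapping onto a board game, just to make the indexing concrete (the per-cell encoding and the PASS/RESIGN keywords are my own illustration, not from the answer above):

```python
# One entry per board cell plus any fixed "keywords"; the appended input's
# length is m*n + 2, so it grows and shrinks with the board.
m, n = 5, 7                                          # board size varies per game
cells = [(i, j) for i in range(m) for j in range(n)]
keywords = ["PASS", "RESIGN"]
appended = cells + keywords
move = appended[17]   # a single pointer index selects a move
print(move)           # (2, 3) on this particular board
```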

Carbarn answered 31/7, 2019 at 11:51 Comment(0)

You need to go back to a fixed input / output problem.

A common way to fix this for images, time series, and the like is to use sliding windows to downsize the input to a fixed shape. Perhaps this can be applied to your game; a rough sketch follows.
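A minimal sketch of the sliding-window idea in NumPy, assuming a 4x4 window with stride 2 over the m*n*11 board tensor (the window size, stride, and function name are illustrative choices, not prescriptions):

```python
import numpy as np

def sliding_windows(board, win=4, stride=2):
    """Cut a variable-size board into fixed-size win x win patches."""
    m, n, _ = board.shape
    patches = []
    for i in range(0, m - win + 1, stride):
        for j in range(0, n - win + 1, stride):
            patches.append(board[i:i + win, j:j + win, :])
    return np.stack(patches)  # (num_windows, win, win, 11): fixed-size inputs

board = np.random.rand(7, 9, 11)     # variable board size
print(sliding_windows(board).shape)  # (6, 4, 4, 11) for this board
```

The network then only ever sees fixed 4x4x11 tensors; it is the number of windows, not their shape, that varies with the board size.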

Partake answered 4/4, 2018 at 16:28 Comment(2)
I could train a network with, say, a 4 by 4 game board, and then make separate predictions for every 4 by 4 piece of the playing board. The problem then is how to combine the results, and how to compensate for the information lost compared to taking the entire board into account, as faraway parts will more often than not affect each other.Kerrin
Averaging, majority voting, custom rules... I don't know your game.Partake

A fully convolutional neural network can do this. The parameters of conv layers are convolutional kernels, and a convolutional kernel does not care much about the input size (though there are certain limitations related to stride, padding, input size and kernel size).

A typical use case is some conv layers followed by max pooling, repeated again and again, up to the point where the filters are flattened and connected to a dense layer. The dense layer is the problem, because it expects an input of fixed size. If there is another conv layer instead, your output will just be another feature map of the appropriate size.

An example of such a network is YOLOv3. If you feed it, for example, a 416x416x3 image, one output can be 13x13x(number of filters) (I know YOLOv3 has more output layers, but I will discuss only one for simplicity). If you feed YOLOv3 a 256x256x3 image, that output will be an 8x8x(number of filters) feature map (the downsampling factor is 32, so 256/32 = 8).
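To illustrate the general point, here is a toy fully convolutional net in PyTorch (my own sketch, not YOLOv3): with no dense layer, the same weights process any board size the pooling allows and simply yield differently sized feature maps:

```python
import torch
import torch.nn as nn

fcn = nn.Sequential(
    nn.Conv2d(11, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 16, kernel_size=1),  # 1x1 conv instead of a dense layer
)

for m, n in [(8, 8), (12, 20)]:   # two different board sizes
    x = torch.randn(1, 11, m, n)  # channels-first: (batch, 11, m, n)
    print(fcn(x).shape)           # (1, 16, m//4, n//4)
```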

So the network doesn't crash and produces results. Will the results be good? I don't know; maybe yes, maybe no. I have never used it in such a manner; I always resize the image to the recommended size or retrain the network.

Lyonnais answered 29/7, 2019 at 17:00 Comment(1)
Did you test this yourself? I think TF/Keras/... raises a tensor-size-mismatch error (something like that). Did you mean padding?Almandine
