Replicating TensorFlow's Conv2D Operation using Eigen Tensors
I'm trying to implement a lightweight (minimal library dependencies) version of a TensorFlow graph in C++, and I'm trying to use Eigen Tensor objects to perform the graph's operations. Right now I'm stuck trying to use the Eigen Tensor convolve() method to replicate the behaviour of TensorFlow's Conv2D operation. To keep things simple, my initial Conv2D operation has no padding and strides of one.

The input to the convolutional layer is a 51x51x1 tensor, which is being convolved with a filter bank of size 3x3x1x16. In TensorFlow this generates an output tensor of size 49x49x16. Setting up this same operation in C++ using the Eigen code below only populates the first channel of the output tensor, so the top 49x49x1 cells contain the correct values, but channels 1 to 15 are not populated.

  Eigen::TensorMap<Eigen::Tensor<float,4> > filter(filterBuffer, 3, 3, 1, 16 );
  Eigen::TensorMap<Eigen::Tensor<float,3> > input(inputBuffer, 51, 51, 1 );
  Eigen::TensorMap<Eigen::Tensor<float,3> > output(outputBuffer, 49, 49, 16);

  Eigen::array<ptrdiff_t, 2> convDims({0, 1});
  output = input.convolve(filter, convDims);

I assume that I'm misunderstanding what these functions do and that they are not performing the same operation. To get my implementation working, I've tried looping through the 16 filter channels and applying the convolution method to each one individually, but I'm getting compiler errors I don't understand with the code below:

  for (int s=0; s<16; ++s)
  {
    Eigen::array<int, 4> fOffset = {0, 0, 0, s};
    Eigen::array<int, 4> fExtent = {3, 3, 1, 1};

    Eigen::array<int, 3> oOffset = {0, 0, s};
    Eigen::array<int, 3> oExtent = {49, 49, 1};

    auto filterSlice = filter.slice(fOffset, fExtent);

    output.slice(oOffset, oExtent) = input.convolve(filterSlice, convDims);
  }

This code produces the following error from somewhere within the Eigen Tensor code. It may have something to do with the assignment to the result of the slice method, but I'm not sure. If the result is assigned to an auto type then it compiles, but not if the result is later evaluated.

If anyone knows how to resolve this error, or more generally how I can replicate the Conv2D operation using Eigen Tensors, that would be a great help.

/home/user/tensorflow_xla/bcc-2.0.2-gcc/sparc-gaisler-elf/include/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h: In instantiation of 'void Eigen::TensorEvaluator<const Eigen::TensorConvolutionOp<Dimensions, InputXprType, KernelXprType>, Device>::preloadKernel() [with Indices = const std::array<int, 2>; InputArgType = const Eigen::TensorMap<Eigen::Tensor<float, 3> >; KernelArgType = const Eigen::TensorSlicingOp<const std::array<int, 4>, const std::array<int, 4>, Eigen::TensorMap<Eigen::Tensor<float, 4> > >; Device = Eigen::DefaultDevice]':
/home/user/tensorflow_xla/bcc-2.0.2-gcc/sparc-gaisler-elf/include/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h:383:18:   required from 'bool Eigen::TensorEvaluator<const Eigen::TensorConvolutionOp<Dimensions, InputXprType, KernelXprType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorConvolutionOp<Dimensions, InputXprType, KernelXprType>, Device>::Scalar*) [with Indices = const std::array<int, 2>; InputArgType = const Eigen::TensorMap<Eigen::Tensor<float, 3> >; KernelArgType = const Eigen::TensorSlicingOp<const std::array<int, 4>, const std::array<int, 4>, Eigen::TensorMap<Eigen::Tensor<float, 4> > >; Device = Eigen::DefaultDevice; Eigen::TensorEvaluator<const Eigen::TensorConvolutionOp<Dimensions, InputXprType, KernelXprType>, Device>::Scalar = float]'
/home/user/tensorflow_xla/bcc-2.0.2-gcc/sparc-gaisler-elf/include/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h:146:62:   required from 'bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LhsXprType, RhsXprType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LhsXprType, RhsXprType>, Device>::Scalar*) [with LeftArgType = Eigen::TensorSlicingOp<const std::array<int, 3>, const std::array<int, 3>, Eigen::TensorMap<Eigen::Tensor<float, 3> > >; RightArgType = const Eigen::TensorConvolutionOp<const std::array<int, 2>, const Eigen::TensorMap<Eigen::Tensor<float, 3> >, const Eigen::TensorSlicingOp<const std::array<int, 4>, const std::array<int, 4>, Eigen::TensorMap<Eigen::Tensor<float, 4> > > >; Device = Eigen::DefaultDevice; Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LhsXprType, RhsXprType>, Device>::Scalar = float]'
/home/user/tensorflow_xla/bcc-2.0.2-gcc/sparc-gaisler-elf/include/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h:45:16:   required from 'static void Eigen::internal::TensorExecutor<Expression, Device, Vectorizable, Tileable>::run(const Expression&, const Device&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorSlicingOp<const std::array<int, 3>, const std::array<int, 3>, Eigen::TensorMap<Eigen::Tensor<float, 3> > >, const Eigen::TensorConvolutionOp<const std::array<int, 2>, const Eigen::TensorMap<Eigen::Tensor<float, 3> >, const Eigen::TensorSlicingOp<const std::array<int, 4>, const std::array<int, 4>, Eigen::TensorMap<Eigen::Tensor<float, 4> > > > >; Device = Eigen::DefaultDevice; bool Vectorizable = false; bool Tileable = false]'
/home/user/tensorflow_xla/bcc-2.0.2-gcc/sparc-gaisler-elf/include/unsupported/Eigen/CXX11/src/Tensor/TensorMorphing.h:448:65:   required from 'Eigen::TensorSlicingOp<StartIndices, Sizes, XprType>& Eigen::TensorSlicingOp<StartIndices, Sizes, XprType>::operator=(const OtherDerived&) [with OtherDerived = Eigen::TensorConvolutionOp<const std::array<int, 2>, const Eigen::TensorMap<Eigen::Tensor<float, 3> >, const Eigen::TensorSlicingOp<const std::array<int, 4>, const std::array<int, 4>, Eigen::TensorMap<Eigen::Tensor<float, 4> > > >; StartIndices = const std::array<int, 3>; Sizes = const std::array<int, 3>; XprType = Eigen::TensorMap<Eigen::Tensor<float, 3> >]'
../tfmin_generated/terrain_model.cpp:215:92:   required from here
/home/user/tensorflow_xla/bcc-2.0.2-gcc/sparc-gaisler-elf/include/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h:527:52: error: 'Eigen::TensorEvaluator<const Eigen::TensorSlicingOp<const std::array<int, 4>, const std::array<int, 4>, Eigen::TensorMap<Eigen::Tensor<float, 4> > >, Eigen::DefaultDevice>::Dimensions {aka const struct std::array<int, 4>}' has no member named 'TotalSize'
       size_t kernel_sz = m_kernelImpl.dimensions().TotalSize() * sizeof(Scalar);
Jennajenne answered 5/4, 2019 at 9:58 Comment(8)
Note XLA AOT compilation attempts to provide that functionality (transforming a graph into code with minimal dependencies). You may also check out the code for the CPU kernels of the operation, in tensorflow/core/kernels/conv_2d.h and tensorflow/core/kernels/eigen_spatial_convolutions.h (if that helps). – Bible
Yes, I've used the XLA AOT compiler for various projects; unfortunately, the hardware this project targets is not currently supported by the AOT compiler. Since we have been unable to configure this as an AOT-built target, we are pursuing this option. – Jennajenne
I'll look into the TensorFlow sources you linked to, though; they look very promising. Thanks. – Jennajenne
About your error, I'm pretty sure it's because of the auto in auto filterSlice = filter.slice(fOffset, fExtent);. filterSlice will be an Eigen::TensorSlicingOp, which maybe is not accepted by convolve. Maybe it gets fixed if you do Eigen::Tensor<float,4> filterSlice(filter.slice(fOffset, fExtent));, although I think that would cause a copy. The cheaper option would be to make a new Eigen::TensorMap from filter containing the slice. – Bible
Can you try the following? Eigen::TensorMap<Eigen::Tensor<float,4>> filterSlice(&filter(0, 0, 0, s), filter.size() / filter.dimension(3)); – Bible
@jdehesa I just changed it to Eigen::Tensor<float, 4> filterSlice(filter.slice(filterOffset, filterExtent)); and it is now compiling fine and producing the correct result. My filter banks are small, so I can live with a copy for now. Thanks for your help. – Jennajenne
@Jennajenne Currently I'm wrestling with the exact same problem. Did you, in the meantime, find a solution without looping over the filters? – Vancevancleave
@TobiasHermann Yes, I did eventually work this out. I've just added my own answer so you can see how to do this without using a loop. – Jennajenne
So I eventually found out how to perform a 2D convolution using just Eigen tensor function calls, without needing any loops. The code that helped me get here was the TensorFlow eigen_spatial_convolutions.h file @jdehesa linked me to. The linked lines contain the Eigen code required to do a Conv2D operation on both row-major and col-major data, so you'll probably only need half of it.

Fundamentally, you need to use the Eigen method extract_image_patches to extract the receptive field of each filter instance from the input tensor. You then reshape the output of this, and your kernel tensor, into 2D tensors: each kernel becomes a vertical column of the reshaped kernel tensor, and each image patch becomes a row of the reshaped patch tensor. You then perform a contraction, which is effectively a matrix multiplication of these two 2D tensors, and reshape the result back into the correct dimensions to produce the output.

This took me a while to get my head around at first, but it can be done.

outputTensor = inputTensor
    .extract_image_patches(kern_w, kern_h, stride_w, stride_h, dilation_w, dilation_h, padding)
    .reshape(Eigen::array<int, 2>({patch_count, kern_w * kern_h}))
    .contract(kernelTensor.reshape(Eigen::array<int, 2>({kern_w * kern_h, kern_count})),
              {Eigen::IndexPair<int>(1, 0)})
    .reshape(Eigen::array<int, 3>({output_w, output_h, kern_count}));
Jennajenne answered 20/11, 2019 at 13:17 Comment(3)
Thanks a lot for adding the solution. It sounds very similar to im2col convolution. :) – Vancevancleave
Hello! Thank you for posting the solution! Is this solution relatively fast? For instance, I have a PyTorch model that I run with libtorch, and I would love to avoid that and use Eigen instead. – Arola
@EugeneAlexeev It's not the fastest or the slowest in my experience. However, we are talking about CPU processing here; if you're comparing it to anything GPU-accelerated, this will be a snail's pace. – Jennajenne