Removing extra whitespace from generated HTML in MVC
Asked Answered
L

8

31

I have an MVC application view that is generating quite a large HTML table of values (>20MB).

I am compressing the view in the controller using a compression filter

 internal class CompressFilter : ActionFilterAttribute
 {
     public override void OnActionExecuting(ActionExecutingContext filterContext)
     {
         HttpRequestBase request = filterContext.HttpContext.Request;
         string acceptEncoding = request.Headers["Accept-Encoding"];
         if (string.IsNullOrEmpty(acceptEncoding))
             return;
         acceptEncoding = acceptEncoding.ToUpperInvariant();
         HttpResponseBase response = filterContext.HttpContext.Response;
         if (acceptEncoding.Contains("GZIP"))
         {
             response.AppendHeader("Content-encoding", "gzip");
             response.Filter = new GZipStream(response.Filter, CompressionMode.Compress);
         }
         else if (acceptEncoding.Contains("DEFLATE"))
         {
             response.AppendHeader("Content-encoding", "deflate");
             response.Filter = new DeflateStream(response.Filter, CompressionMode.Compress);
         }
     }
 }

Is there a way to also eliminate the (quite large) amount of redundant whitespace generated in the view before I run the compress filter (to reduce compression workload and size)?

EDIT: I got it working using the WhiteSpaceFilter technique suggested by Womp below.

For interest here's the results, as analysed by Firebug:

1) No Compression, no whitespace strip - 21MB, 2.59 minutes
2) With GZIP compression, no whitespace strip - 2MB, 17.59s
3) With GZIP compression, whitespace strip - 558kB, 12.77s

So certainly worth it.

Libbey answered 13/5, 2009 at 0:32 Comment(2)
Interesting results, thanks for posting them.Complaint
I know this is old, but do you fancy posting the full code?Meddle
C
20

This guy wrote a neat little whitespace compactor that simply runs a fast block copy of your bytes through a regular expression to strip out blobs of space. He wrote it as an http module, but you could take the 7 lines of workhorse code out of it and plop it into your function.

Complaint answered 13/5, 2009 at 1:40 Comment(2)
Just a warning to folks, if you read the comments on that link, there are some flaws with the solution.Yacketyyak
RegEx is not "fast". See my measurements https://mcmap.net/q/464798/-removing-extra-whitespace-from-generated-html-in-mvcRigdon
W
7

@womp has already suggested a good way of doing it but that module is pretty outdated. I have been using that but it turns out that it is not an optimal way. Here is the question I asked about:

Remove white space from entire Html but inside pre with regular expressions

Here is how I do it:

public class RemoveWhitespacesAttribute : ActionFilterAttribute {

    public override void OnActionExecuted(ActionExecutedContext filterContext) {

        var response = filterContext.HttpContext.Response;

        //Temp fix. I am not sure what causes this but ContentType is coming as text/html
        if (filterContext.HttpContext.Request.RawUrl != "/sitemap.xml") {

            if (response.ContentType == "text/html" && response.Filter != null) {
                response.Filter = new HelperClass(response.Filter);
            }
        }
    }

    private class HelperClass : Stream {

        private System.IO.Stream Base;

        public HelperClass(System.IO.Stream ResponseStream) {

            if (ResponseStream == null)
                throw new ArgumentNullException("ResponseStream");
            this.Base = ResponseStream;
        }

        StringBuilder s = new StringBuilder();

        public override void Write(byte[] buffer, int offset, int count) {

            string HTML = Encoding.UTF8.GetString(buffer, offset, count);

            //Thanks to Qtax
            //https://mcmap.net/q/471645/-remove-white-space-from-entire-html-but-inside-pre-with-regular-expressions
            Regex reg = new Regex(@"(?<=\s)\s+(?![^<>]*</pre>)");
            HTML = reg.Replace(HTML, string.Empty);

            buffer = System.Text.Encoding.UTF8.GetBytes(HTML);
            this.Base.Write(buffer, 0, buffer.Length);
        }

        #region Other Members

        public override int Read(byte[] buffer, int offset, int count) {

            throw new NotSupportedException();
        }

        public override bool CanRead{ get { return false; } }

        public override bool CanSeek{ get { return false; } }

        public override bool CanWrite{ get { return true; } }

        public override long Length{ get { throw new NotSupportedException(); } }

        public override long Position {

            get { throw new NotSupportedException(); }
            set { throw new NotSupportedException(); }
        }

        public override void Flush() {

            Base.Flush();
        }

        public override long Seek(long offset, SeekOrigin origin) {

            throw new NotSupportedException();
        }

        public override void SetLength(long value) {

            throw new NotSupportedException();
        }

        #endregion
    }

}
Waldgrave answered 7/1, 2012 at 16:17 Comment(1)
Benchmarked this morning. On my setup a fairly large ca. 78KB HTML file takes 250ms to process with this filter (on a high-end i7 setup). Also I noticed that for large files, the filter is called multiple times... meaning this might fail badly if the file is sliced just wrong. There's a better approach in theory... to remove extra white space at compile time, but the only solution I have seen so far fails in my environment. github.com/meleze/Meleze.WebRigdon
R
4

One can remove whitespace at compile time by extending Razor. That eliminates the (very significant by my measurements) runtime hit of removing white space from the generated HTML. The hit is as large as 88ms on a high end i7 trimming a 100KB document using RegEx-based code found on Stack Overflow.

The following provides an implementation of a compile-time solution for MVC 3 and MVC 4:

Meleze.Web

The solution is described at

http://cestdumeleze.net/blog/2011/minifying-the-html-with-asp-net-mvc-and-razor/

(but use the GitHub code or NuGet DLL, as the code in the blog post covers only MVC 3).

Rigdon answered 22/2, 2013 at 0:25 Comment(2)
@MarcinHabuszewski: The first link is not dead. I just clicked it, and it loaded. Perhaps a temporary issue at github.com?Rigdon
I see you misread my comment. I wrote that the second link is dead. As for the first one I meant that it would actually be worth something even if it was dead (which it is not) because it contains the name of the tool which you suggested to use, though this is the only place in the answer which contains the name.Averill
D
2

I would say that if your View is generating over 20mb of data, you may want to investigate different ways to display the data, perhaps paging?

Dupion answered 13/5, 2009 at 0:34 Comment(3)
Due to the specific nature of the application, that isn't possible unfortunately.Libbey
isn't the browser choking on he huge parse though?Dupion
Nope. Seems ok under ie6/7/8, safari, firefoxLibbey
C
2
#region Stream filter
class StringFilterStream : Stream
{
  private Stream _sink;
  private Func<string, string> _filter;

  public StringFilterStream(Stream sink, Func<string, string> filter) {
    _sink = sink;
    _filter = filter;
  }

  #region Mixin Properties/Methods
  public override bool CanRead { get { return true; } }
  public override bool CanSeek { get { return true; } }
  public override bool CanWrite { get { return true; } }
  public override void Flush() { _sink.Flush(); }
  public override long Length { get { return 0; } }
  private long _position;
  public override long Position {
    get { return _position; }
    set { _position = value; }
  }
  public override int Read(byte[] buffer, int offset, int count) {
    return _sink.Read(buffer, offset, count);
  }
  public override long Seek(long offset, SeekOrigin origin) {
    return _sink.Seek(offset, origin);
  }
  public override void SetLength(long value) {
    _sink.SetLength(value);
  }
  public override void Close() {
    _sink.Close();
  }
  #endregion

  public override void Write(byte[] buffer, int offset, int count) {
    // intercept the data and convert to string
    byte[] data = new byte[count];
    Buffer.BlockCopy(buffer, offset, data, 0, count);
    string s = Encoding.Default.GetString(buffer);

    // apply the filter
    s = _filter(s);

    // write the data back to stream
    byte[] outdata = Encoding.Default.GetBytes(s);
    _sink.Write(outdata, 0, outdata.GetLength(0));
  }
}
#endregion

public enum WebWhitespaceFilterContentType
{
  Xml = 0, Css = 1, Javascript = 2
}
public class WebWhitespaceFilterAttribute : ActionFilterAttribute
{
  private WebWhitespaceFilterContentType _contentType;

  public WebWhitespaceFilterAttribute() {
    _contentType = WebWhitespaceFilterContentType.Xml;
  }
  public WebWhitespaceFilterAttribute(WebWhitespaceFilterContentType contentType) {
    _contentType = contentType;
  }

  public override void OnActionExecuting(ActionExecutingContext filterContext) {

    var request = filterContext.HttpContext.Request;
    var response = filterContext.HttpContext.Response;

    switch (_contentType) {
      case WebWhitespaceFilterContentType.Xml:

        response.Filter = new StringFilterStream(response.Filter, s => {
          s = Regex.Replace(s, @"\s+", " ");
          s = Regex.Replace(s, @"\s*\n\s*", "\n");
          s = Regex.Replace(s, @"\s*\>\s*\<\s*", "><");
          // single-line doctype must be preserved
          var firstEndBracketPosition = s.IndexOf(">");
          if (firstEndBracketPosition >= 0) {
            s = s.Remove(firstEndBracketPosition, 1);
            s = s.Insert(firstEndBracketPosition, ">\n");
          }
          return s;
        });
        break;

      case WebWhitespaceFilterContentType.Css:
      case WebWhitespaceFilterContentType.Javascript:

        response.Filter = new StringFilterStream(response.Filter, s => {
          s = Regex.Replace(s, @"/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/", "");
          s = Regex.Replace(s, @"\s+", " ");
          s = Regex.Replace(s, @"\s*{\s*", "{");
          s = Regex.Replace(s, @"\s*}\s*", "}");
          s = Regex.Replace(s, @"\s*;\s*", ";");
          return s;
        });
        break;
    }
  }
}
Contentment answered 17/6, 2009 at 4:4 Comment(1)
Some low quality code. 1) \s+ already matches linebreaks, so why the next line of \s*\n\s*. 2) should work on OnResultExecuted not OnActionExecuting method, because then we know the content type set by controller. 3) Breaks <pre> elements and <script> block javascript if line comments // used 4) did not find HTML requirement for preserving doctype in seperate line.Whiskey
C
1

Here is a VB.NET version of a whitespace filter attribute I am using in a project:

#Region "Imports"

    Imports System.IO

#End Region

Namespace MyCompany.Web.Mvc.Extensions.ActionFilters

    ''' <summary>
    ''' WhitespaceFilter attribute
    ''' </summary>
    Public NotInheritable Class WhitespaceFilterAttribute
        Inherits ActionFilterAttribute

        ''' <summary>
        ''' Called when action executing.   
        ''' </summary>
        ''' <param name="filterContext">The filter context.</param>
        ''' <remarks></remarks>
        Public Overrides Sub OnActionExecuting(filterContext As ActionExecutingContext)

                filterContext.HttpContext.Response.Filter = New WhitespaceFilterStream(filterContext.HttpContext.Response.Filter)

        End Sub

    #Region "Whitespace stream filter"

            ''' <summary>
            ''' Whitespace stream filter
            ''' </summary>
            Private Class WhitespaceFilterStream
                Inherits Stream

    #Region "Declarations"

                ' Member vars.
                Private Shared regexPattern As New Regex("(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}")
                ' Property vars.
                Private sinkStreamValue As Stream
                Private positionValue As Long

    #End Region

    #Region "Constructor(s)"

                ''' <summary>
                ''' Contructor to create a new object.
                ''' </summary>
                ''' <param name="sink"></param>
                ''' <remarks></remarks>
                Public Sub New(sink As Stream)

                    Me.sinkStreamValue = sink

                End Sub

    #End Region

    #Region "Properites"

                ''' <summary>
                ''' Gets the CanRead value.
                ''' </summary>
                ''' <value></value>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides ReadOnly Property CanRead() As Boolean
                    Get
                        Return True
                    End Get
                End Property

                ''' <summary>
                ''' Gets the CanSeek value.
                ''' </summary>
                ''' <value></value>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides ReadOnly Property CanSeek() As Boolean
                    Get
                        Return True
                    End Get
                End Property

                ''' <summary>
                ''' Gets the CanWrite value.
                ''' </summary>
                ''' <value></value>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides ReadOnly Property CanWrite() As Boolean
                    Get
                        Return True
                    End Get
                End Property

                ''' <summary>
                ''' Get Length value.
                ''' </summary>
                ''' <value></value>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides ReadOnly Property Length() As Long
                    Get
                        Return 0
                    End Get
                End Property

                ''' <summary>
                ''' Get or sets Position value.
                ''' </summary>
                ''' <value></value>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides Property Position() As Long
                    Get
                        Return Me.positionValue
                    End Get
                    Set(value As Long)
                        Me.positionValue = value
                    End Set
                End Property

    #End Region

    #Region "Stream Overrides Methods"

                ''' <summary>
                ''' Stream object Close method.
                ''' </summary>
                ''' <remarks></remarks>
                Public Overrides Sub Close()

                    Me.sinkStreamValue.Close()

                End Sub

                ''' <summary>
                ''' Stream object Close method.
                ''' </summary>
                ''' <remarks></remarks>
                Public Overrides Sub Flush()

                    Me.sinkStreamValue.Flush()

                End Sub

                ''' <summary>
                ''' Stream object Read method.
                ''' </summary>
                ''' <param name="buffer"></param>
                ''' <param name="offset"></param>
                ''' <param name="count"></param>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides Function Read(buffer As Byte(), offset As Integer, count As Integer) As Integer

                    Return Me.sinkStreamValue.Read(buffer, offset, count)

                End Function

                ''' <summary>
                ''' Stream object Seek method.
                ''' </summary>
                ''' <param name="offset"></param>
                ''' <param name="origin"></param>
                ''' <returns></returns>
                ''' <remarks></remarks>
                Public Overrides Function Seek(offset As Long, origin As SeekOrigin) As Long

                    Return Me.sinkStreamValue.Seek(offset, origin)

                End Function

                ''' <summary>
                ''' Stream object SetLength method.
                ''' </summary>
                ''' <param name="value"></param>
                ''' <remarks></remarks>
                Public Overrides Sub SetLength(value As Long)

                    Me.sinkStreamValue.SetLength(value)

                End Sub

                ''' <summary>
                ''' Stream object Write method.
                ''' </summary>
                ''' <param name="bufferBytes"></param>
                ''' <param name="offset"></param>
                ''' <param name="count"></param>
                ''' <remarks></remarks>
                Public Overrides Sub Write(bufferBytes As Byte(), offset As Integer, count As Integer)

                    Dim html As String = Encoding.Default.GetString(bufferBytes)

                    Buffer.BlockCopy(bufferBytes, offset, New Byte(count - 1) {}, 0, count)
                    html = regexPattern.Replace(html, String.Empty)
                    Me.sinkStreamValue.Write(Encoding.Default.GetBytes(html), 0, Encoding.Default.GetBytes(html).GetLength(0))

                End Sub

    #End Region

            End Class

    #End Region

        End Class

    End Namespace

And in Global.asax.vb:

Shared Sub RegisterGlobalFilters(ByVal filters As GlobalFilterCollection)

    With filters
        ' Standard MVC filters
        .Add(New HandleErrorAttribute())
        ' MyCompany MVC filters
        .Add(New CompressionFilterAttribute)
        .Add(New WhitespaceFilterAttribute)
    End With

End Sub
Cob answered 12/10, 2011 at 20:57 Comment(4)
I searched few hours on the internet to find an answer that could explain how to compress and remove spaces in the same time and this is the only answer that really works ! I also learned that added those attributes as global filter do the job for every pages with no needs of using a BaseController. Thank you @Ed Degagne One thing I changed was the Write method: var html = Encoding.UTF8.GetString(buffer, offset, count); var reg = new Regex(@"(?<=\s)\s+(?![^<>]*</pre>)"); html = reg.Replace(html, string.Empty); buffer = Encoding.UTF8.GetBytes(html); _base.Write(buffer, 0, buffer.Length);Medawar
@DavidLétourneau can you explain why the change? Thanks!!Cob
Sure, I had some html rendering problems. For instance, input buttons wasn't displayed correctly and few others html components. This is the only reason =)Medawar
Amazing. . . . .Particularity
D
0

Whitespace compresses pretty well, I don't think removing it is going to save you much.

I would suggest trying to offload some of the HTML to the client if possible, use JavaScript to reconstitute things that repeat.

Doubler answered 13/5, 2009 at 0:47 Comment(1)
I totally agree... i think the whitespace will go away with GZIP. I'd be surprised to see another layer of 'whitespace removal' make any difference to the size after GZIP without removing whitespaces.Sackbut
C
-3

If you are returning JSON from the View, it is already minified and should not contain any whitespace or CR/LF. You should use paging to keep from sending so much data to the browser at once.

Collage answered 13/5, 2009 at 0:37 Comment(2)
That depends on what JSON library you are using and how it is configured.Cheviot
Relevant as a comment rather than an answer.Flung

© 2022 - 2024 — McMap. All rights reserved.