Why does pygments.rb not highlight <code> tags within <pre class="lang"> properly, i.e. Google Prettify-friendly tags?

I am calling my markdown helper in my view like this:

<%= markdown question.body %>

This is what my ApplicationHelper looks like:

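module ApplicationHelper
  class HTMLwithPygments < Redcarpet::Render::HTML
    # Called by Redcarpet for each Markdown code block; language is the
    # fence's info string (e.g. "cpp" for ~~~ cpp), or nil when none is given.
    def block_code(code, language)
      Pygments.highlight(code, lexer: language)
    end
  end

  # Renders Markdown (plus any raw HTML it contains) to an html_safe string.
  def markdown(text)
    renderer = HTMLwithPygments.new(hard_wrap: true)
    options = {
      autolink: true,
      no_intra_emphasis: true,
      fenced_code_blocks: true,
      lax_html_blocks: true,
      strikethrough: true,
      superscript: true
    }
    Redcarpet::Markdown.new(renderer, options).render(text).html_safe
  end
end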

But when it encounters tags like this:

<pre class="lang-cpp prettyprint-override">

it doesn't apply any syntax highlighting to that code. Why is that?

P.S. Stack Overflow, for instance, generates this markup when a post uses the hint <!-- language: lang-cpp -->.
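
In this HTML, the language appears only as a class on the <pre> element (lang-cpp here). Pygments.highlight, by contrast, expects a lexer name such as cpp. Below is a minimal sketch of that mapping. It assumes the hint always uses the lang- prefix. lexer_from_pre_class is a hypothetical helper name, and not every hint is guaranteed to name a Pygments lexer.

```ruby
# Hypothetical helper: map a Stack Overflow <pre> class attribute such as
# "lang-cpp prettyprint-override" to a lexer name ("cpp").
# Returns nil when no "lang-" class is present.
def lexer_from_pre_class(class_attr)
  return nil if class_attr.nil?

  hint = class_attr.split.find { |name| name.start_with?("lang-") }
  hint && hint.delete_prefix("lang-")
end

puts lexer_from_pre_class("lang-cpp prettyprint-override").inspect # prints "cpp"
puts lexer_from_pre_class("prettyprint").inspect                   # prints nil
```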

Edit 1

More specifically, it seems that it won't format <code> tags that are inside <pre> tags. When <code> is not inside <pre>, it seems to format it fine. How do I remedy that?
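
The distinction in Edit 1 is between block code (<pre><code>...</code></pre>) and inline <code>. As far as I can tell, Redcarpet passes raw HTML through as-is and only calls block_code for Markdown code blocks. Block-level snippets that arrive as HTML would therefore have to be located some other way. Below is a minimal regex-based sketch. It assumes <code> carries no attributes and sits directly inside <pre>. find_code_blocks is a hypothetical name, and an HTML parser such as Nokogiri would be more robust than a regex.

```ruby
# Hypothetical helper: return [class_attr, escaped_code] pairs for every
# block-level <pre ...><code>...</code></pre> snippet in an HTML string.
# Inline <code> elements that are not wrapped in <pre> are skipped.
PRE_CODE_RE = %r{<pre([^>]*)>\s*<code>(.*?)</code>\s*</pre>}m

def find_code_blocks(html)
  html.scan(PRE_CODE_RE).map do |attrs, code|
    class_attr = attrs[/class="([^"]*)"/, 1] # nil when <pre> has no class
    [class_attr, code]                       # code is still HTML-escaped
  end
end

html = <<~HTML
  <pre class="lang-cpp prettyprint-override"><code>int x = 1 &lt; 2;
  </code></pre>
  <ul><li>Without <code>std::sort(data, data + arraySize);</code></li></ul>
HTML

find_code_blocks(html).each { |cls, code| puts "#{cls}: #{code.inspect}" }
# prints: lang-cpp prettyprint-override: "int x = 1 &lt; 2;\n"
```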

Edit 2

The problem seems to be the data that pygments.rb is acting on. It is HTML, as can be seen in this gist: https://gist.github.com/marcamillion/14fa121cf3557d38c1a8. So what I want is for Pygments to properly format the code returned in the body attribute of the object in my gist.

How do I do that?
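
The code inside <code> in this HTML is entity-escaped (&lt;, &gt;, &amp;, as in the Edit 3 sample). It must be decoded back to source text before it reaches a highlighter. Otherwise the lexer sees &lt; as literal source characters, and the rendered snippet shows the entity text instead of <. Below is a minimal stdlib-only sketch. It assumes only the five entities listed occur; Ruby's CGI.unescapeHTML covers more cases. decode_entities is a hypothetical name.

```ruby
# Hypothetical helper: turn HTML-escaped code back into plain source text.
# Covers only the entities that typically appear in escaped code; a single
# gsub pass means "&amp;lt;" correctly becomes "&lt;", not "<".
ENTITIES = {
  "&lt;" => "<",
  "&gt;" => ">",
  "&amp;" => "&",
  "&quot;" => '"',
  "&#39;" => "'"
}.freeze

def decode_entities(escaped)
  escaped.gsub(/&(?:lt|gt|amp|quot|#39);/, ENTITIES)
end

puts decode_entities("std::cout &lt;&lt; sum &lt;&lt; std::endl;")
# prints: std::cout << sum << std::endl;
```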

Edit 3

This is the HTML that I would like pygments.rb and Redcarpet to syntax-highlight:

<p>Here is a piece of C++ code that shows some very peculiar performance. For some strange reason, sorting the data miraculously speeds up the code by almost 6x:</p>

<pre class="lang-cpp prettyprint-override"><code>#include &lt;algorithm&gt;
#include &lt;ctime&gt;
#include &lt;iostream&gt;

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c &lt; arraySize; ++c)
        data[c] = std::rand() % 256;

    // !!! With this, the next loop runs faster
    std::sort(data, data + arraySize);

    // Test
    clock_t start = clock();
    long long sum = 0;

    for (unsigned i = 0; i &lt; 100000; ++i)
    {
        // Primary loop
        for (unsigned c = 0; c &lt; arraySize; ++c)
        {
            if (data[c] &gt;= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast&lt;double&gt;(clock() - start) / CLOCKS_PER_SEC;

    std::cout &lt;&lt; elapsedTime &lt;&lt; std::endl;
    std::cout &lt;&lt; "sum = " &lt;&lt; sum &lt;&lt; std::endl;
}
</code></pre>

<ul>
<li>Without <code>std::sort(data, data + arraySize);</code>, the code runs in <strong>11.54</strong> seconds.</li>
<li>With the sorted data, the code runs in <strong>1.93</strong> seconds.</li>
</ul>

<hr>

<p>Initially I thought this might be just a language or compiler anomaly. So I tried it in Java:</p>

<pre class="lang-java prettyprint-override"><code>import java.util.Arrays;
import java.util.Random;

public class Main
{
    public static void main(String[] args)
    {
        // Generate data
        int arraySize = 32768;
        int data[] = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c &lt; arraySize; ++c)
            data[c] = rnd.nextInt() % 256;

        // !!! With this, the next loop runs faster
        Arrays.sort(data);

        // Test
        long start = System.nanoTime();
        long sum = 0;

        for (int i = 0; i &lt; 100000; ++i)
        {
            // Primary loop
            for (int c = 0; c &lt; arraySize; ++c)
            {
                if (data[c] &gt;= 128)
                    sum += data[c];
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}
</code></pre>

<p>with a similar but less extreme result.</p>

<hr>

<p>My first thought was that sorting brings the data into cache, but my next thought was how silly that is because the array was just generated.</p>

<p>What is going on? Why is a sorted array faster than an unsorted array? The code is summing up some independent terms, the order should not matter.</p>

You can see how this particular question is currently rendered at http://boso.herokuapp.com.

It is the most popular question on that site, the first one you see. You will notice that the code simply has a grey background and is indented. There is none of the syntax highlighting that pygments.rb promises and applies to other code snippets (as @rorra has illustrated with other examples in his answer).

I can't strip out the HTML, because I want to parse it properly (i.e. make sure the spacing, etc. is preserved). The only difference I want is syntax highlighting on the code in the body of the question.
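
Given that requirement (keep the surrounding HTML, highlight only the code), one possible approach is to post-process the body rather than send it through Redcarpet. The steps are: find each <pre class="lang-..."><code> block, decode its entities, and replace the whole block with the highlighter's output. Below is a minimal stdlib-only sketch. It assumes <code> has no attributes and the language sits in a lang- class, as in the HTML above. The highlighter is passed in as a block so the sketch runs without pygments.rb. highlight_code_blocks is a hypothetical name, and a Nokogiri-based version would be sturdier than regexes.

```ruby
# Hypothetical sketch: replace each <pre ...><code>...</code></pre> block in an
# HTML string with whatever the given block returns for (code, lexer), leaving
# all other markup (paragraphs, lists, inline <code>) untouched.
BLOCK_RE = %r{<pre([^>]*)>\s*<code>(.*?)</code>\s*</pre>}m
ENTITY_MAP = { "&lt;" => "<", "&gt;" => ">", "&amp;" => "&",
               "&quot;" => '"', "&#39;" => "'" }.freeze

def highlight_code_blocks(html)
  html.gsub(BLOCK_RE) do
    attrs = Regexp.last_match(1)   # attributes of <pre>
    escaped = Regexp.last_match(2) # HTML-escaped code text
    hint = attrs[/class="([^"]*)"/, 1].to_s.split.find { |c| c.start_with?("lang-") }
    lexer = hint&.delete_prefix("lang-") # e.g. "cpp"; nil without a hint
    code = escaped.gsub(/&(?:lt|gt|amp|quot|#39);/, ENTITY_MAP)
    yield(code, lexer)
  end
end

body = <<~HTML
  <p>Sorting speeds this up:</p>
  <pre class="lang-cpp prettyprint-override"><code>if (data[c] &gt;= 128)
  </code></pre>
  <p>Without <code>std::sort</code> it is slower.</p>
HTML

# Stand-in highlighter so the demo runs without pygments.rb.
out = highlight_code_blocks(body) { |code, lexer| "[#{lexer}]#{code}[/#{lexer}]" }
puts out
```

In the helper, the stand-in block could be swapped for something like { |code, lexer| Pygments.highlight(code, lexer: lexer || "text") }, with the result marked html_safe as the existing markdown helper does. The "text" fallback for blocks without a language hint is an assumption.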

Mesocratic answered 28/3, 2013 at 11:40 Comment(0)

Is there anything else you can add so that I can reproduce the issue, such as the content of question.body?

If I do something like this in the controller:

class HomeController < ApplicationController
  def index
    @data = <<EOF
~~~ cpp
#include <fstream.h>

int main (int argc, char *argv[]) {
return(0);
}
~~~
EOF
  end
end

and then in the view:

<pre class="lang-cpp prettyprint-override">
  <%= markdown @data %>
</pre>

it works fine: I can see the highlighted code without any problem. What's the content of question.body? Can you save the page from your browser and put it in a gist so we can debug?

Thx


Regarding your last comment, it's a simple CSS issue. In your stylesheet, you can add:

.code {
  color: #DD1144 !important;
}

and it will work. The problem is that you have a CSS rule written like:

pre .code {
  color: inherit;
}

and that rule applies the color #333333 inherited from the body class.


Here's a screenshot of how it looks with the CSS updated:

[Screenshot: the code rendered with the updated CSS]


The sample app with your code runs fine. I would need a sample app, or some sample code, with which we can reproduce the issue you are having (the formatted code not getting the right CSS/stylesheets).

This is how the sample app looks:

[Screenshots: the sample app rendering the highlighted code]


Final edit: the problem is not the library, and it's not the way you are rendering the question; it's the content you are rendering. Check the body of your questions. This is one of the questions I got with its body. It is rendered exactly as the library should render it, just not the way you are expecting :)

@data = <<EOF
    <p>I've been messing around with <a href="http://en.wikipedia.org/wiki/JSON">JSON</a> for some time, just pushing it out as text and it hasn't hurt anybody (that I know of), but I'd like to start doing things properly.</p>

    <p>I have seen <em>so</em> many purported "standards" for the JSON content type:</p>

    <pre><code>application/json
    application/x-javascript
    text/javascript
    text/x-javascript
    text/x-json
    </code></pre>

    <p>But which is correct, or best? I gather that there are security and browser support issues varying between them.</p>

    <p>I know there's a similar question, <em><a href="https://mcmap.net/q/16231/-what-mime-type-if-json-is-being-returned-by-a-rest-api">What MIME type if JSON is being returned by a REST API?</a></em>, but I'd like a slightly more targeted answer.</p>
EOF

And this is another one that I just copied and pasted from Stack Overflow; it renders with all the syntax highlighted. Do you notice the difference? Update your crawler to get the questions in the right format and it will work.

@data = <<EOF
Here is a piece of C++ code that shows some very peculiar performance. For some strange reason, sorting the data miraculously speeds up the code by almost 6x:

<!-- language: lang-cpp -->

    #include <algorithm>
    #include <ctime>
    #include <iostream>

    int main()
    {
        // Generate data
        const unsigned arraySize = 32768;
        int data[arraySize];

        for (unsigned c = 0; c < arraySize; ++c)
            data[c] = std::rand() % 256;

        // !!! With this, the next loop runs faster
        std::sort(data, data + arraySize);

        // Test
        clock_t start = clock();
        long long sum = 0;

        for (unsigned i = 0; i < 100000; ++i)
        {
            // Primary loop
            for (unsigned c = 0; c < arraySize; ++c)
            {
                if (data[c] >= 128)
                    sum += data[c];
            }
        }

        double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

        std::cout << elapsedTime << std::endl;
        std::cout << "sum = " << sum << std::endl;
    }

 - Without `std::sort(data, data + arraySize);`, the code runs in **11.54** seconds.
 - With the sorted data, the code runs in **1.93** seconds.

----------

Initially I thought this might be just a language or compiler anomaly. So I tried it in Java:

<!-- language: lang-java -->

    import java.util.Arrays;
    import java.util.Random;

    public class Main
    {
        public static void main(String[] args)
        {
            // Generate data
            int arraySize = 32768;
            int data[] = new int[arraySize];

            Random rnd = new Random(0);
            for (int c = 0; c < arraySize; ++c)
                data[c] = rnd.nextInt() % 256;

            // !!! With this, the next loop runs faster
            Arrays.sort(data);

            // Test
            long start = System.nanoTime();
            long sum = 0;

            for (int i = 0; i < 100000; ++i)
            {
                // Primary loop
                for (int c = 0; c < arraySize; ++c)
                {
                    if (data[c] >= 128)
                        sum += data[c];
                }
            }

            System.out.println((System.nanoTime() - start) / 1000000000.0);
            System.out.println("sum = " + sum);
        }
    }

with a similar but less extreme result.

----------

My first thought was that sorting brings the data into cache, but my next thought was how silly that is because the array was just generated.

What is going on? Why is a sorted array faster than an unsorted array? The code is summing up some independent terms, the order should not matter.

EOF
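
The suggestion above is to feed Redcarpet Markdown instead of the API's HTML. That could be approximated by converting each <pre class="lang-..."><code> block back into a fenced block before calling the markdown helper. With fenced_code_blocks: true, Redcarpet passes the fence's info string (cpp in ~~~ cpp) to block_code as language. The lexer is then explicit, rather than depending on the <!-- language: --> comment, which, as far as I know, Redcarpet itself does not interpret. Below is a minimal sketch. It assumes <code> has no attributes, the language sits in a lang- class, and the code never contains a line starting with ~~~. html_code_to_fences is a hypothetical name.

```ruby
# Hypothetical sketch: turn HTML code blocks from the Stack Exchange API back
# into fenced Markdown blocks, so Redcarpet's block_code receives a language.
# Assumes <code> has no attributes and the code has no line starting with "~~~".
FENCE_BLOCK_RE = %r{<pre([^>]*)>\s*<code>(.*?)</code>\s*</pre>}m
HTML_ENTITIES = { "&lt;" => "<", "&gt;" => ">", "&amp;" => "&",
                  "&quot;" => '"', "&#39;" => "'" }.freeze

def html_code_to_fences(html)
  html.gsub(FENCE_BLOCK_RE) do
    attrs = Regexp.last_match(1)
    escaped = Regexp.last_match(2)
    hint = attrs[/class="([^"]*)"/, 1].to_s.split.find { |c| c.start_with?("lang-") }
    fence = hint ? "~~~ #{hint.delete_prefix('lang-')}" : "~~~"
    code = escaped.gsub(/&(?:lt|gt|amp|quot|#39);/, HTML_ENTITIES).chomp
    # Blank lines keep the fence separate from neighbouring HTML blocks.
    "\n\n#{fence}\n#{code}\n~~~\n\n"
  end
end

sample = "<p>Intro</p>\n<pre class=\"lang-java prettyprint-override\"><code>if (a &lt; b) {}\n</code></pre>\n<p>End</p>"
puts html_code_to_fences(sample)
```

Calling markdown html_code_to_fences(question.body) would then route each snippet through the existing block_code override. Whether Redcarpet leaves the surrounding <p> and <ul> markup untouched depends on its HTML-block rules, so this is worth checking against real question bodies.
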
Watchband answered 2/4, 2013 at 10:26 Comment(27)
I can do you one better: you can see it live at boso.herokuapp.com. Look at the first question that loads (the one with a 4000+ score). You will notice that the code within the grey background has no colors. Yet the code right below the first grey box, on the line 'Without std::sort(data, data + arraySize);, the code runs in 11.54 seconds.', is properly colored and highlighted. What gives? - Mesocratic
That's actually a CSS issue: you can add .code { color: #DD1144 !important; } and it will have the color on it. I explained it a little better in the answer. - Watchband
Hmm... no, it's not just a CSS rule. The thing is, pygments.rb should highlight it in a multi-colored way. If you look at the instructions in the README (github.com/tmm1/pygments.rb), there is an option to set the CSS style applied in the highlight. So, in theory, depending on the language, pygments.rb should highlight the characters in the code snippet with multiple colors (similar to how the examples in the GitHub README look). - Mesocratic
The example you posted at that URL is clearly a CSS issue: you said the code under the black one was how it should look, and that's definitely a CSS issue. I totally agree that it should show different colors depending on the language. I made a sample app following your code, and it looks fine with all the CSS on my desktop. So something in your source code is definitely preventing the code from getting the different CSS styles, but unless you can create a sample app that replicates the issue, I won't be able to help more than that. - Watchband
But the code you wrote in the question works fine. I updated the answer with a screenshot of a sample app using your code. - Watchband
What does the HTML for your app look like? Do you have it publicly available so I can compare your HTML with mine? - Mesocratic
Well, there is one major difference between your data and mine. In your view, you call <%= markdown @data %> within the pre tag, whereas in mine the pre tag is contained in the @data passed to the markdown helper. See these gists: gist.github.com/marcamillion/8139c8381cf359f24040. So that's my issue. The <pre class="lang-cpp prettyprint-override"><code> is being returned by the object, and that is what pygments.rb should be formatting, as opposed to just returning the code within the code tags and rendering it within the pre tags. Make sense? - Mesocratic
The easiest way to see the difference between your version and mine is to look at the HTML output. Can you gist that? I believe it will be very clear if you do. I am actually trying to get mine formatted just like yours. You will notice that in yours, the elements within the code have span tags around them with specific classes. My HTML doesn't have that, and that's what I am trying to figure out how to produce. - Mesocratic
I don't think that makes a difference; I moved the pre tag into @data and it looks the same. The gists for the HTML: gist.github.com/rorra/6e50a247638332fca139. If you can create a sample app that replicates the issue (it should take you about 10 minutes to create and publish), I can take a look at it and identify the problem. - Watchband
You can also check out my code and update the @data content to replicate the issue; then I'll be able to help. I cannot fix it until I can replicate it. - Watchband
BTW, you can also check out the app, run script/rails s, go to the root at localhost:3000, and you should see the app in action with the XHTML/CSS. - Watchband
OK, before I do all of that, here is an easy way to test. Use this question as the @data: #11228309. Click on 'edit', copy the entire body of the question, and put it into your @data variable. Output that to the screen and post a screenshot of it, within the HTML structure I have in my gist. That's the easiest thing to test right now. - Mesocratic
Can you post a screenshot so I can see? - Mesocratic
Done, but I insist that you should check out the code: you could easily check it out within five minutes, start the server, and play with it. - Watchband
OK, here is the app: github.com/marcamillion/boso. Clone it and use data from SO to populate the answers, questions, and tags. I look forward to seeing whether it renders properly for you. - Mesocratic
Sorry, that's a lot of work, and I don't have the time to learn the app. If you can build a sample app like the one I did, where I can just check it out, start the server, and see the issue right away, let me know and I'll check it. Thanks - Watchband
OK, I just pushed an updated seeds.rb file to the repo. You can pull it, migrate the DB, and then run rake db:seed. That should get you going with sample data. Let me know if you need anything else. - Mesocratic
Answered. The problem is not the way you are rendering the question; that works fine. The problem is the content you are trying to render: question.body doesn't have the right format for Pygments to highlight the code syntax. That's why it works when you copy and paste a question from Stack Overflow, but not when you crawl the data the way your code does. BTW, I found a lot of issues when running your app, like renamed or missing fields. - Watchband
I'm not quite sure I understand what you are saying the problem is. Is it that one contains HTML, whereas the other one is Markdown? If so, then the problem seems to be with Serel.rb, because that is the result it returns from the API. Here is a sample of that same question coming DIRECTLY from SO via Serel: gist.github.com/marcamillion/14fa121cf3557d38c1a8 - Mesocratic
Upon further investigation, it seems that SO's API formats the body as HTML. See that same question directly from the API here: api.stackexchange.com/docs/…*&site=stackoverflow&run=true. So how do I handle this? How do I get rid of the HTML markup but keep the Markdown spacing and such? Or is that impossible? Or rather, how do I get Pygments to properly mark up the returned HTML (which was in fact my original question)? - Mesocratic
You could even write code to convert the answer that came from Serel into a Markdown-compatible form that renders the code as expected, or look for some other way to get the question. Once you have the question URL, you can fetch the question with Mechanize, but that's research you will have to do. The problem here is that the content you are sending to Pygments to be rendered as code is not recognized by Pygments as a piece of code. - Watchband
Hmm... I was under the impression from this Railscast (railscasts.com/episodes/…) that I could do exactly that: pass HTML, or some code embedded within other text, and Pygments and Redcarpet would parse the HTML/Markdown and highlight the syntax as needed. What am I missing? Specifically, scroll down to the section of the ASCIIcast called 'Highlighting Snippets Embedded in Markdown Documents'. - Mesocratic
Not sure; I don't have access to the screencast, it's a paid one. The problem is the content of the question: the body of the question you had starts with something like <pre><code>, while the code that is recognized and parsed by Redcarpet and Pygments starts with <!-- language: lang-cpp -->. So the problem is not in the library; I cannot be clearer than that. <pre><code> is generic HTML: it doesn't say whether the code is C++, Perl, PHP, or anything else, and Markdown won't tell Pygments to parse and render it in a specific language. - Watchband
But remember that the <pre> tag rendered is actually this: <pre class="lang-cpp prettyprint-override">, i.e. it includes the language in the class of the pre tag. I guess that's where Redcarpet should come in? - Mesocratic
In that case, add a sample question with that format and we can look at it. I posted <pre><code> because that's what I got with your db:seed. We would already have solved this if you had given me the sample content you are having trouble rendering. Please update your question with that sample so others can help as well. - Watchband
Actually, I finally understand your question. Edit it anyway, and add a sample text that starts with "<pre class="lang-cpp prettyprint-override">", followed by the rest of the content and the final </pre>. I'm going to try to debug it later; going to bed now. Thanks - Watchband
I have added the HTML generated from my question, which is the EXACT HTML that I am trying to get highlighted properly. - Mesocratic
