Rendering plain text through PHP

For some reason, I want to serve my robots.txt via a PHP script. I have set up Apache so that the robots.txt request (in fact, all file requests) goes to a single PHP script.

The code I am using to render robots.txt is:

echo "User-agent: wget\n";
echo "Disallow: /\n";

However, the newlines do not show up. How do I serve robots.txt correctly, so that search engines (or any client) see it properly? Do I have to send some special headers for txt files?

EDIT 1:

Now I have the following code:

header("Content-Type: text/plain");
echo "User-agent: wget\n";
echo "Disallow: /\n";

which still does not display newlines (see http://sarcastic-quotes.com/robots.txt).

EDIT 2:

Some people mentioned that the output is fine and the newlines are just not displayed in the browser. I was curious how this one displays correctly: http://en.wikipedia.org/robots.txt

EDIT 3:

I downloaded both mine and wikipedia's through wget, and see this:

$ file en.wikipedia.org/robots.txt
en.wikipedia.org/robots.txt: UTF-8 Unicode English text

$ file sarcastic-quotes.com/robots.txt
sarcastic-quotes.com/robots.txt: ASCII text

FINAL SUMMARY:

The main issue was that I was not setting the header. However, there is another internal bug that sets the Content-Type to text/html (my request is actually served through an internal proxy, but that's another issue).

Some comments saying that browsers don't display newlines were only half-correct: modern browsers display newlines correctly if the Content-Type is text/plain. I am selecting the answer that most closely matched the real problem and was free of the slightly misleading claim above :). Thanks everyone for your help and time!

thanks

JP

Brazier answered 22/12, 2010 at 6:20 Comment(3)
Your content-type is still showing up as "text/html". - Hottentot
There is either more to your code, or your server is set up to not allow calls to header and/or is misconfigured. - Duckworth
Yes, there is more to my code. Let me try to recreate the simplest possible file/scenario and post it here. - Brazier
29

Yes, you forgot to set the Content Type of your output to text/plain:

header("Content-Type: text/plain");

Your output is probably being sent as HTML, where a newline is collapsed into a space; to actually display a line break in HTML, you would need the <br /> tag.
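
A minimal sketch of how the whole script could look, assuming a hypothetical robots.php that receives the rewritten request. The helper name buildRobotsTxt is illustrative and not from the question; the two rules are the ones the asker posted.

```php
<?php
// Hypothetical robots.php: the file name and helper are assumptions for illustration.

/**
 * Build the robots.txt body from a list of directives, one per line.
 */
function buildRobotsTxt(array $lines): string
{
    // Join with "\n" and finish with a trailing newline.
    return implode("\n", $lines) . "\n";
}

$body = buildRobotsTxt([
    'User-agent: wget',
    'Disallow: /',
]);

// The header has to be set before any output is sent.
header('Content-Type: text/plain; charset=utf-8');
echo $body;
```

Building the body first and echoing it last keeps the header() call ahead of all output.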

Schwejda answered 22/12, 2010 at 6:22 Comment(4)
Thanks. I set the header via the line you mentioned. Still no newlines. It's coming out like this: sarcastic-quotes.com/robots.txt - Brazier
I checked that page, and I'm still receiving the response type as text/html. - Schwejda
Oh, that's so strange. Let me debug further. - Brazier
There might be a problem in the way you've set up Apache to "serve files through a single PHP script". If you could provide that code, we might be able to help. - Schwejda
5
  1. header('Content-Type: text/plain') is correct.
  2. You must call this function before anything is written to your output, including whitespace. Check for whitespace before your opening <?php.
  3. If your Content-Type header has been set to text/plain, no browser in its right mind would collapse whitespace. That behaviour is exclusive to HTML and similar formats.
  4. I'm sure you have your reasons, but as a rule, serving static content through PHP uses unnecessary server resources. Depending on the setup, every hit to PHP can mean a new process spawn and a few megabytes of memory. You can use Apache config directives to point to different robots files based on headers like User-Agent; I'd look into that.
  5. It's likely that search engines ignore the Content-Type header, so this shouldn't be an issue anyway.
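
Point 2 can be checked from inside the script. The following is a sketch only: sendPlainText is a hypothetical helper name, and it relies on PHP's headers_sent(), which reports where output started when it is too late to change headers.

```php
<?php
// Hypothetical helper: send a plain-text body, failing loudly if output already began.
function sendPlainText(string $body): void
{
    // headers_sent() fills $file and $line with the location where output started.
    if (headers_sent($file, $line)) {
        throw new RuntimeException(
            "Output already started at $file:$line; cannot set Content-Type"
        );
    }

    header('Content-Type: text/plain');
    echo $body;
}

// Usage, as the only output of the script:
// sendPlainText("User-agent: wget\nDisallow: /\n");
```

If the exception fires, the reported file and line usually point at stray whitespace or an earlier echo.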

Hope this helps.

-n

Moy answered 22/12, 2010 at 8:29 Comment(0)
1
<?php header("Content-Type: text/plain"); ?>
User-agent: wget
Disallow: /

BTW, the newlines are there just fine. They're just not displayed in a browser while the response is served as text/html: when rendering HTML, browsers collapse all whitespace, including newlines, to a single space.

deceze$ curl http://sarcastic-quotes.com/robots.txt
User-agent: wget
Disallow: /
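
The curl output confirms the body. To also see which Content-Type the client receives, the response headers can be inspected. This is a sketch under assumptions: findContentType is a hypothetical helper, and the commented usage relies on PHP's get_headers(), which performs a real HTTP request.

```php
<?php
// Hypothetical helper: pick the Content-Type value out of raw response header lines.
function findContentType(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, so match without regard to case.
        if (stripos($line, 'Content-Type:') === 0) {
            return trim(substr($line, strlen('Content-Type:')));
        }
    }

    return null; // no Content-Type header among the given lines
}

// Usage (left commented out because it goes over the network):
// $lines = get_headers('http://sarcastic-quotes.com/robots.txt') ?: [];
// echo findContentType($lines), "\n";
```

A value starting with text/html would explain why the browser collapses the newlines even though the body is correct. The helper returns the first match, so after redirects it may report an intermediate response.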
Quincyquindecagon answered 22/12, 2010 at 6:24 Comment(1)
Thanks. How come Wikipedia's robots.txt displays correctly in the browser? See en.wikipedia.org/robots.txt - Brazier
1

I was having a similar issue and neither "\n" nor PHP_EOL worked. I finally used:

header('Content-Disposition: attachment; filename="plaintext.txt"');
header("Content-Type: text/plain");
echo "some data";
echo chr(13).chr(10);

The echo of BOTH characters did the trick. Hope it helps someone.
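
For reference, chr(13).chr(10) is the same two bytes as the double-quoted string "\r\n" (carriage return plus line feed). Why a bare "\n" did not work in this case is not stated; a consumer that expects CRLF line endings, or a single-quoted '\n', are two possible causes. A small check of the string semantics:

```php
<?php
// Double-quoted strings interpret escape sequences; single-quoted strings do not.
$crlf = "\r\n";

var_dump($crlf === chr(13) . chr(10)); // same two bytes: CR followed by LF
var_dump(strlen("\n"));                // 1: a single line feed byte
var_dump(strlen('\n'));                // 2: a backslash and the letter n
```

PHP_EOL depends on the platform PHP runs on, so it is not a reliable way to force CRLF.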

Bye anankin

Sulfurous answered 23/1, 2015 at 20:34 Comment(0)
0

You must set the content type of the document you are serving. In the case of a .txt text file:

header("Content-Type: text/plain");

The IANA has information about some of the more popular MIME (content) types.

Duckworth answered 22/12, 2010 at 6:22 Comment(0)
-2

If you are using echo, then use <br> for new lines. The printf function is what uses \n.

In your case, use printf because you are not using HTML. I believe this is the proper way to do this, along with setting the MIME type to text.

Suborn answered 22/12, 2010 at 6:23 Comment(2)
Sorry, that's utter nonsense. \n within double-quoted strings is always a newline; it has nothing to do with echo or printf. <br> is only useful in the context of HTML. - Quincyquindecagon
print is essentially a synonym for echo. printf substitutes variables into a format string before printing it. How you print the content is irrelevant to how it is displayed. - Duckworth
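
The point made in these two comments can be checked directly. This is a sketch in which capture is a hypothetical helper that uses output buffering to collect whatever a callback prints:

```php
<?php
// Hypothetical helper: run a callback and return everything it printed.
function capture(callable $emit): string
{
    ob_start();             // start collecting output instead of sending it
    $emit();
    return ob_get_clean();  // return the collected output and stop buffering
}

$viaEcho   = capture(function () { echo "Disallow: /\n"; });
$viaPrint  = capture(function () { print "Disallow: /\n"; });
$viaPrintf = capture(function () { printf("Disallow: %s\n", '/'); });

// All three produce the same bytes, including the trailing newline.
var_dump($viaEcho === $viaPrint && $viaPrint === $viaPrintf);
```

How those bytes are displayed then depends only on the Content-Type the client receives.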
