Getting the first image in string with php

I'm trying to get the first image from each of my posts. The code below works great if I only have one image, but if I have more than one, it gives me an image that is not always the first.

I really only want the first image. A lot of the time the second image is a "next" button.

$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';

preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $texthtml, $matches);
$first_img = $matches[1][0];

Now I can take this $first_img and stick it in front of the short description:

<img alt="Sara" title="Sara" src="<?php echo $first_img;?>"/>
Enfilade answered 20/9, 2011 at 3:37 Comment(3)
Are you sure the regex is always matching the first one? Try printing the array each time you call it to see: error_log(var_export($matches, true)); - Sporocarp
That's my problem. It always returns an image, but I need it to return only the first image. - Enfilade
Well, your code looks like it should work (I didn't check the regex, though). You are accessing the second array, which contains the captured groups, and then the first element in that array, which is the first image. Did you try printing the whole array and checking whether, when you see the wrong image, the first image matched at all? I bet it didn't. - Sporocarp
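One likely explanation for "not always the first", offered as an assumption since the failing HTML was never posted: when both <img> tags end up on the same line (for example, post HTML stored without line breaks), the greedy .+ in the question's pattern runs to the end of the line and backtracks only to the last src= it can find. The first (and only) match then reports the later image. The input below is a hypothetical single-line version of the question's HTML:

```php
<?php
// Hypothetical input: the question's HTML with both <img> tags on ONE line
$texthtml = 'Who is Sara Bareilles on Sing Off<br><img alt="Sara" title="Sara" src="475993565.jpg"/><br><img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';

// The pattern from the question. ".+" is greedy and, with no newline to stop it,
// consumes the rest of the string, then backtracks only as far as the LAST "src=".
preg_match_all('/<img.+src=[\'"]([^\'"]+)[\'"].*>/i', $texthtml, $matches);

// Only one match is found, and its capture is the second image's src
$first_img = $matches[1][0];
echo $first_img, "\n";
```

With the tags on separate lines, as in the question's sample, "." cannot cross the newline, so the same pattern happens to return the first image. That would explain why it works only some of the time.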

If you only need the first src attribute, preg_match should do instead of preg_match_all. Does this work for you?

<?php
    $texthtml = 'Who is Sara Bareilles on Sing Off<br>
    <img alt="Sara" title="Sara" src="475993565.jpg"/><br>
    <img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';
    preg_match('/<img.+src=[\'"](?P<src>.+?)[\'"].*>/i', $texthtml, $image);
    echo $image['src'];
?>
Timeworn answered 20/9, 2011 at 4:7 Comment(4)
Weird, I replaced it with your code and I'm still getting the second image. - Enfilade
Could you paste the HTML that's making it fail? As Kelsey said, your code should also work, so it would be easier to track down what's wrong if we had an example that fails. - Timeworn
Isn't your code too greedy? If the alt attribute comes after the src, it will capture that too. You need +? instead of +, so you have: preg_match('/<img.+src=[\'"](?P<src>.+?)[\'"].*>/i', $texthtml, $image); - Shooter
Or you can use preg_match_all('/<img [^>]*src=["\']([^"\']+)/i', $texthtml, $image); - Bloomington
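Combining the two suggestions in this thread (preg_match instead of preg_match_all, and [^>]* instead of .+ so the match cannot run past the end of the first tag), a sketch might look like this. first_img_src() is a hypothetical helper name, and the pattern still assumes reasonably clean markup, for example no ">" inside attribute values:

```php
<?php
/**
 * Hypothetical helper: return the src of the first <img> tag, or null.
 * "[^>]*" cannot cross the end of a tag, so the match stays inside
 * the first <img ...> instead of jumping to a later one.
 */
function first_img_src(string $html): ?string
{
    // preg_match stops at the first match, so preg_match_all is not needed.
    // ([\'"]) captures the opening quote and \1 requires the same closing quote.
    if (preg_match('/<img\b[^>]*\bsrc\s*=\s*([\'"])(.*?)\1/i', $html, $m)) {
        return $m[2];
    }
    return null;
}

// Single-line HTML, the case where the greedy ".+" pattern picks the wrong image
$texthtml = 'Who is Sara Bareilles on Sing Off<br><img alt="Sara" title="Sara" src="475993565.jpg"/><br><img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';

echo first_img_src($texthtml), "\n";
```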

Don't use regex to parse HTML. Use an HTML-parsing library, such as phpQuery:

require 'phpQuery-onefile.php';

$texthtml = 'Who is Sara Bareilles on Sing Off<br> 
<img alt="Sarahehe" title="Saraxd" src="475993565.jpg"/><br> 
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>'; 
$pq = phpQuery::newDocumentHTML($texthtml);
$img = $pq->find('img:first');
$src = $img->attr('src');
echo "<img alt='foo' title='baa' src='{$src}'>";

Download: http://code.google.com/p/phpquery/

Jerkin answered 20/9, 2011 at 4:24 Comment(3)
Thanks, but the last thing I want is to add a ton of code to fix a small image issue. preg_match is fine for a few lines of HTML pulled from SQL. But thank you for taking the time to reply. - Enfilade
Yes, maybe a ton of code, but performance-wise phpQuery is much faster than regular expressions. - Jerkin
I would agree in most cases, but this is for a small blog rendering 5 images per page. I will keep your suggestion in mind, and I really do thank you for it. - Enfilade

After testing an answer from "Using regular expressions to extract the first image source from html codes?", I got better results, with fewer broken-link images, than with the answer provided here.

While regular expressions are good for a large variety of tasks, I find they usually fall short when parsing the HTML DOM. The problem with HTML is that document structure is so variable that it is hard to extract a tag accurately (and by accurately I mean a 100% success rate with no false positives).

For more consistent results, use this library, http://simplehtmldom.sourceforge.net/, which allows you to manipulate HTML. An example is provided in the answer at the first link I posted.
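To make the false-positive point concrete, here is a small sketch comparing a regex with a real parser on made-up HTML containing a commented-out image. It uses PHP's bundled DOM extension (DOMDocument) rather than Simple HTML DOM, swapped in only so the example runs without a download; the sample markup is hypothetical:

```php
<?php
// Hypothetical post: an old image was commented out; the visible image comes after it
$html = '<p>Intro</p><!-- <img src="old.jpg"> --><img src="new.jpg" alt="current">';

// Regex approach: it has no notion of HTML comments, so it matches inside one
preg_match('/<img\b[^>]*\bsrc=["\']([^"\']+)/i', $html, $m);
$regexSrc = $m[1] ?? null;

// DOM approach: the parser treats the commented-out tag as a comment, not an element
libxml_use_internal_errors(true); // collect libxml warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$img = $doc->getElementsByTagName('img')->item(0);
$domSrc = $img !== null ? $img->getAttribute('src') : null;

echo "regex: $regexSrc\n";
echo "DOM:   $domSrc\n";
```

The regex reports old.jpg, which readers never see; the parser reports new.jpg.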

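// Simple HTML DOM, saved locally under this file name
require_once('SimpleHTML.class.php');

function get_first_image($html) {
    $post_html = str_get_html($html);

    // str_get_html() returns false for empty or oversized input
    if ($post_html === false) {
        return null;
    }

    // find('img', 0) returns the first <img> element, or null if there is none
    $first_img = $post_html->find('img', 0);

    if ($first_img !== null) {
        return $first_img->src;
    }

    return null;
}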

Enjoy

Nagy answered 16/8, 2015 at 13:1 Comment(1)
SimpleHTMLDom is the best idea if you want to extract something from HTML. - Mournful
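    // $texthtml holds the post HTML, as in the question
    $mydoc = new DOMDocument();
    // Collect libxml's warnings about imperfect HTML instead of printing them
    libxml_use_internal_errors(true);
    $mydoc->loadHTML($texthtml);
    libxml_clear_errors();
    $imgs = $mydoc->getElementsByTagName('img');
    if ($imgs->length > 0) {
        $first_img = $imgs->item(0);
        echo $first_img->getAttribute("src");
    }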

So $first_img->getAttribute("src") gives you the first src found.
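Wrapped into a reusable function, the DOMDocument approach could look like the sketch below. first_image_src() is a hypothetical name. The empty-input guard is there because DOMDocument::loadHTML() rejects an empty string, and the output line escapes the value with htmlspecialchars() before putting it back into an attribute, as in the question's short-description snippet:

```php
<?php
/**
 * Hypothetical helper: return the src of the first <img> in $html, or null.
 * Uses PHP's bundled DOM extension, so there is nothing extra to install.
 */
function first_image_src(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    $previous = libxml_use_internal_errors(true); // keep libxml quiet about sloppy HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    // Only the first <img> is inspected; null if it is missing or has no src
    $img = $doc->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null;
    }
    return $img->getAttribute('src');
}

$texthtml = 'Who is Sara Bareilles on Sing Off<br>
<img alt="Sara" title="Sara" src="475993565.jpg"/><br>
<img alt="Sara" title="Sara two" src="475993434343434.jpg"/><br>';

$first_img = first_image_src($texthtml);
if ($first_img !== null) {
    // Escape before putting the value back into an HTML attribute
    echo '<img alt="Sara" title="Sara" src="', htmlspecialchars($first_img, ENT_QUOTES), '"/>', "\n";
}
```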

Puduns answered 20/3 at 10:48 Comment(0)
