Get first N words of a string

How do I only get the first 10 words from a string?

Slusher answered 10/5, 2011 at 21:20 Comment(1)
You might find s($str)->words(10) helpful, as found in this standalone library. - Pessa
implode(' ', array_slice(explode(' ', $sentence), 0, 10));

To add support for other word breaks like commas and dashes, preg_match gives a quick way and doesn't require splitting the string:

function get_words($sentence, $count = 10) {
  preg_match("/(?:\w+(?:\W+|$)){0,$count}/", $sentence, $matches);
  return $matches[0];
}

As Pebbl mentions, PHP doesn't handle UTF-8 or Unicode all that well, so if that is a concern you can replace \w with [^\s,\.;\?\!] and \W with [\s,\.;\?\!].
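As a rough sketch of that replacement (not the answerer's own code), the function below swaps \w and \W for the explicit character classes. The name get_words_explicit, the leading-delimiter skip, the u modifier and the final trim() are assumptions added for illustration.

```php
<?php
// Hypothetical variant of get_words(); the name and delimiter list are illustrative.
function get_words_explicit($sentence, $count = 10) {
    $count = max(0, (int) $count);
    $delims = '\s,.;?!'; // characters treated as gaps between words
    $pattern = '/^[' . $delims . ']*(?:[^' . $delims . ']+(?:[' . $delims . ']+|$)){0,' . $count . '}/u';

    // With the u modifier, preg_match() returns false on invalid UTF-8 input
    if (preg_match($pattern, $sentence, $matches) !== 1) {
        return '';
    }
    // The match may end with the gap after the last word, so trim whitespace
    return trim($matches[0]);
}

// Usage: get_words_explicit('Это не те дроиды, которые вы ищете', 5)
```

Punctuation that directly follows the last kept word (a trailing comma, for example) stays in the result, because only whitespace is trimmed.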

Words answered 10/5, 2011 at 21:22 Comment(6)
This worked great for me. I needed to display only the first 5 sentences, however, so I switched the 10 to a 5, then switched the ' ' to '. ' in the implode and explode, and it worked just fine. I did have to put a period after the displayed text because the last period was omitted. Thank you. - Kirwin
Nice update, +1 for avoiding the splitting (and using regular expressions!). You'll want to watch out for those word boundaries, however, as per my updated answer. - Necessary
It's unfortunate that PHP still hasn't figured out how to handle Unicode. Thanks for the info, I've updated my answer. - Words
Thank you very much! This worked on my site with WPIMPORTALL to select only the first 6 letters. The Unicode note was an excellent addition too! - Henrion
How do I return 10 words if the string contains <p>? This doesn't work with strings that contain HTML. - Knell
You're going to have to strip the HTML out of the string. Try using strip_tags. - Words
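Following the strip_tags suggestion in the last comment, a minimal sketch might look like the one below. The function name first_words_from_html is hypothetical, and adding a space before every "<" is an assumption made so that adjacent blocks such as </p><p> do not fuse two words together; it will also split a word that contains inline markup.

```php
<?php
// Hypothetical helper: strip markup first, then take the first $count words.
function first_words_from_html($html, $count = 10) {
    // Assumption: put a space before each tag so "</p><p>" does not glue words together
    $text = strip_tags(str_replace('<', ' <', $html));
    // Collapse runs of whitespace and drop leading/trailing whitespace
    $text = trim(preg_replace('/\s+/', ' ', $text));
    if ($text === '') {
        return '';
    }
    return implode(' ', array_slice(explode(' ', $text), 0, $count));
}

// Usage: first_words_from_html('<p>One two</p><p>three four</p>', 3)
```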

Simply splitting on spaces will function incorrectly if there is an unexpected character in place of a space in the sentence structure, or if the sentence contains multiple conjoined spaces.

The following version will work no matter what kind of "space" you use between words and can be easily extended to handle other characters... it currently supports any white space character plus , . ; ? !

function get_snippet( $str, $wordCount = 10 ) {
  return implode( 
    '', 
    array_slice( 
      preg_split(
        '/([\s,\.;\?\!]+)/', 
        $str, 
        $wordCount*2+1, 
        PREG_SPLIT_DELIM_CAPTURE
      ),
      0,
      $wordCount*2-1
    )
  );
}

Regular expressions are perfect for this issue, because you can easily make the code as flexible or strict as you like. You do have to be careful however. I specifically approached the above targeting the gaps between words — rather than the words themselves — because it is rather difficult to state unequivocally what will define a word.

Take the \w word-character class, or its inverse \W. I rarely rely on these, mainly because, depending on the software you are using (like certain versions of PHP), they don't always include UTF-8 or Unicode characters.

In regular expressions it is better to be specific at all times, so that your expressions can handle things like the following, no matter where they are run:

echo get_snippet('Это не те дроиды, которые вы ищете', 5);

/// outputs: Это не те дроиды, которые

Avoiding splitting could be worthwhile in terms of performance, however. So you could use Kelly's updated approach but switch \w+ for [^\s,\.;\?\!]+ and \W+ for [\s,\.;\?\!]+. Personally, though, I like the simplicity of the splitting expression used above: it is easier to read and therefore to modify. The stack of PHP functions, however, is a bit ugly :)

Necessary answered 16/9, 2012 at 7:57 Comment(5)
+1 Why was this at 0 votes? It's a better solution than the other answers. Although, people shouldn't be using camel case in PHP. - Summers
@StephenSarcsamKamenar thanks... and good point, I'd been doing too much javascripting that day :) - Necessary
I agree with @StephenSarcsamKamenar's question! I suppose there are too many answers here. It is the duty of whoever asked the question to mark the right answer. This is the best one for me: +1 without a doubt! - Rowney
Great answer. However, I would add that you may need to use trim() around your $str before you process it. This way you eliminate any whitespace at the ends. This helps if you want to decide whether to add an ellipsis to the end of the string when the result is a subset of the original. - Packthread
I think the above snippet can be slightly optimized by replacing "$str, $wordCount*2+1" with "$str, $wordCount+1", as the count of resulting chunks does not include the splitting characters/words. - Latoya
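The last two comments (trimming the input first, and passing a smaller limit to preg_split) can be combined in a sketch like the following. The name get_snippet_trimmed, the $suffix parameter and the u modifier are assumptions for illustration; this is not the answerer's code.

```php
<?php
// Hypothetical variant of get_snippet() that trims first and appends a suffix when text was cut.
function get_snippet_trimmed($str, $word_count = 10, $suffix = '...') {
    $str = trim($str);
    if ($word_count < 1 || $str === '') {
        return '';
    }
    // The limit counts only the split pieces; captured delimiters are returned in addition
    $parts = preg_split('/([\s,.;?!]+)/u', $str, $word_count + 1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return ''; // e.g. invalid UTF-8 with the u modifier
    }
    // Keep $word_count words and the delimiters between them
    $snippet = implode('', array_slice($parts, 0, $word_count * 2 - 1));
    // Only add the suffix when something was actually removed
    return $snippet === $str ? $snippet : $snippet . $suffix;
}

// Usage: get_snippet_trimmed('  Это не те дроиды, которые вы ищете ', 5)
```

Because the input is trimmed before the comparison, a string that is already short enough comes back unchanged and without the suffix.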

http://snipplr.com/view/8480/a-php-function-to-return-the-first-n-words-from-a-string/

function shorten_string($string, $wordsreturned)
{
    $retval = $string;  //  Just in case of a problem
    $array = explode(" ", $string);
    /*  Already short enough, return the whole thing*/
    if (count($array)<=$wordsreturned)
    {
        $retval = $string;
    }
    /*  Need to chop off some words */
    else
    {
        array_splice($array, $wordsreturned);
        $retval = implode(" ", $array)." ...";
    }
    return $retval;
}
Mohr answered 10/5, 2011 at 21:22 Comment(0)

I suggest using str_word_count:

<?php
$str = "Lorem ipsum       dolor sit    amet, 
        consectetur        adipiscing elit";
print_r(str_word_count($str, 1));
?>

The above example will output:

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor
    [3] => sit
    [4] => amet
    [5] => consectetur
    [6] => adipiscing
    [7] => elit
)

Then use a loop to get the words you want.

Source: http://php.net/str_word_count
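The loop mentioned above could look something like this sketch. The function name first_words_loop is hypothetical. Note that str_word_count is not multibyte-aware, so this is best suited to plain ASCII text, and punctuation such as commas is dropped from the result.

```php
<?php
// Hypothetical helper: collect the first $limit words found by str_word_count().
function first_words_loop($str, $limit) {
    $words = str_word_count($str, 1); // format 1 returns an array of the words found
    $picked = [];
    foreach ($words as $index => $word) {
        if ($index >= $limit) {
            break; // enough words collected
        }
        $picked[] = $word;
    }
    return implode(' ', $picked);
}

// Usage: first_words_loop("Lorem ipsum       dolor sit    amet", 3)
```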

Melvinamelvyn answered 10/12, 2014 at 10:12 Comment(0)

To select the first 10 words of a given text, you can use the following function:

function first_words($text, $count = 10)
{
    $words = explode(' ', $text);

    // Collect up to $count words, then join them with single spaces
    $result = [];
    for ($i = 0; $i < $count && isset($words[$i]); $i++) {
        $result[] = $words[$i];
    }

    return implode(' ', $result);
}
Colbert answered 15/2, 2017 at 18:40 Comment(0)

This can easily be done using str_word_count()

$first10words = implode(' ', array_slice(str_word_count($sentence,1), 0, 10));
Theocrasy answered 10/5, 2017 at 17:30 Comment(0)

This might help you: a function that returns the first N words.

// Declare this as "public function" when it is used as a class method
function getNWordsFromString($text, $numberOfWords = 6)
{
    if ($text != null)
    {
        $textArray = explode(" ", $text);
        if (count($textArray) > $numberOfWords)
        {
            return implode(" ", array_slice($textArray, 0, $numberOfWords)) . "...";
        }
        return $text;
    }
    return "";
}
Rigsby answered 4/3, 2014 at 10:20 Comment(0)

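Try this:

$str = 'Lorem ipsum dolor sit amet,consectetur adipiscing elit. Mauris ornare luctus diam sit amet mollis.';
// Make sure every comma is followed by exactly one space, so "amet,consectetur" splits into two words
$arr = explode(" ", preg_replace('/,\s*/', ', ', $str));
// Stop early if the string has fewer than 10 words
for ($index = 0; $index < 10 && isset($arr[$index]); $index++) {
    echo $arr[$index] . " ";
}

I know this is a late answer, but let newcomers choose their own answers.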

Quesenberry answered 19/11, 2015 at 8:38 Comment(0)
    function get_first_num_of_words($string, $num_of_words)
    {
        $string = preg_replace('/\s+/', ' ', trim($string));
        $words = explode(" ", $string); // an array

        // if number of words you want to get is greater than number of words in the string
        if ($num_of_words > count($words)) {
            // then use number of words in the string
            $num_of_words = count($words);
        }

        $new_string = "";
        for ($i = 0; $i < $num_of_words; $i++) {
            $new_string .= $words[$i] . " ";
        }

        return trim($new_string);
    }

Use it like this:

echo get_first_num_of_words("Lorem ipsum dolor sit amet consectetur adipisicing elit. Aliquid, illo?", 5);

Output: Lorem ipsum dolor sit amet

This function also works well with Unicode text, such as Arabic:

echo get_first_num_of_words("نموذج لنص عربي الغرض منه توضيح كيف يمكن استخلاص أول عدد معين من الكلمات الموجودة فى نص معين.", 100);

Output: نموذج لنص عربي الغرض منه توضيح كيف يمكن استخلاص أول عدد معين من الكلمات الموجودة فى نص معين.

Ottoman answered 23/11, 2015 at 14:17 Comment(0)

This is exactly what we were looking for. I just cut and pasted it into my program and it ran.

function shorten_string($string, $wordsreturned)
/*  Returns the first $wordsreturned words of $string. If the string
    contains fewer words than $wordsreturned, the entire string
    is returned.
*/
{
    $retval = $string;      //  Just in case of a problem

    $array = explode(" ", $string);
    if (count($array) <= $wordsreturned)
    /*  Already short enough, return the whole thing */
    {
        $retval = $string;
    }
    else
    /*  Need to chop off some words */
    {
        array_splice($array, $wordsreturned);
        $retval = implode(" ", $array) . " ...";
    }
    return $retval;
}

Then call the function from your code like this:

$data_itr = shorten_string($Itinerary,25);
Om answered 9/12, 2014 at 20:52 Comment(0)

I do it this way:

function trim_by_words($string, $word_count = 10) {
    $string = explode(' ', $string);
    if (empty($string) == false) {
        $string = array_chunk($string, $word_count);
        $string = $string[0];
    }
    $string = implode(' ', $string);
    return $string;
}

It's UTF-8 compatible...

Lianaliane answered 5/3, 2015 at 11:54 Comment(0)

This might help you: a function that returns the first N words.

function num_of_word($text, $numb) {
    $wordsArray = explode(" ", $text);
    // Split the words into chunks of $numb; the first chunk is the result
    $parts = array_chunk($wordsArray, $numb);

    $final = implode(" ", $parts[0]);

    // A second chunk means words were cut off, so append an ellipsis
    if (isset($parts[1])) {
        $final = $final . " ...";
    }
    return $final;
}
echo num_of_word($text, 10);
Curriery answered 18/11, 2015 at 5:11 Comment(0)

Instead of generating an array of N words, then truncating the array, then re-imploding the words, just truncate the input string after the Nth word.

echo preg_replace('/(?:\s*\S+){10}\K.*/', '', $string);

The pattern matches N sequences of zero or more whitespace characters followed by one or more non-whitespace characters, then \K restarts the full-string match (effectively "releasing" the matched characters), then .* matches the rest of the string. Whatever is matched is replaced with an empty string. Note that .* stops at a newline unless the s modifier is added.

This solution ensures that the output string does not have more than N words. If the string has fewer than N words, the pattern does not match and no mutation takes place, so any trailing whitespace in that string is not removed.

To ensure that leading and trailing whitespace is removed, adjust the pattern to capture zero to N words delimited by whitespace.

$string = '    I would like to know   ';

var_export(
    preg_replace('/\s*(\S*(?:\s+\S+){0,9}).*/', '$1', $string)
);
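To make the word count configurable, the second pattern can be wrapped in a small function, sketched below. The name truncate_words is hypothetical, and the s modifier is an added assumption so that .* also consumes any text after a line break.

```php
<?php
// Hypothetical wrapper around the capture-based pattern, with N as a parameter.
function truncate_words($string, $n = 10) {
    $n = max(1, (int) $n);
    // Capture one word plus up to N-1 further whitespace-delimited words; discard the rest
    $pattern = '/\s*(\S*(?:\s+\S+){0,' . ($n - 1) . '}).*/s';
    $result = preg_replace($pattern, '$1', $string);
    return $result ?? ''; // preg_replace() returns null on error
}

// Usage: truncate_words('    I would like to know   ', 3)
```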
Trespass answered 17/1, 2024 at 23:29 Comment(0)
