Validate that input string does not exceed word limit

10

15

I want to count the words in a given string so that I can validate it and prevent users from writing more than, for example, 100 words.

I wrote this function, but I don't think it's robust enough. I used explode() with a space as the delimiter, but what if the user types two spaces instead of one? Is there a better way to do this?

function isValidLength($text , $length){
  
   $text  = explode(" " , $text );
   if(count($text) > $length)
          return false;
   else
          return true;
}
Gaskell answered 24/1, 2011 at 20:27 Comment(2)
#21652761 - Mazurek
You might find count(s($str)->words()) helpful, as found in this standalone library. - Haar
25

Maybe str_word_count could help

http://php.net/manual/en/function.str-word-count.php

$Tag  = 'My Name is Gaurav'; 
$word = str_word_count($Tag);
echo $word;
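
To turn the count into the check the question asks for, a minimal sketch could look like the following. The helper name isWithinWordLimit is made up for illustration. Keep in mind that str_word_count only treats letters, ' and - as word characters, so digits split a word in two; the second sample string is the one used in the PHP manual.

```php
<?php
// Hypothetical helper (the name is not from the original answer): true when
// $text contains at most $limit words according to str_word_count().
function isWithinWordLimit(string $text, int $limit): bool
{
    return str_word_count($text) <= $limit;
}

// Repeated spaces do not create extra words.
var_dump(isWithinWordLimit('My  Name   is Gaurav', 4)); // bool(true)

// Digits are not word characters: "fri3nd" is counted as "fri" and "nd".
echo str_word_count("Hello fri3nd, you're looking good today!"); // 7
```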
Aerification answered 24/1, 2011 at 20:30 Comment(6)
Just one other has mentioned str_word_count. Isn't it appropriate? - Aerification
str_word_count is BAD! It counts "the" multiple times if it is contained in bigger words like "theme" "theory" etc. str_word_count sucks and I see it all over on stackoverflow - Courcy
@Courcy What about offering an alternative rather than ranting like a madman? - Roborant
This function also counts hyphens as words. I found it better to call it after using preg_replace to replace all non-alpha characters, e.g.: str_word_count(preg_replace('/[^a-z]+/i', ' ', $string)) - Camelopardus
str_word_count will consider "Yet" and "yet" as two different words, which is fair enough I guess. This can be solved by lowercasing the string prior to testing. - Gropius
str_word_count() is always returning 1 for some reason - Meurer
21

Try this:

function get_num_of_words($string) {
    $string = preg_replace('/\s+/', ' ', trim($string));
    $words = explode(" ", $string);
    return count($words);
}

$str = "Lorem ipsum dolor sit amet";
echo get_num_of_words($str);

This will output: 5
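
One edge case worth noting: explode() on an empty string returns an array with one empty element, so the function above reports 1 for empty or blank input. A possible guard, sketched under the assumption that blank input should count as zero words (the _safe suffix is just an illustrative name):

```php
<?php
// Variant of the function above that returns 0 for empty or blank input.
function get_num_of_words_safe($string) {
    // Trim, then collapse every run of whitespace into a single space.
    $string = preg_replace('/\s+/', ' ', trim($string));
    if ($string === '') {
        return 0; // explode() on '' would yield one empty element
    }
    return count(explode(' ', $string));
}

echo get_num_of_words_safe("Lorem ipsum   dolor sit amet"), "\n"; // 5
echo get_num_of_words_safe("   "), "\n";                          // 0
```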

Letty answered 23/12, 2012 at 11:43 Comment(2)
This is actually the best answer so far that is both concise and doesn't have serious issues of some kind. But I would simplify the function body as simply return count(explode(' ', preg_replace('/\s+/', ' ', trim($string))));. - Gravimetric
Why wouldn't anyone just use preg_split() instead of prepping the string then exploding? This answer isn't quite as elegant as it could be. - Drais
10

You can use the built-in PHP function str_word_count. Use it like this:

$str = "This is my simple string.";
echo str_word_count($str);

This will output 5.

If you plan on using special characters in any of your words, you can supply any extra characters as the third parameter.

$str = "This weather is like el ninã.";
echo str_word_count($str, 0, 'àáã');

This will output 6.

Thamos answered 24/1, 2011 at 20:30 Comment(6)
@Blender: PHP is just awesome. All you want is in the standard library. Just this little makeBlog() function is still missing. - Pemba
@Michael Irigoyen: he probably meant "why does PHP have so many functions?" in a rhetorical sense. - Hedgerow
This function will not work with non-ASCII characters (e.g. accented letters). str_word_count("déjà") outputs 2. - Corkscrew
@user576875: a) it's locale dependent, b) you can specify further "word" characters. - Pemba
@nikic LC_ALL=fr_FR.UTF-8, still outputs 2 :) The $charlist parameter will not work well with multibyte characters. - Corkscrew
Oh wait, let me call post_reply_to_latest_stackoverflow_comment(':P');. - Lithiasis
4

This function uses a simple regex to split the input $text on any non-letter character:

function isValidLength($text, $length) {
    $words = preg_split('#\PL+#u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return count($words) <= $length;
}

This ensures that it works correctly with words separated by multiple spaces or any other non-letter character. It also handles Unicode (e.g. accented letters) correctly.

The function returns true when the word count is less than or equal to $length.
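
A short usage sketch, assuming the source file and the input are UTF-8. It also shows a side effect of splitting on every non-letter: apostrophes (and digits or hyphens) break a word apart, which may or may not match your definition of a word.

```php
<?php
// Same function as above, repeated so the example runs on its own.
function isValidLength($text, $length) {
    $words = preg_split('#\PL+#u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return count($words) <= $length;
}

var_dump(isValidLength("one   two,three", 3)); // bool(true): three words
var_dump(isValidLength("déjà vu", 2));         // bool(true): accented letters stay inside the word
var_dump(isValidLength("don't", 1));           // bool(false): the apostrophe splits "don" and "t"
```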

Corkscrew answered 24/1, 2011 at 20:30 Comment(0)
4

str_word_count has its flaws: it treats underscores as word separators, so this_is counts as two words.

You can use the following function to count words separated by spaces, even if there is more than one space between them.

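function count_words($str) {
    // Trim first so leading and trailing spaces are not counted as separators.
    $str = trim($str);
    if ($str === '') {
        return 0; // no words at all
    }
    // Collapse runs of spaces into a single space.
    while (substr_count($str, "  ") > 0) {
        $str = str_replace("  ", " ", $str);
    }
    return substr_count($str, " ") + 1;
}

$str = "This   is  a sample_test";

echo $str, "\n";
echo count_words($str); // prints 4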
Anam answered 20/9, 2012 at 6:43 Comment(0)
2

Use preg_split() instead of explode(). preg_split() supports regular expressions, so it can split on runs of whitespace.
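
A minimal sketch of that idea, assuming a "word" is any run of non-whitespace characters (the helper name countWordsBySplit is hypothetical):

```php
<?php
// Hypothetical helper name, not from the original answer.
function countWordsBySplit(string $text): int
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops the empty
    // pieces produced by leading or trailing whitespace.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return count($words);
}

echo countWordsBySplit("  Lorem  ipsum\tdolor sit\namet "); // 5
```

Because empty pieces are discarded, an empty or all-whitespace string yields 0 rather than 1.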

Rattlebox answered 24/1, 2011 at 20:29 Comment(0)
1

If you need more control over what counts as "a word" in your application, preg_match_all() returns the number of matches it finds. If you need multibyte support, add the Unicode pattern modifier (u). \pL and \pM match letters and combining marks, to err on the side of inclusivity. Consider this a starting point and understand that the regex rules for what is "a word" can be tightened or loosened as needed.

This solution is multibyte-safe.

Code:

function isValidLength($text, $length) {
    // Valid when the number of matched words does not exceed the limit.
    return preg_match_all("~[\pL\pM'-]+~u", $text) <= $length;
}

Alternatively, if it is a required field and you only need to count space-delimited "non-whitespace substrings", then you can just write:

if (preg_match("~^\s*\S+(\s+\S+){0,99}\s*$~", $text)) { ... }

or

if (preg_match("~^\S+(\s+\S+){0,99}$~", trim($text))) { ... }
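
The hard-coded 99 in these patterns is the limit minus one. If the limit needs to be configurable, the pattern can be built at runtime; the following is only a sketch, with a hypothetical helper name and the assumption that the limit is at least 1:

```php
<?php
// Hypothetical helper: true when $text holds between 1 and $maxWords
// whitespace-separated tokens. Assumes $maxWords >= 1.
function hasAtMostWords(string $text, int $maxWords): bool
{
    // One token, then up to ($maxWords - 1) further "whitespace + token" groups.
    $pattern = sprintf('~^\S+(?:\s+\S+){0,%d}$~', $maxWords - 1);
    return preg_match($pattern, trim($text)) === 1;
}

var_dump(hasAtMostWords("one  two three", 3));     // bool(true)
var_dump(hasAtMostWords("one two three four", 3)); // bool(false)
var_dump(hasAtMostWords("   ", 3));                // bool(false): an empty field is rejected
```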
Drais answered 17/9, 2021 at 1:54 Comment(0)
0

Use substr_count() to count the occurrences of any substring. For a rough word count, set $needle to ' '; note that this counts the spaces, so the result is one less than the number of words. Signature: int substr_count(string $haystack, string $needle)

$text = 'This is a test';
echo substr_count($text, 'is'); // 2 ("This" and "is")


echo substr_count($text, ' '); // 3: the number of spaces between the four words
Prohibit answered 9/4, 2011 at 14:50 Comment(1)
There are a few issues with this. It counts spaces, not words. So if there's one word it would return 0. And it counts multiple spaces as words (such as if you put two spaces after each period as is often done). - Gravimetric
0

There are n-1 spaces between n objects, so there are 99 spaces between 100 words. You can pick an average word length, say 10 characters, multiply it by 100 (for 100 words), add 99 (for the spaces), and then base the limit on the number of characters instead (1099).

function isValidLength($text) {
    // 100 words * 10 characters + 99 spaces = 1099 characters
    return strlen($text) <= 1099;
}

Weikert answered 9/3, 2016 at 14:29 Comment(0)
0

I wrote a function which is better than str_word_count because that PHP function counts dashes and other characters as words.

My function also addresses the issue of double spaces, which many of the functions other people have written don't account for.

This function handles HTML tags as well. If two elements sit next to each other and you simply call strip_tags, the adjacent text is counted as one word when it is really two. For example: <h1>Title</h1>Text or <h1>Title</h1><p>Text</p>

Additionally, I strip out JavaScript first; otherwise the code within the <script> tags would be counted as words.

Lastly, my function handles spaces at the beginning and end of a string, multiple spaces, line breaks, return characters, and tab characters.

###############
# Count Words #
###############
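function count_words($str)
{
    // Remove <script> blocks so JavaScript source is not counted as words.
    $str = preg_replace('~<\s*\bscript\b[^>]*>(.*?)<\s*\/\s*script\s*>~is', '', $str);
    // Turn line breaks, carriage returns, and tabs into spaces.
    $str = str_replace(array("\n", "\r", "\t"), ' ', $str);
    // Pad tags with spaces so adjacent elements stay separate words, then strip the tags.
    $str = strip_tags(str_replace('<', ' <', str_replace('>', '> ', $str)));
    // Drop everything except ASCII letters, digits, and spaces.
    $str = preg_replace("/[^A-Za-z0-9 ]/", "", $str);
    // Collapse runs of spaces into a single space.
    while (substr_count($str, '  ') > 0) {
        $str = str_replace('  ', ' ', $str);
    }
    return substr_count(trim($str, ' '), ' ') + 1;
}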
Gileadite answered 20/5, 2016 at 17:0 Comment(0)
