How to add an ellipsis hyperlink after the first space beyond 170 characters?

I have a long text like below:

$postText="It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";

I want to add a "read more" hyperlink after the first 170 characters without cutting off a word, keeping the whitespace character that follows that word.

My coding attempt:

if(strlen($postText)>170){
    $splitArr=preg_split("/.{170}\S*\s/",$postText,2);
    print_r($splitArr);
    exit;
    $postText=$splitArr[0]."...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";
}

The split array always returns the first element as null. I checked my regex on regex101, and it matches exactly what I need. Please point out what is wrong.

Sumba answered 22/3, 2018 at 6:38 Comment(2)
Do you need to split at word boundaries, or can you accept a word split at any spot in the middle? - Dufresne
@Sumba Please do not roll back my edit again. I am trying to help clarify your question and improve searchability. Multiple volunteers have incorrectly answered your question because it was not very clear that you wanted to preserve the whole last word before truncating. I have also clarified my answer to explain what is wrong in your snippet, what function/pattern adjustments are necessary, and why my method is the best / most direct solution. If you have questions about my answer or insist that your question should be rolled back, please leave me a comment so that I can understand. - Taunyataupe
B
3

split array always return the first index as null.

It doesn't return NULL; it returns an empty string (''). These are different values with different semantics.

The reason why the first element of the returned array is an empty string is clearly documented in the manual page of preg_split():

Return Values:

Returns an array containing substrings of subject split along boundaries matched by pattern, or FALSE on failure.

The regex you provide as the first argument to preg_split() is used to match the delimiter, not the pieces. The function you need is preg_match():

$postText = "It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";

preg_match('/^.{170}\S*/', $postText, $matches);

$postText = $matches[0] . " ...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";

If preg_match() returns 1 (it returns an integer, not a boolean), $matches[0] contains the string you need.

There are situations where preg_match() fails with your original regex. For example, if your input string has exactly 170 characters, the \s won't match. This is why I removed the \s from the regex and added a space at the start of the string appended after the match.
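To make the length check and the preg_match() result check explicit, here is a minimal sketch. The function name truncateWithLink() and the $limit parameter are my own, purely for illustration, and the sample uses a limit of 12 so the result is easy to read. It keeps the same pattern as above, so the same caveats apply.

```php
<?php
// Hypothetical helper: the name and the $limit parameter are illustrative only.
function truncateWithLink(string $text, string $link, int $limit = 170): string
{
    // Nothing to do when the text already fits.
    if (strlen($text) <= $limit) {
        return $text;
    }
    // Match the first $limit characters plus the rest of the word in progress.
    if (preg_match('/^.{' . $limit . '}\S*/', $text, $matches) !== 1) {
        return $text;
    }
    // If the match swallowed the whole text, there is nothing left to hide.
    if ($matches[0] === $text) {
        return $text;
    }
    return $matches[0] . ' ' . $link;
}

echo truncateWithLink('The quick brown fox jumps over the lazy dog', '...more', 12), "\n";
echo truncateWithLink('Short text', '...more', 12), "\n";
```

With the default $limit of 170, the call would be truncateWithLink($postText, $link), where $link holds the "read more" anchor markup.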

Bunni answered 22/3, 2018 at 7:17 Comment(2)
Maybe use if (preg_match(...)) $postText = ...;? According to your answer. - Elsewhere
Sure, the OP can check the value returned by preg_match() or they can keep their original check of the length of the input string. Both ways work well. I put in the answer only the minimum amount of code needed to show how to use preg_match(). - Bunni
T
2

Why is preg_split() returning an empty string for the first element?

That is because the pattern that you feed the function dictates where it should explode/break. The matched characters are treated as a "delimiter" and are discarded under the function's default behavior.

When your input string has at least 170 characters, then optional non-whitespace characters, then a whitespace character -- all of these matched characters become the delimiter. When preg_split() splits a string, it will potentially generate zero-length elements depending on the location of the delimiter.

For instance, if you have a string aa and split it on a, the function will return 3 empty elements -- one before the first a, one between the a's, and one after the second a.

Code: (Demo)

$string = "aa";
var_export(preg_split('/a/', $string));
// output: array ( 0 => '', 1 => '', 2 => '', )

To ensure that no empty strings are generated, you can set the fourth parameter of the function to PREG_SPLIT_NO_EMPTY (the 3rd parameter must be declared for the 4th parameter to be recognized).

var_export(preg_split('/a/', $string, -1, PREG_SPLIT_NO_EMPTY));
// output: array ( )

You could add the PREG_SPLIT_NO_EMPTY parameter to your function call to remove the empty string, but because the substring that you want to keep is used as the delimiter, it is lost in the process.
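A small sketch of that loss, using a short sample string and a limit of 10 instead of 170 (both are my own choices for readability, not part of the question):

```php
<?php
// Short sample with a limit of 10 instead of 170, purely for illustration.
$sample = 'The quick brown fox';

// Default behavior: the matched lead-in is the delimiter, so element 0 is ''.
$default = preg_split('/.{10}\S*\s/', $sample, 2);
var_export($default);
echo "\n";

// PREG_SPLIT_NO_EMPTY drops the empty element, but the lead-in is still gone.
$noEmpty = preg_split('/.{10}\S*\s/', $sample, 2, PREG_SPLIT_NO_EMPTY);
var_export($noEmpty);
echo "\n";
```

In both calls the text "The quick brown " is consumed as the delimiter; only "fox" survives.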


A more important point is that preg_split() is not the best tool for this job.

Your posted snippet:

  1. checks if the string qualifies for truncation
  2. then attempts to isolate the leading portion of the text
  3. then intends to overwrite $postText with the element containing the leading portion, concatenated with the ellipsis hyperlink.

Fortunately, PHP has a single function that can do all three of these steps without a conditional -- resulting in a clean, direct line of code.

Code: (Demo)

$postText = "It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";
$ellipsis = "...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";
echo preg_replace('/.{170}\S*\s\K.+/', $ellipsis, $postText);

The beauty in this call is that if the $postText doesn't qualify for truncation because it doesn't have 170 characters, optionally followed by non-whitespace characters, followed by a whitespace character, then nothing happens -- the string remains whole.

The \K in the pattern tells the regex engine to drop everything matched so far (the first ~170 characters, the rest of the word, and the whitespace character) from the reported match, so those characters are left untouched in the string. Then the .+ means match one or more of any character (as much as possible), and only that part is replaced. By this pattern logic, there will only be one replacement executed. preg_replace() returns the truncated string without any concatenation syntax.
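To see both behaviors on something shorter than 170 characters, here is a scaled-down sketch. The limit of 10 and the '...more' placeholder are assumptions chosen only to keep the strings readable.

```php
<?php
// Same pattern shape as above, with 10 in place of 170 for readability.
$ellipsis = '...more';
$pattern = '/.{10}\S*\s\K.+/';

// Long enough: everything after the \K position is replaced.
$long = preg_replace($pattern, $ellipsis, 'The quick brown fox jumps over the lazy dog');

// Too short to match: the string is returned unchanged.
$short = preg_replace($pattern, $ellipsis, 'The quick');

echo $long, "\n", $short, "\n";
```

The first call keeps "The quick brown " and swaps the remainder for the placeholder; the second call finds no match and returns its input as-is.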

*note, if your input string may contain newline characters, you should add the s pattern modifier so that the . will match any character including newline characters. Pattern: /.{170}\S*\s\K.+/s

*if you want to truncate your input string at the end of the word beyond the 170th character, you can use this pattern: /.{170}\S*\K.+/ and you could add a space at the start of the replacement/ellipsis string to provide some separation.
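The note about the s modifier is easy to check on a short sample. This sketch again uses a limit of 10 and a '...more' placeholder (my own assumptions) and compares the pattern with and without the modifier on text containing a newline.

```php
<?php
$ellipsis = '...more';
$text = "The quick brown fox\njumps over the lazy dog";

// Without s, . stops at the newline, so each line can be matched separately.
$withoutS = preg_replace('/.{10}\S*\s\K.+/', $ellipsis, $text);

// With s, . also matches the newline, so one replacement covers the rest.
$withS = preg_replace('/.{10}\S*\s\K.+/s', $ellipsis, $text);

var_dump($withoutS, $withS);
```

Without the modifier the second line gets its own, unwanted, replacement, which is why the s modifier matters for multi-line input.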


Using a non-regex approach is a bit more clunky and requires a conditional statement to maintain the same level of accuracy (so I don't recommend it, but I'll display the technique anyhow).

Using substr_replace(), you need to check if there is enough length in the string to offer a valid offset for strpos(). If so, you can replace.

Code: (Demo)

$postText = "It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";
$ellipsis = "...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";
if (($len = strlen($postText)) > 170 && ($pos = strpos($postText, ' ', 170)) && ++$pos < $len){
    $postText = substr_replace($postText, $ellipsis, $pos);
}
echo $postText;

The above snippet assumes the only whitespace in the input string is the space character (as opposed to tabs and newline characters, which you may also want to split on).
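If tabs and newlines should count too, one possible variation is to swap strpos() for strcspn(), which accepts a set of characters. This is only a sketch: the function name truncateAtWhitespace() and the $limit parameter are hypothetical, and the character set is an assumption about what counts as whitespace in your data.

```php
<?php
// Hypothetical helper: the name and the $limit parameter are illustrative only.
function truncateAtWhitespace(string $text, string $ellipsis, int $limit = 170): string
{
    $len = strlen($text);
    if ($len <= $limit) {
        return $text;
    }
    // strcspn() counts characters from offset $limit up to the first
    // space, tab, newline, or carriage return.
    $pos = $limit + strcspn($text, " \t\n\r", $limit);
    // Step past the whitespace character so it is kept, as in the regex version.
    $pos++;
    if ($pos >= $len) {
        return $text; // no whitespace found, or nothing follows it
    }
    return substr_replace($text, $ellipsis, $pos);
}

echo truncateAtWhitespace("The quick brown\tfox jumps", '...more', 10), "\n";
```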

Taunyataupe answered 22/3, 2018 at 6:56 Comment(5)
You should list your split answer as well, as someone might also find that useful +1. - Dufresne
I think the OP wants to cut the string after a word. This will cut the string in an arbitrary position, right? - Elsewhere
@Elsewhere that reasonable feature is not mentioned in the question. The OP seems to be focussed on the hard 170. There could be application-specific reasons for this strict cutoff point. ...oh I see the pattern now. I'll adjust. - Taunyataupe
"my regex in REGEX101, It shows exactly what i need" The match stops after the first non-whitespace character after 170 characters. - Elsewhere
Yeah, I see it now. - Taunyataupe
A
1

Your regex .{170}\S*\s is fine but has a small problem. There is no guarantee that \S* only matches the rest of a word. For example, if an MD5 hash starts at the 170th character, \S* will go on to match the remaining 31 characters of the hash, and other runs of non-whitespace characters could be even longer.

You are treating those 170 characters as the delimiter for preg_split(), which is why they do not appear in the output.

With these two points in mind, you may come up with a better approach:

$array = preg_split('~^[\s\S]{1,170}+(?(?!\S{10,})\S*)\K~', $string);

PHP live demo

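The 10 caps the length of that trailing run: if 10 or more non-whitespace characters follow the first 170 characters, the pattern splits right after the 170th character; otherwise \S* consumes the rest of the word.

You can then take $array[0] and append your read more text to it.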

Amphioxus answered 22/3, 2018 at 8:42 Comment(0)
C
-1

There is no need to use preg_split(); you can still trim the characters with substr().

$postText="It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.";
$limit = 170;
$truncated = substr($postText,0,$limit);
$truncated .= "...<a class='see-more' href='http://example.com/seemore-link'>read more</a>";
var_dump($truncated);

Demo

Cotquean answered 22/3, 2018 at 6:45 Comment(1)
Have you understood the regex used in the question? It tries to match 170 characters, the rest of the last word (if 170 characters end in the middle of a word) and a space. How can this be achieved with substr()? - Bunni
