Splitting string by fixed length

I am looking for ways to split a Unicode alphanumeric string into fixed-length fields. For example:


    992000199821376John Smith          20070603

The resulting array should look like this:

Array (
 [0] => 99,
 [1] => 2,
 [2] => 00019982,
 [3] => 1376,
 [4] => "John Smith",
 [5] => 20070603
) 

The array data will be split like this:

    Array[0] - Account type - must be 2 characters long,
    Array[1] - Account status - must be 1 character long,
    Array[2] - Account ID - must be 8 characters long,
    Array[3] - Account settings - must be 4 characters long,
    Array[4] - User Name - must be 20 characters long,
    Array[5] - Join Date - must be 8 characters long.
Tillo answered 13/9, 2012 at 10:36 Comment(2)
Adding a tag gives your question better visibility. - Gallo
Not possible to do in Unicode (only for ASCII). Ever. See my answer. - Marston
L
4

Or if you want to avoid preg:

$string = '992000199821376John Smith          20070603';
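// Field widths in characters: type, status, ID, settings, name, join date.
$intervals = array(2, 1, 8, 4, 20, 8);

$start = 0;
$parts = array();

foreach ($intervals as $i)
{
   // With no encoding argument, mb_substr() counts characters in the
   // internal encoding (see mb_internal_encoding()).
   $parts[] = mb_substr($string, $start, $i);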

   $start += $i;
}
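
The same loop can be wrapped in a small function that names the fields and strips the space padding the question's expected output omits. This is only a sketch: `split_fixed` and the field names in `$layout` are hypothetical, the input is assumed to be UTF-8, and the mbstring extension must be loaded. Widths are counted in code points, not grapheme clusters.

```php
<?php
// Hypothetical helper: $layout maps a field name to its width in code points.
function split_fixed(string $record, array $layout, string $encoding = 'UTF-8'): array
{
    $fields = [];
    $start = 0;
    foreach ($layout as $name => $width) {
        // rtrim() drops the space padding of fixed-width text fields.
        $fields[$name] = rtrim(mb_substr($record, $start, $width, $encoding));
        $start += $width;
    }
    return $fields;
}

// Field names are illustrative; the widths come from the question.
$layout = [
    'type' => 2, 'status' => 1, 'id' => 8,
    'settings' => 4, 'name' => 20, 'date' => 8,
];

// str_pad() rebuilds the 20-character, space-padded name field.
$record = '992000199821376' . str_pad('John Smith', 20) . '20070603';
$fields = split_fixed($record, $layout);

print_r($fields);
```

Keeping the widths in one array means a layout change touches a single place, which is the maintainability point of the interval approach.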
Lebron answered 13/9, 2012 at 13:19 Comment(4)
Using intervals makes it a lot more maintainable. +1 for this solution. - Vaulting
Sorry, doesn't work. Splits by code units in case of Unicode, not characters. - Marston
How about $parts[] = mb_substr($string, $start, $i, mb_detect_encoding($string)); ? - Lebron
@jonnyynnoj: I don't think that using mb_detect_encoding here is reliable. Also, one should count grapheme clusters, not code points. - Melidamelilot
M
0
    $s = '992000199821376Николай Шмидт       20070603';

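    // The u modifier makes each "." match one UTF-8 code point rather than one byte.
    if (preg_match('~(.{2})(.{1})(.{8})(.{4})(.{20})(.{8})~u', $s, $match))
    {
        // $match[0] holds the whole match, so the first list() slot is skipped.
        list (, $type, $status, $id, $settings, $name, $date) = $match;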
    }
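
If field widths should be counted in grapheme clusters rather than code points, one possible variation replaces each `.` with `\X`, which PCRE defines as one extended grapheme cluster. The sketch below is not part of the original answer. It assumes UTF-8 input and a PCRE build with Unicode support, and the sample name is made up to contain a combining mark.

```php
<?php
// Assumes UTF-8 input and PCRE with Unicode support (needed for /u and \X).
// "Zoe\u{0308}" is "Zoë" written as e + combining diaeresis:
// 4 code points, but only 3 grapheme clusters.
$name = "Zoe\u{0308}" . str_repeat(' ', 17); // 20 grapheme clusters wide
$s = '992000199821376' . $name . '20070603';

// One "." per code point: the name field absorbs only 16 of the 17 pad spaces,
// so the date field starts one position too early.
preg_match('~^(.{2})(.)(.{8})(.{4})(.{20})(.{8})~u', $s, $byCodePoint);

// One "\X" per extended grapheme cluster: the fields line up again.
preg_match('~^(\X{2})(\X)(\X{8})(\X{4})(\X{20})(\X{8})$~u', $s, $byCluster);

var_dump($byCodePoint[6]); // string(8) " 2007060"
var_dump($byCluster[6]);   // string(8) "20070603"
```

This only helps when the record was written with widths measured in grapheme clusters in the first place. If the producer padded by bytes or code points, the reader has to count the same way.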
Manufacture answered 13/9, 2012 at 13:3 Comment(0)
G
0

Using the substr function would do this quite easily, as long as the string contains only single-byte characters (substr counts bytes, not characters).

$accountDetails = "992000199821376John Smith          20070603";
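$accountArray = array(
    substr($accountDetails, 0, 2),   // account type
    substr($accountDetails, 2, 1),   // account status
    substr($accountDetails, 3, 8),   // account ID
    substr($accountDetails, 11, 4),  // account settings
    substr($accountDetails, 15, 20), // user name (space-padded)
    substr($accountDetails, 35, 8),  // join date
);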

This should do the trick. Other than that, regular expressions (as suggested by akond) are probably the way to go, and they are more flexible. I figured this was still valid as an alternative option.

Grandnephew answered 13/9, 2012 at 13:16 Comment(0)
M
0

It is not possible to split a Unicode string in the way you ask for, at least not without making some of the parts invalid.

Some code points cannot stand on their own. For example, שׁ is 2 code points (and 4 bytes in both UTF-8 and UTF-16), and you cannot split it, because the result of cutting between the two is undefined.

When you work with Unicode, "character" is a very slippery term: there are code points, glyphs, and so on. See more at http://www.utf8everywhere.org, in the part on "length of a string".
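
To make the distinction concrete, here is a small sketch that measures one such sequence three ways. It assumes PHP 7 or later (for the `\u{...}` escapes), UTF-8 strings, the mbstring extension, and PCRE with Unicode support. The sequence is spelled out with escapes as U+05E9 followed by U+05C1 so that the code point count is unambiguous.

```php
<?php
// HEBREW LETTER SHIN (U+05E9) followed by the combining SHIN DOT (U+05C1).
$shin = "\u{05E9}\u{05C1}";

$bytes      = strlen($shin);                   // UTF-8 bytes
$codePoints = mb_strlen($shin, 'UTF-8');       // Unicode code points
$clusters   = preg_match_all('~\X~u', $shin);  // extended grapheme clusters

// Cutting after one code point separates the letter from its mark.
$firstCodePoint = mb_substr($shin, 0, 1, 'UTF-8');

var_dump($bytes);      // int(4)
var_dump($codePoints); // int(2)
var_dump($clusters);   // int(1)
var_dump($firstCodePoint === "\u{05E9}"); // bool(true)
```

A fixed width of "1 character" therefore means different cut points depending on which of these units is meant, which is why the field widths need a defined unit before any splitting code can be called correct.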

Marston answered 13/9, 2012 at 21:21 Comment(2)
+1. Unicode is great for some things, but complicates string handling tremendously. (Are you sure it's "not possible" though? Perhaps only "very difficult"?) - Memorize
Yes, ghoti, what is asked is not possible. I thought I explained why, didn't I? - Marston
