
I'm trying to generate the sum of a collection of strings by folding each character's byte value into a sum:

$sum = 0;

foreach( $array as $item ) {
    $bytes = unpack( 'C*', $item );

    for( $i = 1; $i <= count( $bytes ); $i++ ) {
        if( isset( $bytes[$i+1] ) ) {
            $sum += $bytes[$i] - $bytes[$i+1];
        } else {
            $sum -= $bytes[$i];
        }
    }
}

return $sum;

The goal here is to compare past sums with newly generated sums (that is to say, check whether there has been a new addition to the collection, say Item4) and perform actions if so.

As such, it's very important that:

  1. The algorithm can compute the sum irrespective of the order of the items.
  2. The algorithm doesn't get confused by a case where let's say Item3 now becomes temI3 (and therefore the sum value is still the same, even though it's clearly not the same Item).

The whole secondary loop is there to guard against exactly that: it loops through each byte (character) of each item and adds to the final sum the difference between the current byte and the next one. If there is no next byte because we are at the end of the string, it simply subtracts the current byte from the sum.

As such, the following:

Input(s): ['Item1', 'Item2', 'Item3'] / ['Item1', 'Item2', 'Item3'], output: the same int, whereas ['Item1', 'Item2', 'Ite3m'] outputs a different int.

The performance as of now is as follows:

100000 items across 100 runs: 0.18s / 1000000 items across 100 runs: 2.05s

And although I understand that to parse and do these calculations for a million items in 2s is rather fast for PHP, I still think that, if you look at what the thing does, it's still slow.

Any way to speed this up?

  Hmm, have you tried simply not calling count( $bytes ) over and over during each iteration of the nested loop? Try calling it once. (I guess this is a review and can be an answer, but it feels pretty meager.) Or how about ++$i instead of $i++? Can you decrement instead of increment to somehow avoid isset()? Commented Dec 23, 2019 at 21:30
  Is there a reason that you are not simply using === to compare $bytes arrays? Commented Dec 24, 2019 at 4:48

Don't call functions in loop conditions

Calling count in the condition of a for loop is a common mistake. The behaviour is the same, but it is slower because count has to be called on every iteration even though it returns the same value each time. Unless the count changes during the iterations, fetch it once before the loop.
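A minimal sketch of that change, assuming a hypothetical helper named sumBytes that is not part of the original code: the length is read once and reused as the loop bound.

```php
<?php
// Hypothetical helper for illustration: sums the byte values of a string.
function sumBytes(string $item): int
{
    $bytes = unpack('C*', $item); // 1-indexed array of byte values
    $length = count($bytes);      // evaluated once, not on every iteration
    $sum = 0;

    for ($i = 1; $i <= $length; ++$i) {
        $sum += $bytes[$i];
    }

    return $sum;
}

echo sumBytes('Item1'), \PHP_EOL;
```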

Iterate only where the body is the same

Inside your for loop you check on every iteration whether the element is the last one, just to do something else in that case. It is better to exclude the last element from the loop and handle it separately.

Unpack is not necessary

I wasn't really sure until I benchmarked it, but accessing individual characters of the string with the [] operator and using the ord function to get the numeric value of each byte seems to be noticeably faster.
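As a small illustration (a sketch only, with variable names chosen here for the example), both approaches read the same byte values; the second one just skips building the intermediate array:

```php
<?php
$item = 'Item1';

// unpack('C*') builds a 1-indexed array of byte values; reindex it from 0.
$viaUnpack = array_values(unpack('C*', $item));

// Indexing the string and calling ord() reads the same bytes directly.
$viaOrd = [];
for ($i = 0, $n = strlen($item); $i < $n; ++$i) {
    $viaOrd[] = ord($item[$i]);
}

var_dump($viaUnpack === $viaOrd);
```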

Not sure your algorithm is correct

As I've shown in the me2 implementation, you are actually just adding the first character of each word and subtracting the last character of each word twice, because the intermediate terms cancel each other out. This means that no character other than the first and the last of each word contributes to the resulting sum. Therefore Item1 gives the same result as Ixxx1. (Try $input2 in the code below and see for yourself.)
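The cancellation can be checked on single strings. The sketch below uses two hypothetical helpers whose names are illustrative only: pairwiseSum is the per-string logic from the question written with ord, and firstMinusTwiceLast is the closed form it collapses to.

```php
<?php
// Hypothetical helper: the question's per-string logic, written with ord().
function pairwiseSum(string $item): int
{
    $sum = 0;
    $last = strlen($item) - 1;
    for ($i = 0; $i < $last; ++$i) {
        $sum += ord($item[$i]) - ord($item[$i + 1]); // inner terms cancel pairwise
    }
    if ($last >= 0) {
        $sum -= ord($item[$last]); // the last byte is subtracted a second time
    }
    return $sum;
}

// Hypothetical helper: first byte minus twice the last byte.
function firstMinusTwiceLast(string $item): int
{
    $length = strlen($item);
    if ($length === 0) {
        return 0;
    }
    return ord($item[0]) - 2 * ord($item[$length - 1]);
}

foreach (['Item1', 'Ixxx1', 'X', ''] as $item) {
    var_dump(pairwiseSum($item) === firstMinusTwiceLast($item));
}
```

Both 'Item1' and 'Ixxx1' start with 'I' and end with '1', so they produce the same value.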


// The original code from the question, wrapped in a function.
function op(array $array)
{
    $sum = 0;

    foreach( $array as $item ) {
        $bytes = unpack( 'C*', $item );

        for( $i = 1; $i <= count( $bytes ); $i++ ) {
            if( isset( $bytes[$i+1] ) ) {
                $sum += $bytes[$i] - $bytes[$i+1];
            } else {
                $sum -= $bytes[$i];
            }
        }
    }

    return $sum;
}

// Same logic, but count() is called once and the last byte is handled outside the loop.
function op2(array $array)
{
    $sum = 0;

    foreach( $array as $item ) {
        $bytes = unpack( 'C*', $item );

        $length = count( $bytes );
        for( $i = 1; $i < $length ; $i++ ) {
            $sum += $bytes[$i] - $bytes[$i+1];
        }
        if ($length > 0) {
            $sum -= $bytes[$length];
        }
    }

    return $sum;
}

// Same logic without unpack(): index the string directly and use ord().
function me(array $array)
{
    $sum = 0;

    foreach( $array as $item ) {
        $length_1 = strlen($item) - 1;
        for ($i = 0; $i < $length_1; ++$i) {
            $sum += ord($item[$i]) - ord($item[$i + 1]);
        }
        if ($length_1 >= 0) {
            $sum -= ord($item[$length_1]);
        }
    }

    return $sum;
}

// Closed form: only the first and the last byte of each string contribute.
function me2(array $array)
{
    $sum = 0;

    foreach( $array as $item ) {
        $length = strlen($item);
        if ($length == 1) {
            $sum -= ord($item[0]);
        } elseif ($length > 1) {
            $sum += ord($item[0]) - 2 * ord($item[$length-1]);
        }
    }

    return $sum;
}

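// Returns the average time in seconds of one call to $callback($input).
function bench(callable $callback, array $input, int $reps = 10000)
{
    $total = 0;
    for ($i = 0; $i < $reps; ++$i) {
        $start = \microtime(true);
        $callback($input); // the call being timed
        $total +=  \microtime(true) - $start;
    }
    return $total / $reps;
}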


$input1 = ['Item1', 'Item2', 'Item3', 'X', ''];
$input2 = ['Ixxx1', 'Ixxx2', 'Ixxx3', 'X', ''];

$outOp = op($input1);
$outOp2 = op2($input1);
$outMe = me($input1);
$outMe2 = me2($input1);

echo 'OP: ' . bench('op', $input1);
echo \PHP_EOL;
echo $outOp;
echo \PHP_EOL;

echo 'OP improved: ' . bench('op2', $input1);
echo \PHP_EOL;
echo $outOp2;
echo \PHP_EOL;

echo 'No unpack: ' . bench('me', $input1);
echo \PHP_EOL;
echo $outMe;
echo \PHP_EOL;

echo 'Me2: ' . bench('me2', $input1);
echo \PHP_EOL;
echo $outMe2;
echo \PHP_EOL;

OP: 3.5921573638916E-6
OP improved: 3.3084869384766E-6
No unpack: 1.4432907104492E-6
Me2: 5.5739879608154E-7
  You are right, yours is way faster. Unfortunately, both my implementation and yours suffer from the same issue: they don't produce a different sum when the order of the characters (and therefore the word itself) is different: ['Item1', 'Item2', 'Ietm3'] gives the same sum as ['Item1', 'Item2', 'Item3'].
    – Daniel M
    Commented Dec 24, 2019 at 17:56
  I know you showed me that your function performs way faster, but you're comparing me having to go through the whole string, vs. you taking only the first and last characters of each string item. I need to parse each string fully and compute a number that's representative of a string. I need the count to know how long each string I'm parsing is.
    – Daniel M
    Commented Dec 24, 2019 at 18:03
  I went ahead and accepted the answer as, within the scope of the question, which is performance, it does its job if we pair it with the output of my function. I would GREATLY appreciate any help with the algorithm, though.
    – Daniel M
    Commented Dec 24, 2019 at 18:21
  return crc32( implode( "", $array) ) across 100000 runs takes 0.00265, not sure it can be improved though.
    – Daniel M
    Commented Dec 24, 2019 at 18:24
  @DanielM Yeah, the "first+last char" version will be more or less effective than yours for strings of different lengths. But it indeed does not satisfy your requirements. Neither does your original code, though, and I included that version exactly to point this out. Anyway, the problem here is that no matter how you compute the sum/hash, you are mapping a potentially infinite set of values to a finite range of possible int values (or strings of fixed length). This will inevitably lead to ambiguities. crc32 also has this artifact. If that is acceptable for you, sure, go ahead and use that one.
    – slepic
    Commented Dec 24, 2019 at 21:52
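
If collisions of the kind described above are acceptable, one way to get both properties from the question (insensitive to item order, sensitive to character order inside an item) is to sort a copy of the list before hashing it. The following is only a sketch: collectionFingerprint is a hypothetical name, and the length prefix is an assumption added here so that joining the items cannot blur their boundaries. It uses crc32, which was mentioned in the comments, and it still maps many inputs to one 32-bit value.

```php
<?php
// Hypothetical helper: order-insensitive fingerprint of a list of strings.
function collectionFingerprint(array $items): int
{
    // $items is passed by value, so sorting here leaves the caller's array untouched.
    sort($items, SORT_STRING);

    $buffer = '';
    foreach ($items as $item) {
        // Assumption: a length prefix keeps ['ab', 'c'] distinct from ['a', 'bc'].
        $buffer .= strlen($item) . ':' . $item;
    }

    return crc32($buffer);
}

$a = collectionFingerprint(['Item1', 'Item2', 'Item3']);
$b = collectionFingerprint(['Item3', 'Item1', 'Item2']);
var_dump($a === $b); // same items in a different order
```

Reordering the items leaves the value unchanged because the sorted list is identical, while swapping characters inside an item changes the hashed buffer.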

