
The purpose of the PHP / MySQL script I am working on is to check whether duplicate entries exist when new users register. If a duplicate exists, we warn the user, and so on. Normally this would be a simple database query comparing the required columns with the submitted information. However, since we must meet HIPAA compliance, the data (as well as the questions that the data corresponds to) is encrypted in the database. This means we have to pull all the records and decrypt them before we can begin comparing.

I'm pulling three columns from the database - participants, demographics and data.

Participants is an integer: the user id. Demographics is also an integer, a reference to another table that contains the questions. The Demographics column holds the id of those questions, with values ranging from 237 to 280. Data is the actual data that corresponds to the question id in Demographics.

Each participant has four questions from demographics that they must answer. I only need to pull IDs 237, 238, 239 and 241 from the demographics column. An example of the query being used to fetch the records (using Doctrine ORM):

        select('(dd.participant_id)', '(dd.demographic_id)', '(dd.demographic_data_value)')
        ->where('dd.demographic_id IN(237,238,239,241)')
        ->orderBy('dd.participant_id', 'ASC')
        ->getQuery()->getResult();

And here's what the result looks like from the above query for a single participant:

[0]=>

    array(3) {
      [1]=>string(1) "5"          /* The participant ID */
      [2]=>string(3) "237"        /* The demographic ID */
      [3]=>string(49) "0zW/5aNei6EX6X0abn3smQ==:J2Kl0Ky/L7fjl5m69W0BNA=="      /* The data */
    }

[1]=>

    array(3) {
      [1]=>string(1) "5"          /* The participant ID */
      [2]=>string(3) "238"        /* The demographic ID */
      [3]=>string(49) "e97Rse2b+MPgpKV0vuyv2g==:2HcKOVTSG/LghDnoHWQxMw=="      /* The data */
    }

[2]=>

    array(3) {
      [1]=>string(1) "5"          /* The participant ID */
      [2]=>string(3) "239"        /* The demographic ID */
      [3]=>string(49) "Ym9EKLAkj5LznrwYNfC5Kw==:gmYziJKA3F+7HJ7hz9IAAQ=="      /* The data */
    }

[3]=>

    array(3) {
      [1]=>string(1) "5"          /* The participant ID */
      [2]=>string(3) "241"        /* The demographic ID */
      [3]=>string(3) "654"        /* The data */
    }

Using the example above, you can see we have four separate arrays. Array index [1] is the participant id, index [2] is demographics (the question being asked) and index [3] is the answer to that question. You can tell that this is all for the same participant since index [1] is "5" for each array. As you can see, the data column is encrypted. The actual question is encrypted as well, but not the id for the question. With those columns being encrypted, I can't simply do a read/write. That's a nuisance, but not the problem.

The Problem

The problem I am facing is that a few users did not answer all four questions. An unanswered question is not stored as null or empty in the database, as you might expect; the row is simply omitted from the table for that user altogether. As a result, an unanswered question causes a PHP error, because we try to access an array key that does not exist.
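To make the gap concrete, here is a minimal sketch of how the incomplete participants could be detected in PHP before any decryption happens. The function name `findMissingAnswers` and the sample rows are hypothetical; the only things taken from the result above are the array keys 1, 2 and 3 and the four demographic IDs. It assumes PHP 7.4 or later.

```php
<?php
/**
 * Hypothetical helper (not part of the original script).
 * Returns [participantId => [missing demographic IDs]] for every participant
 * that has at least one row but lacks some of the required questions.
 * Each row uses the keys 1 (participant), 2 (demographic) and 3 (data).
 */
function findMissingAnswers(array $rows, array $requiredIds = [237, 238, 239, 241]): array
{
    // Collect the demographic IDs each participant actually answered.
    $answered = [];
    foreach ($rows as $row) {
        $answered[(int) $row[1]][] = (int) $row[2];
    }

    // Compare against the required IDs and keep only the gaps.
    $missing = [];
    foreach ($answered as $participantId => $ids) {
        $diff = array_values(array_diff($requiredIds, $ids));
        if ($diff !== []) {
            $missing[$participantId] = $diff;
        }
    }

    return $missing;
}

// Made-up rows for illustration; the data values are placeholders.
$rows = [
    [1 => '5', 2 => '237', 3 => 'placeholder'],
    [1 => '5', 2 => '238', 3 => 'placeholder'],
    [1 => '5', 2 => '239', 3 => 'placeholder'],
    [1 => '5', 2 => '241', 3 => 'placeholder'],
    [1 => '6', 2 => '237', 3 => 'placeholder'],
    [1 => '6', 2 => '238', 3 => 'placeholder'],
];

// Participant 5 is complete; participant 6 is missing 239 and 241.
$missing = findMissingAnswers($rows);
```

Note that a participant who answered none of the four questions has no rows at all, so this check cannot see them.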

The Proposed Solution

Before iterating over the database results, I would like to fix the data we already have: find every user who did not answer all four questions (demographic IDs 237, 238, 239 and 241) and either remove them from the result set altogether or, as another acceptable solution, insert a new record for that participant with the missing demographic ID and the string 'Not Answered' in the Data column.
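For reference, both options can be sketched in plain PHP on the fetched result set, without touching the database. This is only an illustration under assumptions: the function names `dropIncompleteParticipants` and `fillMissingAnswers` are made up, the rows are assumed to use the keys 1, 2 and 3 shown earlier, and PHP 7.4 or later is assumed for the arrow function.

```php
<?php
/**
 * Hypothetical helper: keeps only the rows of participants who answered
 * every required question.
 */
function dropIncompleteParticipants(array $rows, array $requiredIds = [237, 238, 239, 241]): array
{
    // First pass: record which required IDs each participant answered.
    $seen = [];
    foreach ($rows as $row) {
        if (in_array((int) $row[2], $requiredIds, true)) {
            $seen[(int) $row[1]][(int) $row[2]] = true;
        }
    }

    // Second pass: keep rows whose participant has all required IDs.
    return array_values(array_filter(
        $rows,
        fn (array $row): bool => count($seen[(int) $row[1]] ?? []) === count($requiredIds)
    ));
}

/**
 * Hypothetical helper: returns one row per participant and required ID,
 * adding a placeholder row where the answer is missing. Rows for IDs that
 * are not in $requiredIds are left out of the result.
 */
function fillMissingAnswers(
    array $rows,
    array $requiredIds = [237, 238, 239, 241],
    string $placeholder = 'Not Answered'
): array {
    // Index the existing rows by participant and demographic ID.
    $byParticipant = [];
    foreach ($rows as $row) {
        $byParticipant[(int) $row[1]][(int) $row[2]] = $row;
    }

    $filled = [];
    foreach ($byParticipant as $participantId => $answers) {
        foreach ($requiredIds as $id) {
            // The placeholder is plain text, so a later decryption step
            // would need to skip it.
            $filled[] = $answers[$id]
                ?? [1 => (string) $participantId, 2 => (string) $id, 3 => $placeholder];
        }
    }

    return $filled;
}
```

Either function would be called on the array returned by `getResult()` before the comparison loop. Dropping incomplete participants means they are never checked for duplicates, while filling keeps them in the comparison with fewer real values to match on.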

I'm just stuck on how to actually craft the SQL (or PHP) required to accomplish the task. Please pass along any suggestions or thoughts you may have. Thank you!

  • "I would like to first fix the data". If you want to “fix” the data, that is a database problem, and SQL can be used for it. However, that also implies the insertion logic gets “fixed”, too. Otherwise you can “fix” it on the fly each time you query. For the latter, a join is probably what you are looking for, with null representing the missing data.
    – Chris Haas
    Commented May 21, 2022 at 3:42
