WSO2 Enterprise Integrator (WSO2 EI) consists of four profiles (ESB, Message Broker, Business Process Server, and Analytics) that can persist a user's personally identifiable information (PII) in various sources, namely log files and RDBMSs. However, organizations that use WSO2 EI have a legal obligation to remove all instances of a user's PII from the system if that user requests it. For example, consider an employee who resigns from the organization and asks the organization to remove all instances of their PII from its systems. You can fulfill this requirement by anonymizing the user's PII in the system, or (in some cases) by completely removing such PII from the system.

...

  1. Every log statement follows the same pattern, in which the "USER_NAME" keyword is followed by the actual username ("Sam" in this example). The regex pattern shown below matches this log statement, and the Forget-Me Tool uses it to anonymize the username.

    This pattern should be added to the ei-patterns.xml file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/log-config/ directory).

    Code Block
    <pattern key="pattern3">
           <detectPattern>(.)*(USER_NAME)(.)*${username}(.)*</detectPattern>
           <replacePattern>${username}</replacePattern>
    </pattern>
  2. Update the config.json file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/ directory) as shown below. This file contains references to all the log files in the system (except service-specific log files) that store the above user information. If you have enabled a service-specific log file, you need to add that file name (see the element descriptions given below).

    Code Block
    {
     "processors" : [ 
       "log-file"
     ],
     "directories": [
       {
         "dir": "log-config",
         "type": "log-file",
         "processor" : "log-file",
         "log-file-path" : "<EI_HOME>/repository/logs",
         "log-file-name-regex" : "(audit.log|warn.log|wso2carbon.log)(.)*"
       }
     ]
    }

    The elements in the above configuration are explained below.

    • "processors": The processors listed for this element specify whether the tool will run on log files, RDBMSs, or analytics streams. In the case of the ESB profile, we only need to remove PII from log files, and therefore, the processor is set to "log-file".
    • "directories": This element lists the directories that correspond to the processors. In the case of the ESB profile, we need to specify the directories that store log files.
    • "log-file-path": This specifies the directory path to the log files. Note that all the relevant log files are stored in the <EI_HOME>/repository/logs/ directory.

      Note

      Be sure to replace the "log-file-path" value with the correct absolute path to the location where the log files are stored. If you are on Windows, be sure to use the forward slash ("/") instead of the backslash ("\"). For example: C:/Users/Administrator/Desktop/wso2ei-6.2.0/repository/logs.

    • "log-file-name-regex": This gives the list of log files (stored in the log-file-path) that will persist the user's PII. Note that the above log-file-name-regex includes the audit.log, warn.log, and wso2carbon.log files, as well as the archived files of the same logs. If you have enabled a service-specific log file, be sure to add the file name to this list.

  3. Open a command prompt and navigate to the <EI_HOME>/bin directory.

  4. Execute the following command to anonymize the user information that matches the pattern you added to the ei-patterns.xml file:


    • On Linux:

      Code Block
      ./forgetme.sh -U Sam
    • On Windows:

      Code Block
      forgetme.bat -U Sam


    This will result in the following:

    1. Copies will be created of all the log files specified in the config.json file. The log copies are named in the following format: anon-<time_stamp>-<original_log_name>.log. For example: anon-1520946791793-warn.log.

    2. The PII will be anonymized in the copies. The log files will display the user information as a pseudonym.

      Code Block
      [EI-Core]  INFO - LogMediator USER_NAME = 86c3bfd9-f97c-4b08-9f15-772dcb0c1c
    Note

    For the list of commands you can run using the Forget-Me tool, see this link.
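
The following Python sketch illustrates how a detect pattern of this shape can be applied to a single log line. It is not the Forget-Me Tool's implementation: the function name anonymize_line is hypothetical, and the way ${username} is substituted and the way the replace pattern is interpreted are assumptions made for illustration only.

```python
import re
import uuid

# Detect pattern copied from the ei-patterns.xml entry shown earlier.
DETECT_PATTERN = r"(.)*(USER_NAME)(.)*${username}(.)*"


def anonymize_line(line: str, username: str, pseudonym: str) -> str:
    """Return the line with the username replaced, if the line matches the pattern."""
    # Assumption: ${username} is substituted with the literal user name.
    detect = DETECT_PATTERN.replace("${username}", re.escape(username))
    if re.fullmatch(detect, line) is None:
        return line  # no match, so the line is left untouched
    # Assumption: the replace pattern (${username}) names the text to swap out.
    return line.replace(username, pseudonym)


# A random UUID stands in for the pseudonym.
pseudonym = str(uuid.uuid4())
before = "[EI-Core]  INFO - LogMediator USER_NAME = Sam"
after = anonymize_line(before, "Sam", pseudonym)
print(after)
```

A line that contains "Sam" but not the "USER_NAME" keyword is returned unchanged by this sketch, which is why each distinct log format needs its own pattern entry.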

...

Anonymizing PII in the BPMN (activiti) component

The PII references stored by the BPMN component can be removed from log files as well as the BPMN-specific database by using the Forget-Me Tool.

Follow the steps given below.

  1. Add the relevant drivers for your BPMN-specific database to the <EI_HOME>/wso2/tools/forget-me/lib directory. For example, if you have changed your BPMN database from the default H2 database to MySQL, copy the MySQL driver to this directory.
  2. Open the activiti-datasources.xml file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/datasources/ directory), and specify the details of the RDBMS that stores the metadata from BPMN workflows.
  3. Update the config.json file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/ directory) as shown below. This file contains references to all the log files in the system, and the RDBMS that stores the user information from BPMN workflows.

    Code Block
    {
     "processors" : [
       "log-file", "rdbms"
     ],
     "directories": [
       {
         "dir": "log-config",
         "type": "log-file",
         "processor" : "log-file",
         "log-file-path" : "<EI_HOME>/wso2/business-process/repository/logs",
         "log-file-name-regex" : "(audit.log|warn.log|wso2carbon.log)(.)*"
       },
       {
        "dir": "sql",
        "type": "rdbms",
        "processor" : "rdbms"
        }
     ],
     "extensions": [
       {
         "dir": "datasources",
         "type": "datasource",
         "processor" : "rdbms"
       }
     ]
    }

    The elements in the above configuration are explained below.

    • "processors": The processors listed for this element specify whether the tool will run for log files, RDBMSs, or analytics streams. In the case of the BPMN component of the BPS profile, we need to remove PII from log files, as well as the BPMN-specific database. Therefore, the processors are set to "log-file" and "rdbms".
    • "directories": This element lists the directories that correspond to the processors. In the case of the BPMN component, we need to specify the directories that store log files, as well as the directory of the SQL scripts for the BPMN database. Therefore, the above configuration contains two directories: "log-config" and "sql".
    • "log-file-path": This specifies the directory path to the logs. In this example, all the relevant log files for BPS are stored in the <EI_HOME>/wso2/business-process/repository/logs/ directory. 

      Note

      Be sure to replace the "log-file-path" value with the correct absolute path to the location where the log files are stored. If you are on Windows, be sure to use the forward slash ("/") instead of the backslash ("\"). For example: C:/Users/Administrator/Desktop/wso2ei-6.2.0/repository/logs.

    • "log-file-name-regex": This gives the list of log files (stored in the log-file-path) that will persist the user's PII. Note that the above log-file-name-regex includes the audit.log, warn.log, and wso2carbon.log files, as well as the archived files of the same logs.

  4. Open a command prompt and navigate to the <EI_HOME>/bin directory.

  5. Run the tool using the following command:

    • On Linux:

      Code Block
      ./forgetme.sh -U <USERNAME>
    • On Windows:

      Code Block
      forgetme.bat -U <USERNAME>

    This will result in the following:

    1. Copies will be created of all the log files specified in the config.json file. The log copies are named in the following format: anon-<time_stamp>-<original_log_name>.log. For example: anon-1520946791793-warn.log.

    2. The PII will be anonymized in the copies. The log files will display the user information as a pseudonym.

    3. The user's PII will be removed from the BPMN database.
    Note

    For the list of commands you can run using the Forget-Me tool, see this link.
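
Before running the tool, it can help to check which files a "log-file-name-regex" value would select and to confirm that a Windows path uses forward slashes. The Python sketch below does both. It assumes that the regex must match the whole file name, which may differ from how the tool actually applies it. The helper names, the archived log name, and http_access.log are hypothetical examples.

```python
import re
from pathlib import PureWindowsPath

# Value copied from the "log-file-name-regex" element in config.json.
LOG_FILE_NAME_REGEX = r"(audit.log|warn.log|wso2carbon.log)(.)*"


def to_forward_slashes(windows_path: str) -> str:
    """Rewrite a Windows path so that it uses "/" instead of "\\"."""
    return PureWindowsPath(windows_path).as_posix()


def matching_logs(file_names, regex=LOG_FILE_NAME_REGEX):
    """Return the file names that the regex matches in full (an assumption)."""
    pattern = re.compile(regex)
    return [name for name in file_names if pattern.fullmatch(name)]


normalized = to_forward_slashes(
    r"C:\Users\Administrator\Desktop\wso2ei-6.2.0\repository\logs"
)
print(normalized)

# Hypothetical directory listing, including one archived log file.
candidates = [
    "audit.log",
    "warn.log",
    "wso2carbon.log",
    "wso2carbon.log.2018-03-13",
    "http_access.log",
]
selected = matching_logs(candidates)
print(selected)
```

Because the dots in the regex are not escaped, each one matches any single character, so the expression is slightly more permissive than the three literal file names suggest.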

...

Note that the PII is not removed from the original log files. It is the responsibility of the organization to remove the original log files that contain the user's PII.

Removing Human Task and BPEL process instances

If you are using Human Tasks and BPEL workflows in your BPS profile, you can remove a user's personally identifiable information (PII) from the BPS instance by removing all process instances and task instances (associated with message exchanges) from the server.

WSO2 EI is shipped with a set of SQL scripts (stored in the bpel and humantask folders in the <EI_HOME>/wso2/business-process/repository/resources/cleanup-scripts directory) that you can use for removing process instances and task instances from the BPS profile. There are two ways of doing this:

...

Stream Name                  | Attribute List
org.wso2.gdpr.students       | username, email, dateOfBirth
org.wso2.gdpr.students.marks | username, marks

These PII references can be removed from the Analytics database by using the Forget-Me Tool. Follow the steps given below.

  1. Add the relevant drivers for your Analytics-specific databases to the <EI_HOME>/wso2/tools/forget-me/lib directory. For example, if you have changed your Analytics databases from the default H2 instances to MySQL, copy the MySQL driver to this directory.
  2. Create a folder named 'streams' in the <EI_HOME>/wso2/tools/forget-me/conf/ directory. 
  3. Create a new file named streams.json with the content shown below, and store it in the /streams directory that you created in the previous step. This file holds the details of the streams and the attributes with PII that we need to remove from the database.

    Code Block
    {
        "streams": [
            {
                "streamName": "org.wso2.gdpr.students",
                "attributes": ["username", "email", "dateOfBirth"],
                "id": "username"
            },
            {
                "streamName": "org.wso2.gdpr.students.marks",
                "attributes": ["username"],
                "id": "username"
            }
        ]
    }

    The above configuration includes the following:

    • streamName: The name of the stream.
    • attributes: The list of attributes that contain PII.
    • id: The ID attribute, which holds the value that needs to be anonymized (replaced with a pseudonym).
  4. Update the config.json file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/ directory) as shown below.

    Code Block
    {
        "processors": [
            "analytics-streams"
        ],
        "directories": [
            {
                "dir": "analytics-streams",
                "type": "analytics-streams",
                "processor": "analytics-streams"
            }
        ]
    }
  5. Open a command prompt and navigate to the <EI_HOME>/bin directory.

  6. Run the tool using the following command:

    • On Linux:

      Code Block
      ./forgetme.sh -U <USERNAME> -carbon <EI_ANALYTICS_HOME>
    • On Windows:

      Code Block
      forgetme.bat -U <USERNAME> -carbon <EI_ANALYTICS_HOME>
    Note

    For the list of commands you can run using the Forget-Me tool, see this link.
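
If you maintain many streams, the streams.json file can be generated instead of written by hand. The Python sketch below builds the same structure as the sample file and writes it into a streams folder. The function names are hypothetical, and a temporary directory stands in for <EI_HOME>/wso2/tools/forget-me/conf so that the example does not touch a real installation.

```python
import json
import tempfile
from pathlib import Path


def build_streams_config(pii_by_stream: dict, id_attribute: str = "username") -> dict:
    """Build the streams.json structure from a stream-to-attributes mapping."""
    return {
        "streams": [
            {"streamName": name, "attributes": list(attrs), "id": id_attribute}
            for name, attrs in pii_by_stream.items()
        ]
    }


def write_streams_file(conf_dir: Path, config: dict) -> Path:
    """Create the streams folder under conf_dir and write streams.json into it."""
    streams_dir = conf_dir / "streams"
    streams_dir.mkdir(parents=True, exist_ok=True)
    target = streams_dir / "streams.json"
    target.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return target


config = build_streams_config({
    "org.wso2.gdpr.students": ["username", "email", "dateOfBirth"],
    "org.wso2.gdpr.students.marks": ["username"],
})

# Stand-in for the tool's conf directory (an assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp:
    written = write_streams_file(Path(tmp), config)
    loaded = json.loads(written.read_text(encoding="utf-8"))
    relative = written.relative_to(tmp).as_posix()

print(relative)
print(json.dumps(loaded, indent=4))
```

To use the sketch against a real installation, pass the actual conf directory to write_streams_file instead of the temporary one.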

Anonymizing PII of business process analytics

Shown below are the data streams used by the BPS profile of WSO2 EI along with sample attributes with PII references.

Stream Name                             | Attribute List
BPMN_Process_Instance_Data_Publish      | startUserId
BPMN_Task_Instance_Data_Publish         | assignee
BPMN_Service_Task_Instance_Data_Publish | assignee

These PII references can be removed from the Analytics database by using the Forget-Me Tool. Follow the steps given below.

  1. Add the relevant drivers for your Analytics-specific databases to the <EI_HOME>/wso2/tools/forget-me/lib directory. For example, if you have changed your Analytics databases from the default H2 instances to MySQL, copy the MySQL driver to this directory.
  2. Create a folder named 'streams' in the <EI_HOME>/wso2/tools/forget-me/conf/ directory. 
  3. Create a new file named streams.json with the content shown below, and store it in the /streams directory that you created in the previous step. This file holds the details of the streams and the attributes with PII that we need to remove from the database.

    Code Block
    {
        "streams": [
            {
                "streamName": "BPMN_Process_Instance_Data_Publish",
                "attributes": ["startUserId"],
                "id": "username"
            },
            {
                "streamName": "BPMN_Task_Instance_Data_Publish",
                "attributes": ["assignee"],
                "id": "username"
            },
            {
                "streamName": "BPMN_Service_Task_Instance_Data_Publish",
                "attributes": ["assignee"],
                "id": "username"
            }
        ]
    }

    The above configuration includes the following:

    • streamName: The name of the stream.
    • attributes: The list of attributes that contain PII.
    • id: The ID attribute, which holds the value that needs to be anonymized (replaced with a pseudonym).
  4. Update the config.json file (stored in the <EI_HOME>/wso2/tools/forget-me/conf/ directory) as shown below.

    Code Block
    {
        "processors": [
            "analytics-streams"
        ],
        "directories": [
            {
                "dir": "streams",
                "type": "analytics-streams",
                "processor": "analytics-streams"
            }
        ]
    }
  5. Open a command prompt and navigate to the <EI_HOME>/bin directory.

  6. Run the tool using the following command:

    Code Block
    ./forgetme.sh -U <USERNAME> -carbon <EI_ANALYTICS_HOME>
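
In every config.json sample on this page, each entry under "directories" (and "extensions") names a processor that is also listed under "processors". The Python sketch below checks a configuration for that consistency before you run the tool. Whether the tool itself enforces this rule is an assumption inferred from the samples, and the function name undeclared_processors is hypothetical.

```python
import json

# Same content as the analytics config.json sample above.
CONFIG_TEXT = """
{
    "processors": ["analytics-streams"],
    "directories": [
        {
            "dir": "streams",
            "type": "analytics-streams",
            "processor": "analytics-streams"
        }
    ]
}
"""


def undeclared_processors(config: dict) -> list:
    """List processors used by an entry but missing from "processors"."""
    declared = set(config.get("processors", []))
    # "extensions" is optional; it only appears in the BPMN sample.
    entries = config.get("directories", []) + config.get("extensions", [])
    return sorted({e["processor"] for e in entries if e["processor"] not in declared})


config = json.loads(CONFIG_TEXT)
missing = undeclared_processors(config)
print(missing)
```

An empty list means every referenced processor is declared. Any name that is returned points to an entry whose processor is not listed under "processors".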