If you administer a Redash instance you may receive requests to provide or remove user data governed by GDPR. In this post we discuss the process of handling such requests and introduce a script to help you find references to user data across your installation.
Types of GDPR Requests
In the context of Redash, Personally Identifiable Information (PII) is usually some form of indicative data linked to an email address. GDPR requests come in two types:
- Requests to see PII retained about a user
- Requests to remove PII retained about a user
Fulfilling the first type of request is straightforward since Redash can easily generate CSV files. Just write queries that accept a User parameter and tag them GDPR so they’re easy to find. When you receive a request for data, run these queries with the relevant User value, and download the resulting CSV to share with the requestor.
The specifics of these queries will depend on your data structure and the kind of information you retain about your users, so we won’t cover them here. But we will cover the second type of common GDPR request: removing PII from your Redash instance.
There are two places in Redash where you may find data governed by GDPR:
- Cached query results
- Metadata like query text, tags, names, or descriptions
Removing PII From Cached Query Results
This happens in two steps.
- Erase the PII from your database
- Update cached query results in Redash containing PII
As before, the first step will depend on the schema of your database. Some organizations write stored procedures to remove PII. Others use visual admin interfaces. The specifics are beyond the scope of this article.
However, it’s easy to clear cached results from Redash once you remove the PII from your database. Just execute the queries again. Doing so forces Redash to cache the latest result which doesn’t include PII for the given user.
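If many queries are affected, re-running them by hand gets slow. The sketch below shows one way this could be scripted. It is not part of the gist: the `/api/queries/<id>/refresh` path and the function names are assumptions, so check the API documentation for your Redash version before relying on it.

```python
def refresh_url(host, query_id):
    """Build the refresh URL for one query (the endpoint path is an assumption)."""
    return f"{host.rstrip('/')}/api/queries/{query_id}/refresh"


def refresh_queries(session, host, query_ids):
    """POST to the refresh endpoint for each query and return the URLs called.

    `session` is anything with a requests-style `post(url)` method, for
    example a `requests.Session` whose headers carry an admin API key.
    """
    called = []
    for query_id in query_ids:
        url = refresh_url(host, query_id)
        response = session.post(url)
        response.raise_for_status()  # stop at the first failed refresh
        called.append(url)
    return called
```

With the Requests library you could pass a `requests.Session()` after setting `session.headers["Authorization"] = f"Key {api_key}"`. Redash executes queries as background jobs, so the cached result may not change the moment the call returns.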
Removing PII From Query Metadata
Cleansing query metadata of PII is even easier. Just update the offending metadata (titles, descriptions, tags, or text widgets) and save your changes. Redash only keeps the latest version of this information so any PII that was previously preserved in this metadata will become irretrievable.
For larger Redash installations you may wish to automate this step using Redash’s API. Check out redash.py for more information.
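As a rough illustration of what such automation might look like, the sketch below works out which metadata fields of a query would change once a search term is replaced. The field names (`name`, `description`, `query`, `tags`), the `scrub_metadata` helper, and the `[removed]` placeholder are all assumptions made for this example. It only builds the changes; sending them to Redash is left to whatever API wrapper you use.

```python
import re

# Assumed names of the free-text fields on a query object.
TEXT_FIELDS = ("name", "description", "query")


def scrub_metadata(query, term, placeholder="[removed]"):
    """Return only the fields of `query` that change once `term` is replaced."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)

    def clean(text):
        # A function replacement keeps the placeholder from being
        # interpreted as a regex template.
        return pattern.sub(lambda _match: placeholder, text)

    changes = {}
    for field in TEXT_FIELDS:
        original = query.get(field) or ""
        cleaned = clean(original)
        if cleaned != original:
            changes[field] = cleaned

    tags = query.get("tags") or []
    cleaned_tags = [clean(tag) for tag in tags]
    if cleaned_tags != tags:
        changes["tags"] = cleaned_tags
    return changes
```

Review the changes before saving them. Replacing a term inside query text can alter what the query does, so a placeholder is not always the right fix there.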
Unless you have a good memory it can be tedious going through each query in Redash manually looking for references to a given user. That’s why we wrote a script to find them for you.
This gist has the two files you’ll need. You must install the Requests and Click libraries to run them:
- redash.py is a general purpose wrapper around our API.
- gdpr_scrub.py wraps the API with an object that searches your instance. It depends on redash.py so they should be saved beside one another in your file system.
You will need your Redash host address, an API key for one of your organization’s admin group members, and the search term you want to isolate (usually an email address). It’s important to run the script with an administrator’s API key since non-admins might not see every query.
Download the files from the gist to a computer with Python 3, Click, and Requests installed. Then run the following command:
$ python3 gdpr_scrub.py <host> <search term> --api-key <admin api key>
You will see a progress bar advance across your terminal window while the script loops through the queries and dashboards of your Redash instance. It searches:
- Dashboard text widgets
- Query text
- Cached query results
...for the search term you specify. Then it prints the URLs for any query or dashboard that matched your search.
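To make the search concrete, here is a minimal sketch of the matching step. It is not the code from gdpr_scrub.py. It assumes queries and dashboards have already been fetched as dictionaries, and the key names (`query`, `cached_result`, `widgets`, `text`, `slug`) and the URL patterns are assumptions that may differ in your Redash version.

```python
import json


def contains(term, value):
    """Case-insensitive substring check that tolerates None and non-strings."""
    if value is None:
        return False
    if not isinstance(value, str):
        # Serialize cached results (lists, dicts) so they can be searched as text.
        value = json.dumps(value, default=str, ensure_ascii=False)
    return term.lower() in value.lower()


def find_matches(host, term, queries, dashboards):
    """Return URLs of queries and dashboards that mention `term`."""
    host = host.rstrip("/")
    urls = []
    for query in queries:
        in_text = contains(term, query.get("query"))
        in_cache = contains(term, query.get("cached_result"))  # hypothetical key
        if in_text or in_cache:
            urls.append(f"{host}/queries/{query['id']}")
    for dashboard in dashboards:
        widgets = dashboard.get("widgets") or []
        if any(contains(term, widget.get("text")) for widget in widgets):
            urls.append(f"{host}/dashboard/{dashboard['slug']}")
    return urls
```

Matching is case-insensitive here because email addresses are often stored with inconsistent capitalization.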
⚠️ Warning: This script does not delete PII ⚠️
It only finds objects that match a search term. Use it as a starting point when removing PII from your system. We recommend running gdpr_scrub.py twice for each GDPR removal request: once before removing the PII so you know where to find it, and once afterward to confirm you got it all. To actually delete PII from your system, see the steps above.