If you administer a Redash instance you may receive requests to provide or remove user data governed by GDPR. In this post we discuss how to search for references to user data with the Redash API.
Types of GDPR Requests
In the context of Redash, Personally Identifiable Information (PII) is usually some form of identifying data linked to an email address. GDPR requests come in two types:
- Requests to see PII retained about a user
- Requests to remove PII retained about a user
Fulfilling the first type of request is straightforward since Redash can easily generate CSV files. Just write queries that accept a User parameter and tag them GDPR so they’re easy to find. When you receive a request for data, run these queries with the relevant User value, and download the resulting CSV to share with the requestor.
The specifics of these queries will depend on your data structure and the kind of information you retain about your users, so we won’t cover them here. But we will cover the second common type of GDPR request: removing PII from your Redash instance.
There are two places in Redash where you may find data governed by GDPR:
- Cached query results
- Metadata like query text, tags, names, or descriptions
Removing PII From Cached Query Results
This happens in two steps.
- Erase the PII from your database
- Update cached query results in Redash containing PII
As before, the first step will depend on the schema of your database. Some organizations write stored procedures to remove PII. Others use visual admin interfaces. The specifics are beyond the scope of this article.
However, it’s easy to clear cached results from Redash once you remove the PII from your database. Just execute the queries again. Doing so forces Redash to overwrite its cache with the latest result, which doesn’t include PII for the given user.
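If many queries are affected, re-running them by hand gets tedious. The sketch below builds the authenticated requests that would trigger a refresh for a list of query IDs. It assumes your instance exposes POST /api/queries/<id>/refresh and accepts an `Authorization: Key ...` header. `build_refresh_request` and `refresh_queries` are illustrative names, not part of Redash or redash-toolbelt. Queries that take parameters may need a different call.

```python
import json
import urllib.request
from typing import Dict, Iterable


def build_refresh_request(host: str, api_key: str, query_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST that asks Redash to re-run one query.

    Assumption: the instance exposes POST /api/queries/<id>/refresh.
    """
    url = f"{host.rstrip('/')}/api/queries/{query_id}/refresh"
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the query ID is carried in the URL
        headers={"Authorization": f"Key {api_key}"},
        method="POST",
    )


def refresh_queries(host: str, api_key: str, query_ids: Iterable[int]) -> Dict[int, dict]:
    """Send one refresh request per query and collect the JSON responses."""
    responses = {}
    for query_id in query_ids:
        request = build_refresh_request(host, api_key, query_id)
        with urllib.request.urlopen(request) as response:
            responses[query_id] = json.load(response)
    return responses
```

A refresh only queues the query for execution, so the old cached result may still be served until the new run finishes. Check the affected queries again before treating the removal as complete.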
Removing PII From Query Metadata
Cleansing query metadata of PII is even easier. Just update the offending metadata (titles, descriptions, tags, or text widgets) and save your changes. Redash only keeps the latest version of this information, so any PII that was previously preserved in this metadata will become irretrievable.
For larger Redash installations you may wish to automate this step using Redash’s API. Check out redash-toolbelt on GitHub for more information.
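As a rough illustration of what such automation might look like, the sketch below works out which metadata fields of a query contain a search term and builds the request that would save the cleaned values. The field names (name, description, query, tags) and the POST /api/queries/<id> endpoint are assumptions about the Redash API, and `redact_query_metadata` and `build_update_request` are hypothetical helpers rather than redash-toolbelt functions.

```python
import json
import urllib.request

# Text fields this sketch inspects; assumed to match keys on a Redash query object.
TEXT_FIELDS = ("name", "description", "query")


def redact_query_metadata(query: dict, term: str, replacement: str = "[removed]") -> dict:
    """Return only the fields that change once `term` is replaced (case-sensitive)."""
    changes = {}
    for field in TEXT_FIELDS:
        value = query.get(field) or ""
        if term in value:
            changes[field] = value.replace(term, replacement)
    # Tags that mention the term are dropped rather than rewritten.
    tags = query.get("tags") or []
    cleaned_tags = [tag for tag in tags if term not in tag]
    if cleaned_tags != tags:
        changes["tags"] = cleaned_tags
    return changes


def build_update_request(host: str, api_key: str, query_id: int, changes: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that saves the changed fields.

    Assumption: the instance accepts POST /api/queries/<id> with a JSON body.
    """
    url = f"{host.rstrip('/')}/api/queries/{query_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(changes).encode("utf-8"),
        headers={"Authorization": f"Key {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Replacing text inside a query's SQL can change what the query returns, so it is worth reviewing the `changes` dictionary by hand before sending anything.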
Unless you have a good memory, it would be tedious to search your Redash instance by hand for references to a user. That’s why we wrote a CLI command in our redash-toolbelt library to help you find those references.
To get started, you can either clone the redash-toolbelt repo to your system and run poetry install, or you can pip install redash-toolbelt from any Python 3 environment. The Requests and Click libraries are also required.
You will need your Redash host address, an API key for one of your organization’s admin group members, and the search term you want to isolate (usually an email address). It’s important to run the script with an administrator’s API key since non-admins might not see every query.
$ gdpr-scrub <host> <search term> --api-key <admin api key>
Or you can run:
python3 -m redash_toolbelt.examples.gdpr_scrub <host> <search term> --api-key <admin api key>
You will see a progress bar advance across your terminal window while the script loops through the queries and dashboards of your Redash instance. It searches:
- Dashboard text widgets
- Query text
- Cached query results
...for the search term you specify. Then it prints the URLs for any query or dashboard that matched your search.
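To make the matching step concrete, here is a minimal sketch of that kind of search. It is not the gdpr_scrub source. It operates on plain dictionaries whose shape (id, query, rows, slug, widgets, text) is assumed for illustration, and the URL patterns it prints may differ between Redash versions.

```python
from typing import Iterable, List


def contains_term(value, term: str) -> bool:
    """Case-insensitive substring check that tolerates non-string values."""
    if value is None:
        return False
    return term.lower() in str(value).lower()


def find_references(host: str, term: str, queries: Iterable[dict], dashboards: Iterable[dict]) -> List[str]:
    """Return URLs of queries and dashboards that mention `term`."""
    base = host.rstrip("/")
    urls = []
    for query in queries:
        # Check the query text first, then every cell of the cached result rows.
        in_text = contains_term(query.get("query"), term)
        in_rows = any(
            contains_term(cell, term)
            for row in query.get("rows", [])
            for cell in row.values()
        )
        if in_text or in_rows:
            urls.append(f"{base}/queries/{query['id']}")
    for dashboard in dashboards:
        # Only text widgets carry free-form text; other widgets have no "text".
        if any(contains_term(w.get("text"), term) for w in dashboard.get("widgets", [])):
            urls.append(f"{base}/dashboard/{dashboard['slug']}")
    return urls
```

Matching is a plain substring test here, so a short search term can produce false positives. A full email address is usually specific enough.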
⚠️ Warning: This script does not delete PII ⚠️
It only finds objects that match a search term. Use it as a starting point when removing PII from your system. We recommend running gdpr-scrub twice for each GDPR removal request: once before removing the PII so you know where to find it, and once afterward to confirm you got it all. To actually delete PII from your system, see the steps above.