Oct 17, 2023 • 4 min read

Revamping Privacy Mode: A Better Way to Obfuscate Sensitive Data

Author picture of Spencer Amarantides
Spencer Amarantides
Software Engineer @ Highlight

Default privacy mode cover

Do you enjoy seeing the code responsible for causing an error on your frontend stack traces? Do As a b2b company with several customers who are quite sensitive about data protection and information security, we built strict privacy mode several months ago. Strict privacy mode will obfuscate all text in the DOM, not differentiating between sensitive and innocuous data. This is irreversible, as the text is removed on client side, and Highlight will never receive this data. For example, Highlight will receive <h1>Hello World</h1> as <h1>1f0eqo jw02d</h1>.

Previously, this mode was all or nothing, and there was little support to intelligently hide your personal information. Today we're excited to announce "default" privacy mode. As the name suggests, this mode will be enabled by default on SDK versions 8.0.0 and later. Default mode offers a smart level of privacy by obfuscating text that matches names and patterns associated with common personal identifiable information.

Default privacy mode example

Default privacy mode relies on regex expressions to identify this data, such as phone numbers, social security numbers, email addresses, and more. This works well for static text, but is not helpful for dynamic text, like inputs. If a user is typing in a social security number, it may not get recognized until the first 8 digits are exposed. To solve this, we search the DOM for inputs with common names, ids, and autocomplete values, to obfuscate the input from the start.

Default privacy mode is a best effort algorithm, but is imperfect in a couple ways. First, it may over obfuscate text that matches one of the regex expressions. In the example above, the UserId matches a long number regex expression, designed to catch phone numbers. We expect this to over-obfuscate certain texts, since there is no context being used to determine the "identifiable" aspect of data. Determining context is difficult to do in real time, but an improvement area we are looking into. Second, text broken up by different elements may not be obfuscated, despite the overall text being recognized by a regex expression. For example, <div>spencer@<b>highlight</b>.io</div> matches the email regex, but is broken up by a bold element. While this algorithm is not flawless, we believe this solution is a great option for companies that want to minimize exposing data without going the nuclear option. It represents a significant step forward in helping you safeguard your customer data, and we are looking forward to improving and building additional privacy options. For any questions or comments, don't hesitate to reach out to us on discord.

Comments (0)
Name
Email
Your Message

Other articles you may like

Optimizing Clickhouse: The Tactics That Worked for Us
Day 3: Metrics & APM
Migrating from OpenSearch to Clickhouse
Try Highlight Today

Get the visibility you need