Regular expressions

Learn how to harness the power of regular expressions to safeguard sensitive information


Regular expressions, or "regex", let you configure Alerts with precise pattern matching. In this article, we'll cover the fundamentals of regular expressions so you can write regex patterns that effectively monitor for content containing sensitive information.

What are Regular Expressions?

Regular expressions are sequences of characters that define search patterns. They let you describe the text you want to find within a larger body of text. Patterns range from fixed formats, such as credit card numbers or Social Security numbers, to more variable structures, such as email addresses or protected health information (PHI). By crafting regex patterns, Protect administrators can configure Alerts to recognize sensitive data and act on it.

Basic Syntax:

Regular expressions consist of literal characters and metacharacters, which have special meanings. Here is a brief overview:

  1. Literal Characters: Most characters simply match themselves (e.g., "ABC" matches the text "ABC" wherever it appears, including inside a longer string such as "xABCx").

  2. Metacharacters: Metacharacters possess special meanings within regular expressions, allowing for flexible pattern matching. Here are some common metacharacters:

    • .: Matches any single character except newline.

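    • ^: Matches the start of a line (some regex engines match only the start of the whole text unless multiline mode is enabled).

    • $: Matches the end of a line (with the same multiline caveat).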

    • *: Matches zero or more occurrences of the preceding character or group.

    • +: Matches one or more occurrences of the preceding character or group.

    • ?: Matches zero or one occurrence of the preceding character or group.

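    • []: Matches any single character within the brackets; a hyphen between two characters defines a range (e.g., [0-9] matches any digit).

    • (): Groups characters together so a quantifier applies to the whole group (e.g., (ab)+ matches "ab" and "abab").

    • \: Escapes a metacharacter so it matches literally (e.g., \. matches a period).

    • \d: Matches any digit.

    • \s: Matches any whitespace character, such as a space or tab.

    • \b: Matches a word boundary: a position between a word character (a letter, digit, or underscore) and a non-word character, or at the start or end of the text next to a word character.

    • {n}: Matches exactly n occurrences of the preceding character or group (e.g., \d{3} matches three digits).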
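If you want to try these metacharacters before adding a pattern to an Alert, the following minimal sketch uses Python's built-in `re` module. Python is only a convenient stand-in here: its basic syntax agrees with the list above, but the engine behind Alerts may differ in details. The helper name `first_match` and the sample strings are hypothetical.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first part of text matched by pattern, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


# (pattern, sample text) pairs illustrating each metacharacter.
SAMPLES = [
    ("ABC", "xxABCxx"),      # literal characters match themselves
    ("c.t", "cut"),          # . matches any single character except newline
    ("^Hi", "Hi there"),     # ^ anchors the match at the start
    ("there$", "Hi there"),  # $ anchors the match at the end
    ("ab*c", "ac"),          # * allows zero occurrences of b
    ("ab+c", "abbbc"),       # + requires at least one b
    ("colou?r", "color"),    # ? makes the u optional
    ("gr[ae]y", "grey"),     # [] matches one character from the set
    ("(ab)+", "ababab"),     # () groups "ab" so + repeats the whole group
]

for pattern, text in SAMPLES:
    print(f"{pattern:10} in {text!r:12} -> {first_match(pattern, text)!r}")
```

Note that `re.search` finds a match anywhere in the text, which mirrors how a literal like "ABC" is found inside "xxABCxx".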

Examples: The following patterns show how literal characters and metacharacters combine:

  • [0-9]+: Matches one or more digits.

  • [A-Za-z]+: Matches one or more alphabetic characters.

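  • ^Start: Matches lines that begin with "Start". Because nothing in the pattern ends the word, it also matches lines beginning with "Starting"; use ^Start\b to require the whole word.

  • end$: Matches lines that end with "end", including lines ending in "weekend"; use \bend$ to require the whole word.

  • ^Dear\s[A-Za-z]+: Matches lines that start with "Dear", followed by a whitespace character and then a name made of one or more letters.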
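The sketch below runs the example patterns against a few sample lines, again assuming Python's `re` module as a stand-in for the Alerts engine. It searches each line separately so that ^ and $ refer to that line's start and end, and it shows how \b tightens the ^Start and end$ examples. `SAMPLE` and `matching_lines` are hypothetical names.

```python
import re
from typing import List

SAMPLE = """Dear Alice,
Start the review today.
Starting next week, we will meet daily.
Invoice 2024 is attached.
See you at the end
Enjoy the weekend"""


def matching_lines(pattern: str, text: str) -> List[str]:
    """Return each line of text that contains a match for pattern.

    Lines are searched one at a time, so ^ and $ refer to the start and
    end of the current line.
    """
    return [line for line in text.splitlines() if re.search(pattern, line)]


for pattern in [r"[0-9]+", r"^Start", r"^Start\b", r"end$", r"\bend$", r"^Dear\s[A-Za-z]+"]:
    print(pattern, "->", matching_lines(pattern, SAMPLE))
```

Comparing ^Start with ^Start\b (and end$ with \bend$) is a quick way to see why word boundaries matter when a keyword can appear inside longer words.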

Best Practices:

  • Clearly define the types of sensitive data you aim to protect (e.g., credit card numbers, personally identifiable information).

  • Anticipate variations or formats in sensitive data (e.g., different credit card number formats) and incorporate them into your regex patterns using character classes and quantifiers.

  • Test your regex patterns against sample data to ensure they accurately identify sensitive information while minimizing false positives.

  • Start simple with basic patterns and gradually increase the complexity.

  • Engage with online communities and forums dedicated to regex, where users share insights, tips, and troubleshooting advice.
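Testing against sample data can be as simple as keeping two lists, text that should match and text that should not, and checking a pattern against both. This minimal sketch assumes Python's `re` module; `evaluate` and the sample strings are hypothetical, and the numbers are made up. It starts with a simple Social Security number pattern and then widens it to accept spaces as well as hyphens, following the "start simple" and "anticipate variations" practices above.

```python
import re
from typing import List, Tuple


def evaluate(pattern: str, should_match: List[str], should_not_match: List[str]) -> Tuple[List[str], List[str]]:
    """Return (missed, false_positives) for pattern against labeled sample text."""
    compiled = re.compile(pattern)
    missed = [s for s in should_match if not compiled.search(s)]
    false_positives = [s for s in should_not_match if compiled.search(s)]
    return missed, false_positives


# Made-up sample data: text that should and should not trigger an Alert.
SHOULD_MATCH = ["SSN: 123-45-6789", "SSN 123 45 6789"]
SHOULD_NOT_MATCH = ["Call 123-456-7890", "Ticket 123456789"]

# Start simple: hyphen-separated numbers only.
SIMPLE = r"\b\d{3}-\d{2}-\d{4}\b"
# Then widen: allow a space or a hyphen as each separator.
WIDER = r"\b\d{3}[ -]\d{2}[ -]\d{4}\b"

for name, pattern in [("simple", SIMPLE), ("wider", WIDER)]:
    missed, false_positives = evaluate(pattern, SHOULD_MATCH, SHOULD_NOT_MATCH)
    print(f"{name}: missed={missed} false_positives={false_positives}")
```

Each time you widen a pattern, rerun it against the "should not match" list as well, since broader patterns tend to pick up unrelated numbers.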

Common Use Cases with Examples

  1. Identifying Personally Identifiable Information (PII):

    • Regular expressions can detect patterns corresponding to PII, such as Social Security numbers, passport numbers, and driver's license numbers. Examples:

      US Social Security Number: \b\d{3}-\d{2}-\d{4}\b
      US Passport: \b[A-Z][0-9]{8}\b
      US Driver's License (Florida): \b[A-Z][0-9]{3}-[0-9]{3}-[0-9]{2}-[0-9]{3}-[0-9]\b

  2. Recognizing Financial Data:

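    • Regular expressions can identify financial data, including credit card numbers, bank account numbers, routing numbers, and financial transaction identifiers. Here are some common regex patterns for detecting 16-digit Visa and Mastercard numbers and 15-digit American Express numbers. Each accepts the digits written together or in groups separated by spaces or hyphens; the named group, such as (?<g1>[ -]?), captures the first separator, and the backreference, such as \k<g1>, requires the same separator throughout: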

      Visa: \b4\d{3}(?<g1>[ -]?)\d{4}\k<g1>\d{4}\k<g1>\d{4}\b
      AmEx: \b3[47]\d{2}(?<g1>[ -]?)\d{6}\k<g1>\d{5}\b
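      Mastercard: \b5[1-5]\d{2}(?<g1>[ -]?)\d{4}\k<g1>\d{4}\k<g1>\d{4}\b
      Note: the Mastercard pattern covers numbers beginning 51 through 55; Mastercard also issues numbers beginning 2221 through 2720, which it does not match.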

  3. Ensuring Regulatory Compliance:

    • Regular expressions can help enforce compliance with data protection regulations such as GDPR, HIPAA, PCI DSS, and CCPA by detecting the types of data those regulations protect. Example:

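      National Provider Identifier (HIPAA): \b\d{10}\b
      This matches any 10-digit number, including many phone numbers, so expect false positives. Because valid NPIs begin with 1 or 2, \b[12]\d{9}\b narrows the matches.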

  4. Protecting Intellectual Property:

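    • Regular expressions can be crafted to detect proprietary information, such as product codes, trade secrets, source code snippets, or confidential documents, provided that information follows a recognizable pattern or carries a consistent marker such as a classification label.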
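To see the PII patterns listed above in action, the sketch below applies them to a made-up record using Python's `re` module, which serves only as an illustrative engine. `find_pii`, `PII_PATTERNS`, and the sample text are hypothetical and not part of Alerts.

```python
import re
from typing import Dict, List

# The PII patterns listed above, written as Python raw strings
# ([A-Z] and [A-Z]{1} are equivalent).
PII_PATTERNS: Dict[str, str] = {
    "US Social Security Number": r"\b\d{3}-\d{2}-\d{4}\b",
    "US Passport": r"\b[A-Z][0-9]{8}\b",
    "US Driver's License (Florida)": r"\b[A-Z][0-9]{3}-[0-9]{3}-[0-9]{2}-[0-9]{3}-[0-9]\b",
}


def find_pii(text: str) -> Dict[str, List[str]]:
    """Map each PII label to the matches found in text, skipping labels with no matches."""
    results: Dict[str, List[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = re.findall(pattern, text)
        if matches:
            results[label] = matches
    return results


record = "SSN 123-45-6789, passport A12345678, FL license W123-456-78-901-0."
print(find_pii(record))
```

The \b anchors keep the SSN pattern from matching the digit groups inside the driver's license number.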
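The credit card patterns under "Recognizing Financial Data" can also be tried in Python, with one adjustment: Python's `re` module writes named groups as (?P<name>...) and backreferences as (?P=name), so the Visa pattern below uses those spellings in place of (?<g1>...) and \k<g1>. As an extra step to reduce false positives, the sketch discards matches that fail the Luhn checksum, the check-digit scheme used by payment card numbers. The Luhn filter is our own addition for illustration, not a documented Alerts feature; `find_visa_numbers` and `luhn_valid` are hypothetical names, and 4111 1111 1111 1111 is a widely used test number, not a real card.

```python
import re
from typing import List

# Visa pattern from the article, rewritten with Python's named-group syntax.
# (?P<g1>[ -]?) captures the first separator (or nothing); (?P=g1) requires the same one again.
VISA = re.compile(r"\b4\d{3}(?P<g1>[ -]?)\d{4}(?P=g1)\d{4}(?P=g1)\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_visa_numbers(text: str) -> List[str]:
    """Return Visa-shaped matches that also pass the Luhn check."""
    return [m.group(0) for m in VISA.finditer(text) if luhn_valid(m.group(0))]


print(find_visa_numbers("Card: 4111 1111 1111 1111"))
print(find_visa_numbers("Mixed separators: 4111-1111 1111-1111"))
print(find_visa_numbers("Order 4111111111111112"))
```

The second sample fails the pattern itself because its separators are inconsistent; the third has the right shape but fails the checksum.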
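The National Provider Identifier pattern under "Ensuring Regulatory Compliance" matches any 10-digit number. The following sketch, assuming Python's `re` module, narrows it: it requires a leading 1 or 2 and verifies the final check digit, which the NPI standard defines using the Luhn formula applied to the number prefixed with 80840. `find_npis` and the sample text are hypothetical, and 1234567893 is simply a number that passes the check.

```python
import re
from typing import List

BROAD_NPI = re.compile(r"\b\d{10}\b")      # the pattern from the article
NARROW_NPI = re.compile(r"\b[12]\d{9}\b")  # NPIs begin with 1 or 2


def npi_check_digit_ok(npi: str) -> bool:
    """Validate an NPI's final digit with the Luhn formula, using the 80840 prefix."""
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_npis(text: str) -> List[str]:
    """Return NPI-shaped matches whose check digit is valid."""
    return [m.group(0) for m in NARROW_NPI.finditer(text) if npi_check_digit_ok(m.group(0))]


text = "Provider NPI 1234567893, callback 5551234567"
print("broad:", BROAD_NPI.findall(text))
print("narrow + check digit:", find_npis(text))
```

The broad pattern flags both numbers; the narrowed version keeps only the one that could be a real NPI.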
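For intellectual property, patterns depend entirely on your organization's conventions. As an illustration only, suppose projects are labeled with codes like PRJ-2041 and sensitive documents are marked "Confidential" or "Internal Only". Both conventions are hypothetical; the sketch below (Python `re`, hypothetical `flag_ip` helper) shows how such patterns might look.

```python
import re
from typing import List

# Hypothetical internal formats; replace them with your organization's real conventions.
PRODUCT_CODE = re.compile(r"\bPRJ-\d{4}\b")
CLASSIFICATION_LABEL = re.compile(r"\b(?:confidential|internal only)\b", re.IGNORECASE)


def flag_ip(text: str) -> List[str]:
    """Return product codes followed by classification labels found in text."""
    return PRODUCT_CODE.findall(text) + CLASSIFICATION_LABEL.findall(text)


print(flag_ip("CONFIDENTIAL: specs for PRJ-2041 attached"))
print(flag_ip("Public roadmap, see PRJ-20415"))
```

The trailing \b keeps PRJ-2041 from matching inside a longer code such as PRJ-20415, and re.IGNORECASE lets one pattern cover "CONFIDENTIAL", "Confidential", and "confidential".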

💡 Need further assistance?

Our team is here to help. Send us a chat message or email [email protected] for assistance setting up regex patterns for your alerts.
