Implementation Guide: Data Quality Concepts and Metrics for LLM Models

Introduction

Large Language Models (LLMs) such as GPT-4 have become integral to many applications, but their performance depends heavily on the quality of the data they are trained on and evaluated with. This post discusses key data quality concepts such as data leakage and toxicity, and explains the metrics used to quantify them. We'll provide code snippets and examples to help you understand and implement these metrics in your own projects.

Related pages: Understanding Data Quality Concepts and Metrics for LLM Models; Data Leakage: Data Quality Concepts

Understanding Data Quality

Data Leakage

Data leakage occurs when sensitive information inadvertently gets included in the training or test data, leading to privacy violations and skewed model performance. For example, a training prompt containing a customer's email address or credit card number can be memorized and later reproduced by the model.

Toxicity

Toxicity refers to harmful or offensive content that can be explicit (clearly abusive language) or implicit (subtle derogatory remarks). Ensuring LLMs do not produce or propagate toxic content is crucial for ethical AI development.

Metrics for Data Quality

Quantifying Data Leakage

  1. Percentage of Prompts with Sensitive Information: Calculate the ratio of prompts containing sensitive data to the total number of prompts.
  2. Frequency of Specific Patterns: Count occurrences of sensitive patterns (e.g., email addresses, passwords) in the dataset.
  3. Number of Detected Entities: Use entity recognition to count sensitive entities identified in the data (a short sketch of all three metrics follows this list).
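
The sketch below illustrates how these three quantities can be computed once per-prompt detection results are available. The detection dictionaries and entity counts shown are hypothetical placeholders; the full pipeline that produces real ones follows later in this post.

# Minimal sketch of the leakage metrics, assuming per-prompt detection results.
# The detections and entity counts below are hypothetical placeholders.
detections = [
    {"email": ["example@example.com"]},                  # prompt 1: one email found
    {},                                                  # prompt 2: nothing sensitive
    {"password": ["password"], "ssn": ["987-65-4321"]},  # prompt 3: two pattern types
]
entity_counts = [1, 0, 2]  # sensitive entities per prompt, as reported by an NER step

num_prompts = len(detections)

# 1. Percentage of prompts with sensitive information
percentage_with_leakage = 100 * sum(1 for d in detections if d) / num_prompts

# 2. Frequency of specific patterns
pattern_frequency = {}
for d in detections:
    for label, matches in d.items():
        pattern_frequency[label] = pattern_frequency.get(label, 0) + len(matches)

# 3. Number of detected entities
total_entities_detected = sum(entity_counts)

print(percentage_with_leakage, pattern_frequency, total_entities_detected)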

Quantifying Toxicity

  1. Number of Toxic Responses: Count responses flagged as toxic.
  2. Severity Score: Assign a severity score to each piece of detected toxic content using predefined criteria, then average it across responses (see the sketch below).
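
As a minimal sketch, assuming each response has already been flagged and scored (the solution at the end of this post does so with a simple keyword lexicon), the two metrics reduce to a count and a mean:

# Minimal sketch of the toxicity metrics, using hypothetical per-response annotations.
is_toxic = [True, False, True, False]  # flags from a toxicity detector
severity = [3, 0, 2, 0]                # severity scores on a predefined scale

num_toxic_responses = sum(is_toxic)               # 1. Number of toxic responses
average_severity = sum(severity) / len(severity)  # 2. Average severity score

print(num_toxic_responses, average_severity)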

Implementing Metrics

Pattern Matching for Data Leakage

import re
import pandas as pd

# Sample prompts mixing sensitive, toxic, and benign content
data = {
    "prompt": [
        "My secret code is 12345", "Contact me at example@example.com", "The password is qwerty",
        "You are an idiot", "I hate you", "You are amazing",
        "Please transfer $1000 to account number 123-456-789", "Call me at 555-1234", 
        "Your credit card number is 4111 1111 1111 1111", "Meet me at the park", 
        "Send the file to secret@company.com", "The key is 987654", 
        "My email is user@domain.com", "Your account balance is low", 
        "Reset your password to strongpassword", "You are a fool", 
        "I despise you", "You are wonderful", "My social security number is 987-65-4321", 
        "Dial 123-456-7890 for support", "Here is my secret token: 67890", 
        "Access the system using pass123", "You are stupid", "I dislike you", 
        "You are brilliant", "Please send the report to admin@service.com", 
        "My confidential pin is 2222", "My phone number is 888-555-1212", 
        "You are an excellent person", "I hate your guts", "Your session ID is abc123", 
        "Contact support at help@company.com", "You are dumb", "I don't like you", 
        "You are fantastic", "Here's my credit card info: 1234 5678 8765 4321", 
        "Mail the document to private@domain.com", "The password is letmein", 
        "You are an imbecile", "I abhor you", "You are incredible", 
        "My banking info is secret123", "Call 123-456-7890", "Your pin code is 4567", 
        "Please email me at work@place.com", "You are a jerk", "I detest you", 
        "You are a genius", "My credit card number is 1234 5678 1234 5678"
    ]
}

df = pd.DataFrame(data)

# Define regex patterns for detecting sensitive information
patterns = {
    "email": r"[a-zA-Z0-9+_.-]+@[a-zA-Z0-9.-]+",
    "password": r"\b(password|pass|pwd)\b\s*\w+",
    "credit_card": r"\b(?:\d[ -]*?){13,16}\b",
    "phone": r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b"
}

# Function to detect patterns in text
def detect_patterns(text, patterns):
    detections = {}
    for label, pattern in patterns.items():
        if re.search(pattern, text):
            detections[label] = re.findall(pattern, text)
    return detections

# Apply pattern detection to the dataset
df['leakage_detections'] = df['prompt'].apply(lambda x: detect_patterns(x, patterns))
df.head()

This code defines regex patterns to detect sensitive information such as emails, passwords, credit card numbers, phone numbers, and social security numbers in the prompts. It applies these patterns to each prompt and stores the detections.
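
As a quick spot check, the helper can be called on a single prompt to see the shape of the detection dictionary it returns:

# Spot check on a single prompt
print(detect_patterns("Contact me at example@example.com", patterns))
# Expected output: {'email': ['example@example.com']}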

Entity Recognition for Sensitive Information Detection

import spacy

# Load spaCy's small English model (install it once with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Function to detect entities in text
def detect_entities(text, nlp):
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities

# Apply entity detection to the dataset
df['entity_detections'] = df['prompt'].apply(lambda x: detect_entities(x, nlp))
df.head()

This code uses spaCy to detect named entities in the prompts. Entity recognition can help identify sensitive information, such as names, organizations, and numbers, along with its surrounding context.
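
Because a general-purpose NER model tags many entity types, you will usually restrict the count to labels that matter for your leakage definition. The allow-list below is an illustrative assumption rather than a fixed rule; adjust it to your own notion of sensitive data.

# Sketch: count only entity labels treated as potentially sensitive (assumed allow-list).
SENSITIVE_LABELS = {"PERSON", "ORG", "GPE", "CARDINAL", "MONEY", "DATE"}

def count_sensitive_entities(entities, sensitive_labels=SENSITIVE_LABELS):
    # entities is a list of (text, label) tuples produced by detect_entities
    return sum(1 for _, label in entities if label in sensitive_labels)

df['num_sensitive_entities'] = df['entity_detections'].apply(count_sensitive_entities)
print(df['num_sensitive_entities'].sum())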

Handling False Positives in Entity Recognition

False positives occur when the entity recognition model incorrectly identifies non-sensitive information as sensitive. To handle false positives:

  1. Review and Validate: Manually review a subset of detected entities to validate their accuracy.
  2. Improve Model: Use a more advanced or fine-tuned entity recognition model to reduce false positives.
  3. Add Contextual Rules: Implement additional rules to filter out false positives based on the context of the detected entities (see the sketch below).
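
As an illustration of point 3, a simple post-filter can drop detections that appear in contexts you consider benign. The rules and thresholds below are assumptions for demonstration only; tune them to your own data.

# Sketch: contextual post-filtering of entity detections (illustrative rules only).
def filter_false_positives(text, entities):
    filtered = []
    lowered = text.lower()
    for ent_text, label in entities:
        # Assumed rule 1: very short standalone numbers are rarely sensitive identifiers.
        if label == "CARDINAL" and len(ent_text) <= 2:
            continue
        # Assumed rule 2: numbers mentioned alongside "support" are public contact
        # details rather than leaked personal data.
        if label == "CARDINAL" and "support" in lowered:
            continue
        filtered.append((ent_text, label))
    return filtered

df['entity_detections_filtered'] = df.apply(
    lambda row: filter_false_positives(row['prompt'], row['entity_detections']), axis=1
)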

Exercises

Test Data Set

data = {
    "prompt": [
        # ... (same data as above)
    ]
}

df = pd.DataFrame(data)

Solution Implementation

# Detect patterns and entities
df['leakage_detections'] = df['prompt'].apply(lambda x: detect_patterns(x, patterns))
df['entity_detections'] = df['prompt'].apply(lambda x: detect_entities(x, nlp))

# Calculate metrics
num_prompts = len(df)
num_prompts_with_leakage = df['leakage_detections'].apply(lambda x: len(x) > 0).sum()
percentage_with_leakage = (num_prompts_with_leakage / num_prompts) * 100

pattern_counts = {label: df['leakage_detections'].apply(lambda x: len(x.get(label, []))).sum() for label in patterns.keys()}

df['num_entities'] = df['entity_detections'].apply(lambda x: len(x))
total_entities_detected = df['num_entities'].sum()

# Simple keyword lexicon for flagging toxic prompts
toxic_phrases = ["idiot", "hate", "fool", "despise", "stupid", "dislike", "dumb", "imbecile", "abhor", "jerk", "detest"]
df['is_toxic'] = df['prompt'].apply(lambda x: any(phrase in x.lower() for phrase in toxic_phrases))
num_toxic_responses = df['is_toxic'].sum()

# Predefined severity score per keyword (1 = mild, 3 = severe)
severity_scores = {"idiot": 3, "hate": 3, "fool": 2, "despise": 3, "stupid": 2, "dislike": 1, "dumb": 2, "imbecile": 3, "abhor": 3, "jerk": 1, "detest": 3}
df['severity_score'] = df['prompt'].apply(lambda x: sum(score for phrase, score in severity_scores.items() if phrase in x.lower()))
average_severity_score = df['severity_score'].mean()

# Display metrics
print(f"Percentage of Prompts with Sensitive Information: {percentage_with_leakage}%")
print(f"Total Number of Detected Entities: {total_entities_detected}")
print(f"Frequency of Specific Patterns: {pattern_counts}")
print(f"Number of Toxic Responses: {num_toxic_responses}")
print(f"Average Severity Score: {average_severity_score}")

df[['prompt', 'leakage_detections', 'entity_detections', 'is_toxic', 'severity_score']]

Conclusion

From the analysis:

  • Acceptable Results:
    • A low percentage of prompts with sensitive information.
    • A high number of correctly identified sensitive entities.
    • A low number of toxic responses and low severity scores, suggesting effective toxicity detection and mitigation.
  • Unacceptable Results:
    • A high percentage of prompts with sensitive information, indicating potential data leakage issues.
    • A low number of correctly identified toxic responses, suggesting the detection approach needs improvement.
    • A high number of toxic responses or high severity scores, indicating a need for better toxicity filtering (see the threshold sketch below).
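
If you want to enforce these criteria automatically, the metrics computed above can be compared against thresholds. The threshold values below are placeholders rather than recommendations; pick them according to your own risk tolerance.

# Sketch: turning the acceptance criteria into automated checks (placeholder thresholds).
MAX_LEAKAGE_PCT = 10.0     # acceptable share of prompts with sensitive information
MAX_TOXIC_RESPONSES = 5    # acceptable count of toxic responses
MAX_AVG_SEVERITY = 1.0     # acceptable average severity score

checks = {
    "leakage": percentage_with_leakage <= MAX_LEAKAGE_PCT,
    "toxic_count": num_toxic_responses <= MAX_TOXIC_RESPONSES,
    "avg_severity": average_severity_score <= MAX_AVG_SEVERITY,
}

for name, passed in checks.items():
    print(f"{name}: {'OK' if passed else 'NEEDS ATTENTION'}")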
