Data Governance: Systematically Managing Data as a Corporate Asset
Data governance is the organizational and technical framework for the secure, compliant, and value-adding management of corporate data. This article explains the data governance framework, roles (data owner, steward, custodian), data classification, data catalog, data quality, lineage, and compliance integration (GDPR, ISO 27001).
Data is among the most valuable assets of a modern company and, at the same time, one of its largest sources of liability. Without structured data governance, organizations unknowingly violate GDPR requirements, lose track of sensitive material, and cannot prove after a security incident who had access to which data.
What is Data Governance?
Data Governance Framework - Overview:
Data Governance:
→ Who is allowed to do what with which data?
→ What data exists and where?
→ How long is data retained?
→ Who is responsible?
Data Governance ≠ Data Management
Data Management: technical storage, backup, performance
Data Governance: policies, roles, compliance, quality
---
Why Data Governance Now?
Regulatory Pressure:
→ GDPR Art. 5: Purpose limitation, data minimization, storage limitation
→ GDPR Art. 30: Record of processing activities (RPA)
→ ISO 27001 A.5.12/5.13: Information classification
→ NIS2 Art. 21: Risk-management measures, including access control policies for network and information systems
→ DORA (financial sector): Data Asset Inventory
Data breaches without governance:
→ Employee doesn’t know: “Can I upload this to SharePoint?”
→ Audit: “Show us all data transfers to the U.S.” → impossible
→ Ransomware: “Which data was encrypted?” → unclear
Typical statements without data governance:
"We don’t know exactly what we have"
"Some developer made a copy"
"That’s from the old department; it doesn’t exist anymore"
Roles in the Data Governance Framework
Data Governance Roles:
1. Chief Data Officer (CDO) / Data Governance Officer:
→ Overall responsibility for the data governance program
→ C-level sponsor (alternatively: CIO or CISO)
→ Strategy, budget, escalation authority
→ In SMEs: often the CISO or IT manager assumes this role
2. Data Owner:
→ Department head, responsible for the data assets of their department
→ Decides: classification level, access permissions, retention
→ Responsibilities:
- Grants access rights (not the IT department!)
- Verifies the accuracy and currency of the data
- Decides on deletion
→ Example: Sales Manager = Data Owner for customer data
3. Data Steward:
→ Subject matter expert, handles daily data maintenance
→ Reports to the Data Owner
→ Responsibilities:
- Ensures data quality (completeness, accuracy, timeliness)
- Maintains metadata in the data catalog
- Reports quality issues to the Data Owner
→ Example: CRM specialist on the sales team
4. Data Custodian (Data Guardian/IT):
→ IT department, technical management of data
→ Responsibilities:
- Backup, encryption, access control (technical)
- Implements decisions made by the Data Owner
- No decision-making authority over content
→ Example: DBA who sets permissions in the database
5. Data Consumer (Data User):
→ Employees who use the data
→ Responsibilities: authorized use only, report quality issues
6. Data Privacy Officer (DPO):
→ GDPR compliance, record of processing activities
→ Advises on the classification of personal data
→ Point of contact for data subject rights
---
Responsibility Matrix (RACI):
Activity                 | Owner | Steward | Custodian | DPO
Determine classification | R/A   | C       | -         | C
Grant access             | A     | C       | R         | -
Ensure data quality      | A     | R       | C         | -
Technical encryption     | -     | -       | R/A       | -
GDPR compliance check    | C     | C       | C         | R/A
Deletion decision        | R/A   | C       | R         | C
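A matrix like this is easy to check mechanically. The following minimal sketch encodes the table above as a Python dictionary and tests a common RACI convention, which is an assumption here rather than something the article states: each activity has exactly one accountable role and at least one responsible role. The function names `roles_with` and `validate` are illustrative.

```python
# RACI matrix from the table above; "-" means the role is not involved.
RACI = {
    "Determine classification": {"Owner": "R/A", "Steward": "C", "Custodian": "-", "DPO": "C"},
    "Grant access": {"Owner": "A", "Steward": "C", "Custodian": "R", "DPO": "-"},
    "Ensure data quality": {"Owner": "A", "Steward": "R", "Custodian": "C", "DPO": "-"},
    "Technical encryption": {"Owner": "-", "Steward": "-", "Custodian": "R/A", "DPO": "-"},
    "GDPR compliance check": {"Owner": "C", "Steward": "C", "Custodian": "C", "DPO": "R/A"},
    "Deletion decision": {"Owner": "R/A", "Steward": "C", "Custodian": "R", "DPO": "C"},
}


def roles_with(matrix: dict, activity: str, letter: str) -> list:
    """Return the roles whose RACI code for the activity contains the letter."""
    return [role for role, code in matrix[activity].items() if letter in code.split("/")]


def validate(matrix: dict) -> list:
    """Return a list of problems; an empty list means the matrix is consistent."""
    problems = []
    for activity in matrix:
        if len(roles_with(matrix, activity, "A")) != 1:
            problems.append(f"{activity}: needs exactly one accountable role")
        if not roles_with(matrix, activity, "R"):
            problems.append(f"{activity}: needs at least one responsible role")
    return problems


if __name__ == "__main__":
    print(validate(RACI))
    print(roles_with(RACI, "Grant access", "R"))
```

Running such a check whenever the matrix changes catches activities that end up without a clear decision-maker.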
Data Classification in the Framework
Classification Scheme (4 Levels):
Level 1: PUBLIC
→ Accessible to everyone, no harm if disclosed
→ Examples: Marketing materials, press releases, public price lists
→ Handling: No restrictions
→ Labeling: Not necessary
Level 2: INTERNAL
→ For employees only; limited harm if disclosed
→ Examples: internal guidelines, organizational charts, internal reports
→ Handling: Do not share publicly, basic encryption in transit
→ Labeling: "Internal" in document header/footer
Level 3: CONFIDENTIAL
→ Serious harm if disclosed without authorization
→ Examples: Customer data, financial figures, personnel files, contracts
→ Handling:
- Encryption at rest + in transit
- Need-to-know principle
- No forwarding without explicit permission
- Secure deletion
→ Labeling: "Confidential" visible in every document
Level 4: STRICTLY CONFIDENTIAL
→ Catastrophic damage if disclosed (e.g., M&A data, cryptographic keys, whistleblower reports)
→ Examples: Executive board decisions, acquisition plans, security vulnerabilities
→ Handling:
- HSM or separate encrypted storage
- Individually logged access
- Access granted only on a need-to-know basis
- Physical security when printed
→ Labeling: "Strictly Confidential / For Recipient Only"
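The four levels can be expressed as an ordered type so that systems can compare them and look up the required handling. The sketch below is one possible encoding, not a prescribed one. The control names are hypothetical shorthand for the handling rules listed above, and it assumes that level 4 inherits the encryption and secure-deletion rules of level 3.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    STRICTLY_CONFIDENTIAL = 4


# Required controls per level, paraphrased from the scheme above.
# Assumption: level 4 also requires everything level 3 requires.
_CONFIDENTIAL_CONTROLS = {
    "encrypt_in_transit",
    "encrypt_at_rest",
    "need_to_know",
    "secure_deletion",
}

REQUIRED_CONTROLS = {
    Classification.PUBLIC: set(),
    Classification.INTERNAL: {"encrypt_in_transit"},
    Classification.CONFIDENTIAL: set(_CONFIDENTIAL_CONTROLS),
    Classification.STRICTLY_CONFIDENTIAL: _CONFIDENTIAL_CONTROLS
    | {"hsm_or_separate_storage", "individual_access_logging", "physical_print_security"},
}


def missing_controls(level: Classification, implemented: set) -> list:
    """Return the required controls that a system does not implement yet."""
    return sorted(REQUIRED_CONTROLS[level] - implemented)


def may_publish(level: Classification) -> bool:
    """Only PUBLIC data may be shared without restrictions."""
    return level == Classification.PUBLIC


if __name__ == "__main__":
    print(missing_controls(Classification.CONFIDENTIAL, {"encrypt_in_transit"}))
```

Because `IntEnum` members are ordered, a comparison such as `level >= Classification.CONFIDENTIAL` can gate stricter workflows.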
---
Microsoft Purview Classification (Automation):
Configure Sensitivity Labels (PowerShell):
# Parameter names follow the Security & Compliance PowerShell cmdlets; verify them against your module version
Install-Module -Name ExchangeOnlineManagement
Connect-IPPSSession
New-Label -Name "Confidential" `
  -DisplayName "Confidential" `
  -Tooltip "Confidential business data, need-to-know access only" `
  -EncryptionEnabled $true `
  -EncryptionEncryptOnly $false `
  -EncryptionRightsDefinitions "view@firma.de:VIEW;edit@firma.de:EDIT" `
  -ApplyContentMarkingHeaderEnabled $true `
  -ApplyContentMarkingHeaderText "CONFIDENTIAL - FOR INTERNAL USE ONLY"
Auto-Labeling Policy:
New-AutoSensitivityLabelPolicy `
  -Name "Auto-Classification-PII" `
  -ExchangeLocation All `
  -SharePointLocation All `
  -ApplySensitivityLabel "Confidential"
Auto-Labeling Rules (Sensitive Info Types):
# A rule targets one workload; add a second rule with -Workload SharePoint for sites
New-AutoSensitivityLabelRule `
  -Policy "Auto-Classification-PII" `
  -Name "IBAN Detection" `
  -Workload Exchange `
  -ContentContainsSensitiveInformation @{Name = "International Banking Account Number (IBAN)"; minCount = "1"}
Result: Once the policy is enabled, emails containing IBANs are automatically labeled "Confidential".
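To illustrate what a sensitive-information detector for IBANs has to do, the following sketch finds IBAN candidates in free text and validates them with the ISO 13616 mod-97 check. It is a simplified illustration and does not describe how Microsoft Purview implements its detection. The names `IBAN_CANDIDATE`, `is_valid_iban`, and `find_ibans` are hypothetical, and the candidate pattern only handles IBANs written in upper case.

```python
import re

# Candidate pattern: two letters, two check digits, then 11 to 30 alphanumerics.
# Single spaces between characters are tolerated because IBANs are often
# printed in blocks of four.
IBAN_CANDIDATE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")
IBAN_SHAPE = re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}")


def is_valid_iban(value: str) -> bool:
    """Validate an IBAN with the mod-97 checksum."""
    iban = value.replace(" ", "").upper()
    if not IBAN_SHAPE.fullmatch(iban):
        return False
    # Move country code and check digits to the end, then map letters to
    # numbers (A=10 ... Z=35); int(ch, 36) performs exactly that mapping.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_ibans(text: str) -> list:
    """Return all checksum-valid IBANs found in the text, without spaces."""
    hits = []
    for match in IBAN_CANDIDATE.finditer(text):
        candidate = match.group(0).replace(" ", "")
        if is_valid_iban(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    print(find_ibans("Please transfer to DE89 3704 0044 0532 0130 00 by Friday."))
```

The checksum step matters in practice: a pattern match alone would flag many random alphanumeric strings and produce false positives.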
Data Catalog and Data Lineage
Data Catalog – the "phone book" of all data assets:
What belongs in a data catalog?
For each data asset:
□ Name and unique ID
□ Description (what does this data record contain?)
□ Data Owner (Department + Person)
□ Classification level
□ Storage location (system, path/table)
□ Data format (SQL, CSV, JSON, PDF, etc.)
□ Creation date + Last modified
□ Retention period + Deletion date
□ Personal data? (GDPR-relevant?)
□ Recipients/systems with access
□ Data quality score
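The checklist maps naturally onto a record type. The sketch below models one catalog entry as a Python dataclass and reports which fields are still empty. The field names are illustrative translations of the checklist items, not the schema of any particular catalog tool.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Optional


@dataclass
class CatalogEntry:
    # Only ID and name are mandatory at creation; the rest is filled in over time.
    asset_id: str
    name: str
    description: Optional[str] = None
    data_owner: Optional[str] = None
    classification: Optional[str] = None
    storage_location: Optional[str] = None
    data_format: Optional[str] = None
    created: Optional[date] = None
    last_modified: Optional[date] = None
    retention_period: Optional[str] = None
    deletion_date: Optional[date] = None
    contains_personal_data: Optional[bool] = None
    recipients: list = field(default_factory=list)
    quality_score: Optional[float] = None

    def missing(self) -> list:
        """Return the names of fields that are still empty.

        False and 0.0 count as filled in, since they are valid answers.
        """
        return [
            f.name
            for f in fields(self)
            if getattr(self, f.name) is None
            or getattr(self, f.name) == ""
            or getattr(self, f.name) == []
        ]


if __name__ == "__main__":
    entry = CatalogEntry(asset_id="DS-001", name="Customer master data")
    print(entry.missing())
```

A completeness report across all entries (for example, the share of assets without a data owner) is a useful early metric for the governance program.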
Open-source data catalog tools:
Apache Atlas: Enterprise-grade, Hadoop integration
OpenMetadata: Modern, REST API, extensive connectors
DataHub (LinkedIn): Open source, LinkedIn-proven, active community
Amundsen (Lyft): Good for analytics teams
OpenMetadata Connector for PostgreSQL:
# openmetadata-connector.yaml
source:
  type: postgres
  serviceName: production-db
  serviceConnection:
    config:
      hostPort: postgresql.company.com:5432
      username: openmetadata_user
      password: <vault-secret>
      database: production
  sourceConfig:
    config:
      markDeletedTables: true
      includeTables: true
      includeViews: true
---
Data Lineage:
Why is it important?
"Where does this data come from?" → Compliance verification
"Which systems use this table?" → Impact analysis during changes
"Who transformed the data?" → Audit trail
Lineage representation:
Source system → ETL process → Data warehouse → Report
CRM (Salesforce) → Fivetran → Snowflake.customers → Tableau dashboard
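Lineage is a directed graph, so impact analysis and origin tracing come down to graph traversal. The following minimal sketch uses the example chain above; the functions `reachable` and `invert` are illustrative and not part of any lineage tool's API.

```python
from collections import deque

# Edges point downstream: each system maps to the systems that consume its data.
LINEAGE = {
    "CRM (Salesforce)": ["Fivetran"],
    "Fivetran": ["Snowflake.customers"],
    "Snowflake.customers": ["Tableau dashboard"],
    "Tableau dashboard": [],
}


def reachable(graph: dict, start: str) -> list:
    """Breadth-first traversal: every node reachable from start, in discovery order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def invert(graph: dict) -> dict:
    """Reverse all edges so that traversal runs upstream instead of downstream."""
    inverted = {node: [] for node in graph}
    for node, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(node)
    return inverted


if __name__ == "__main__":
    # Impact analysis: what is affected if the Fivetran sync changes?
    print(reachable(LINEAGE, "Fivetran"))
    # Origin: where does the dashboard data come from?
    print(reachable(invert(LINEAGE), "Tableau dashboard"))
```

The downstream traversal answers "Which systems use this table?", and the traversal on the inverted graph answers "Where does this data come from?".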
Automatic lineage via:
→ OpenLineage API standard (Marquez as server)
→ dbt: automatically generates lineage during transformations
→ Apache Airflow: Lineage plugin for DAG-based pipelines
dbt Lineage (example):
-- models/customers_enriched.sql
-- Lineage: raw.salesforce_contacts → int_customers → customers_enriched
-- dbt derives the lineage edges from the ref() calls below
SELECT
    c.id,
    c.email,
    o.order_count
FROM {{ ref('int_customers') }} c
LEFT JOIN {{ ref('int_orders') }} o ON c.id = o.customer_id
Data Quality and Retention
Data Quality Dimensions (DAMA):
1. Completeness: Are required fields filled in?
SQL Check: SELECT COUNT(*) FROM customers WHERE email IS NULL;
2. Accuracy: Does the data match reality?
Validation: Email format, ZIP code pattern, phone number format
3. Consistency: Inconsistencies between systems?
CRM: Customer "Mustermann, Max" | ERP: Customer "Max Mustermann GmbH"
→ Golden Record process required
4. Timeliness: Is the data up to date?
Outdated supplier addresses, former employees still active in systems
5. Uniqueness: Duplicates?
SELECT email, COUNT(*) FROM customers GROUP BY email HAVING COUNT(*) > 1;
6. Validity (conformity): Does the data comply with defined standards?
ISO country codes, IBAN format, internal nomenclatures
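Several of these dimensions can be measured with plain SQL. The sketch below runs the completeness and uniqueness checks shown above, plus a format check for two-letter country codes (dimension 6), against an in-memory SQLite database. The table layout and sample rows are invented for illustration.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory customers table with deliberately flawed rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (email, country) VALUES (?, ?)",
        [
            ("a@example.com", "DE"),
            ("a@example.com", "DE"),       # duplicate email
            (None, "FR"),                  # missing email
            ("b@example.com", "Germany"),  # not a two-letter country code
        ],
    )
    return conn


def quality_report(conn: sqlite3.Connection) -> dict:
    """Compute simple completeness, uniqueness, and format metrics."""
    total = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    missing = conn.execute(
        "SELECT COUNT(*) FROM customers WHERE email IS NULL"
    ).fetchone()[0]
    duplicates = [
        row[0]
        for row in conn.execute(
            "SELECT email FROM customers WHERE email IS NOT NULL "
            "GROUP BY email HAVING COUNT(*) > 1"
        )
    ]
    # GLOB is case-sensitive in SQLite and supports character classes.
    invalid_country = conn.execute(
        "SELECT COUNT(*) FROM customers WHERE country NOT GLOB '[A-Z][A-Z]'"
    ).fetchone()[0]
    return {
        "email_completeness": 1 - missing / total if total else 1.0,
        "duplicate_emails": duplicates,
        "invalid_country_codes": invalid_country,
    }


if __name__ == "__main__":
    print(quality_report(build_sample_db()))
```

Scheduling such a report and alerting on threshold breaches is one simple way to feed the data quality score in the catalog.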
---
Data Retention (Retention Periods):
Legal Retention Periods (Germany):
Commercial documents (HGB §257): 10 years
Accounting documents, annual financial statements: 10 years
Business correspondence (incoming and outgoing): 6 years
Payroll records: 6 years
Job application documents (rejected): 6 months (GDPR)
Personnel files (after termination): 3 years (statute of limitations)
Credit card data (PCI DSS): no longer than necessary
CCTV recordings: 72 hours (BSI recommendation)
IP addresses (security logs): 7 days (ECJ guidelines)
Audit logs for ISMS: 1–3 years
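Once retention periods are recorded per data category, deletion dates can be calculated instead of tracked by hand. The sketch below uses a subset of the periods listed above. It assumes the caller supplies the date on which the period starts, because the applicable rule determines that start date and it is not modeled here. The category keys and function names are hypothetical.

```python
from datetime import date

# Retention periods in years, taken from the list above (illustrative subset).
RETENTION_YEARS = {
    "accounting_documents": 10,
    "business_correspondence": 6,
    "payroll_records": 6,
    "personnel_files": 3,
}


def deletion_due(category: str, period_start: date) -> date:
    """Return the first day on which the data may be deleted."""
    target_year = period_start.year + RETENTION_YEARS[category]
    try:
        return period_start.replace(year=target_year)
    except ValueError:
        # period_start is Feb 29 and the target year is not a leap year.
        return date(target_year, 3, 1)


def is_overdue(category: str, period_start: date, today: date) -> bool:
    """True if the retention period has expired and deletion is due."""
    return today >= deletion_due(category, period_start)


if __name__ == "__main__":
    print(deletion_due("accounting_documents", date(2015, 1, 1)))
    print(is_overdue("personnel_files", date(2020, 6, 30), date(2024, 1, 1)))
```

Storing the computed date in the catalog's "Deletion date" field makes overdue data sets queryable.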
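Retention Policy in SharePoint (PowerShell):
# The policy defines the location; the retention settings belong to a rule attached to it
New-RetentionCompliancePolicy `
  -Name "Financial Data 10 Years" `
  -SharePointLocation "https://firma.sharepoint.com/sites/finanzen"
New-RetentionComplianceRule `
  -Name "Financial Data 10 Years Rule" `
  -Policy "Financial Data 10 Years" `
  -RetentionDuration 3650 `
  -RetentionComplianceAction KeepAndDelete
Content is retained for 3650 days and then deleted automatically. Copies of items that users edit or delete during that period are kept in the "Preservation Hold Library".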
GDPR Integration
Data Governance + GDPR (Practical):
Record of Processing Activities (Art. 30):
A data catalog entry becomes an RPA entry:
□ Name of processing: "Customer Invoices"
□ Purpose: Accounting, Tax Law
□ Categories of data subjects: Private customers
□ Categories of personal data: Name, address, bank details
□ Recipients: Tax advisor, tax office
□ Transfer to third countries: no (e.g., hosted in AWS eu-central-1, Frankfurt), or yes with country and safeguards named
□ Retention period: 10 years (German Commercial Code)
□ Technical measures: AES-256 encryption, access via IAM
Data subject rights (Art. 15–22) – Data governance makes it possible:
Right of access (Art. 15): Without a data catalog: hours of searching. With a catalog: minutes
Right to erasure (Art. 17): Where is the data located? The catalog provides the answer
Right to data portability (Art. 20): Structured export thanks to data model documentation
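The speed-up for access and erasure requests comes from being able to query the catalog for every asset that holds personal data about a given category of data subject. The sketch below shows the idea with an in-memory catalog. The asset names, systems, and the function `assets_for_subject` are invented for illustration.

```python
# Hypothetical catalog extract: one dictionary per data asset.
CATALOG = [
    {"asset": "crm.customers", "system": "Salesforce", "personal_data": True, "subjects": {"customer"}},
    {"asset": "erp.invoices", "system": "ERP", "personal_data": True, "subjects": {"customer"}},
    {"asset": "hr.personnel_files", "system": "HR system", "personal_data": True, "subjects": {"employee"}},
    {"asset": "web.press_releases", "system": "CMS", "personal_data": False, "subjects": set()},
]


def assets_for_subject(catalog: list, subject_category: str) -> list:
    """Return (asset, system) pairs to search for a data subject request."""
    return [
        (entry["asset"], entry["system"])
        for entry in catalog
        if entry["personal_data"] and subject_category in entry["subjects"]
    ]


if __name__ == "__main__":
    # A customer asks for access (Art. 15) or erasure (Art. 17):
    for asset, system in assets_for_subject(CATALOG, "customer"):
        print(f"check {asset} in {system}")
```

The result is the worklist for the request: each listed system is searched or purged, and everything else can be skipped with a documented reason.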
Tools for GDPR Data Governance Integration:
OneTrust: Market leader, covers RPA + GDPR + Cookies
DataGrail: Specializes in data subject rights automation
Privacera: Open-source-based (Apache Ranger), cloud-native
Collibra: Enterprise Data Catalog + Privacy Module
---
Practical Implementation Roadmap:
Phase 1 (Months 1–2): Assessment
□ High-level inventory of all data storage locations (80/20 rule: expect roughly 80% of the data in a few main systems)
□ Identify critical data sets (customer data, financial data)
□ Designate data owners for each department
□ Define and communicate classification scheme
Phase 2 (Months 3–4): Foundation
□ Implement data catalog tool (OpenMetadata, DataHub, or Microsoft Purview)
□ Classify critical data sets
□ Configure retention policies in the main systems
□ Derive GDPR retention periods from the data catalog
Phase 3 (Months 5–6): Automation
□ Auto-classification via Microsoft Purview / DLP rules
□ Data lineage for critical business processes
□ Data quality monitoring (alerts for quality issues)
□ Self-service access process (data owner must approve access)
Phase 4: Ongoing
□ Quarterly: Review classifications
□ Annually: Update data governance policy
□ Upon system changes: Update data catalog
About the Author
Dipl.-Math. (WWU Münster) and doctoral candidate at the Promotionskolleg NRW (Hochschule Rhein-Waal), with a research focus on phishing awareness, behavioral security, and nudging in IT security. Responsible for building and maintaining ISMS, leads internal audits according to ISO/IEC 27001:2022, and advises organizations in KRITIS sectors as an external information security officer (ISB). Lecturer in Communication Security at Hochschule Rhein-Waal and NIS2 training lead at isits AG.
3 publications
- Different Seas, Different Phishes — Large-Scale Analysis of Phishing Simulations Across Different Industries (2025)
- Self-promotion with a Chance of Warnings: Exploring Cybersecurity Communication Among Government Institutions on LinkedIn (2024)
- Exploring the Effects of Cybersecurity Awareness and Decision-Making Under Risk (2024)