
Data Governance: Systematically Managing Data as a Corporate Asset

Data governance is the organizational and technical framework for the secure, compliant, and value-adding management of corporate data. This article explains the data governance framework, roles (data owner, steward, custodian), data classification, data catalog, data quality, lineage, and compliance integration (GDPR, ISO 27001).


Data is the most valuable asset of modern companies—and at the same time, the greatest source of liability. Without structured data governance, organizations unknowingly violate GDPR requirements, lose track of sensitive material, and—in the event of a security incident—cannot prove who had access to which data.

What is Data Governance?

Data Governance Framework - Overview:

Data Governance:
  → Who is allowed to do what with which data?
  → What data exists and where?
  → How long is data retained?
  → Who is responsible?

Data Governance ≠ Data Management
  Data Management:  technical storage, backup, performance
  Data Governance:  policies, roles, compliance, quality

---

Why Data Governance Now?

Regulatory Pressure:
  → GDPR Art. 5: Purpose limitation, data minimization, storage limitation
  → GDPR Art. 30: Record of processing activities (RPA)
  → ISO 27001 A.5.12/5.13: Information classification
  → NIS2 Art. 21: Risk-management measures, including access control policies and asset management
  → DORA (financial sector): Data Asset Inventory

Data breaches without governance:
  → Employee doesn’t know: “Can I upload this to SharePoint?”
  → Audit: “Show us all data transfers to the U.S.” → impossible
  → Ransomware: “Which data was encrypted?” → unclear

Typical statements without data governance:
  "We don’t know exactly what we have"
  "Some developer made a copy"
  "That’s from the old department; it doesn’t exist anymore"

Roles in the Data Governance Framework

Data Governance Roles:

1. Chief Data Officer (CDO) / Data Governance Officer:
   → Overall responsibility for the data governance program
   → C-level sponsor (alternatively: CIO or CISO)
   → Strategy, budget, escalation authority
   → In SMEs: often the CISO or IT manager assumes this role

2. Data Owner:
   → Department head, responsible for the data assets of their department
   → Decides: classification level, access permissions, retention
   → Responsibilities:
     - Grants access rights (not the IT department!)
     - Verifies the accuracy and currency of the data
     - Decides on deletion
   → Example: Sales Manager = Data Owner for customer data

3. Data Steward:
   → Subject matter expert, handles daily data maintenance
   → Reports to the Data Owner
   → Responsibilities:
     - Ensures data quality (completeness, accuracy, timeliness)
     - Maintains metadata in the data catalog
     - Reports quality issues to the Data Owner
   → Example: CRM specialist on the sales team

4. Data Custodian (Data Guardian/IT):
   → IT department, technical management of data
   → Responsibilities:
     - Backup, encryption, access control (technical)
     - Implements decisions made by the Data Owner
     - No decision-making authority over content
   → Example: DBA who sets permissions in the database

5. Data Consumer (Data User):
   → Employees who use the data
   → Responsibilities: authorized use only, report quality issues

6. Data Privacy Officer (DPO):
   → GDPR compliance, record of processing activities
   → Advises on the classification of personal data
   → Point of contact for data subject rights

---

Responsibility Matrix (RACI):
  Activity                 | Owner | Steward | Custodian | DPO
  Determine classification |  R/A  |    C    |     -     |  C
  Grant access             |   A   |    C    |     R     |  -
  Ensure data quality      |   A   |    R    |     C     |  -
  Technical encryption     |   -   |    -    |    R/A    |  -
  GDPR compliance check    |   C   |    C    |     C     | R/A
  Deletion decision        |  R/A  |    C    |     R     |  C

Data Classification in the Framework

Classification Scheme (4 Levels):

Level 1: PUBLIC
  → Accessible to everyone, no harm if disclosed
  → Examples: Marketing materials, press releases, public price lists
  → Handling: No restrictions
  → Labeling: Not necessary

Level 2: INTERNAL
  → For employees, no public harm if disclosed
  → Examples: internal guidelines, organizational charts, internal reports
  → Handling: Do not share publicly, basic encryption in transit
  → Labeling: "Internal" in document header/footer

Level 3: CONFIDENTIAL
  → Serious harm if disclosed without authorization
  → Examples: Customer data, financial figures, personnel files, contracts
  → Handling:
    - Encryption at rest + in transit
    - Need-to-know principle
    - No forwarding without explicit permission
    - Secure deletion
  → Labeling: "Confidential" visible in every document

Level 4: STRICTLY CONFIDENTIAL
  → Catastrophic damage if disclosed (M&A data, cryptographic keys, whistleblower reports)
  → Examples: Executive board decisions, acquisition plans, security vulnerabilities
  → Handling:
    - HSM or separate encrypted storage
    - Individually logged access
    - Access granted only on a need-to-know basis
    - Physical security when printed
  → Label: "Strictly Confidential / For Recipient Only"
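The four levels and their minimum controls lend themselves to a policy-as-code lookup. A minimal Python sketch (the level names follow the scheme above; the control flags are an illustrative condensation, not a standard):

```python
from enum import IntEnum

class Classification(IntEnum):
    """The four levels of the scheme above."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    STRICTLY_CONFIDENTIAL = 4

# Minimum technical controls per level (condensed from the handling rules
# above; a real policy engine would carry more attributes)
HANDLING = {
    Classification.PUBLIC:                {"encrypt_at_rest": False, "need_to_know": False, "log_access": False},
    Classification.INTERNAL:              {"encrypt_at_rest": False, "need_to_know": False, "log_access": False},
    Classification.CONFIDENTIAL:          {"encrypt_at_rest": True,  "need_to_know": True,  "log_access": False},
    Classification.STRICTLY_CONFIDENTIAL: {"encrypt_at_rest": True,  "need_to_know": True,  "log_access": True},
}

def required_controls(level: Classification) -> dict:
    """Look up the minimum controls a data asset at this level must have."""
    return HANDLING[level]
```

A compliance check can then compare an asset's actual controls against `required_controls(asset_level)` and flag gaps automatically.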

---

Microsoft Purview Classification (Automation):

Configure Sensitivity Labels (PowerShell):
  Install-Module -Name ExchangeOnlineManagement
  Connect-IPPSSession

  New-Label -Name "Confidential" `
    -DisplayName "Confidential" `
    -EncryptionEnabled $true `
    -EncryptionEncryptOnly $false `
    -EncryptionRightsDefinitions "view@firma.de:VIEW;edit@firma.de:EDIT" `
    -ApplyContentMarkingHeaderEnabled $true `
    -ApplyContentMarkingHeaderText "CONFIDENTIAL - FOR INTERNAL USE ONLY"

  Auto-Labeling Policy:
  New-AutoSensitivityLabelPolicy `
    -Name "Auto-Classification-PII" `
    -ExchangeLocation All `
    -SharePointLocation All `
    -ApplySensitivityLabel "Confidential"

  Auto-Labeling Rules (Sensitive Info Types):
  New-AutoSensitivityLabelRule `
    -Policy "Auto-Classification-PII" `
    -Name "IBAN Detection" `
    -ContentContainsSensitiveInformation @{Name="International Banking Account Number (IBAN)"}

  Result: Emails containing IBANs are automatically classified as "Confidential"!
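The detection principle behind such a rule can be sketched in a few lines: a pattern match for IBAN-shaped strings plus the ISO 7064 mod-97 checksum to weed out false positives. This is a simplified illustration of the idea, not Purview's actual matcher, and the regex is a rough heuristic:

```python
import re

# Rough shape of an IBAN: 2 letters, 2 check digits, 11-30 alphanumerics,
# optionally grouped with spaces. Heuristic only, not a full IBAN parser.
IBAN_PATTERN = re.compile(r"\b[A-Z]{2}\d{2}(?:[ ]?[A-Z0-9]){11,30}\b")

def is_valid_iban(candidate: str) -> bool:
    """ISO 7064 mod-97: move the first four chars to the end, map letters
    to numbers (A=10 ... Z=35), and check that the result mod 97 equals 1."""
    rearranged = candidate[4:] + candidate[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1

def find_ibans(text: str) -> list[str]:
    """Return checksum-valid IBANs found in free text, spaces stripped."""
    hits = []
    for match in IBAN_PATTERN.finditer(text):
        candidate = match.group().replace(" ", "")
        if is_valid_iban(candidate):
            hits.append(candidate)
    return hits
```

The checksum step matters: four-digit-after-two-letter patterns occur frequently in ordinary text, and mod-97 filters out most of those accidental matches.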

Data Catalog and Data Lineage

Data Catalog – the "phone book" of all data assets:

What belongs in a data catalog?
  For each data asset:
  □ Name and unique ID
  □ Description (what does this data record contain?)
  □ Data Owner (Department + Person)
  □ Classification level
  □ Storage location (system, path/table)
  □ Data format (SQL, CSV, JSON, PDF, etc.)
  □ Creation date + Last modified
  □ Retention period + Deletion date
  □ Personal data? (GDPR-relevant?)
  □ Recipients/systems with access
  □ Data quality score
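The checklist above maps naturally onto a structured record. A hypothetical Python schema for one catalog entry (field names and the sample values are illustrative, not taken from any specific catalog tool):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DataAsset:
    """One data catalog entry, mirroring the checklist above."""
    asset_id: str
    name: str
    description: str
    data_owner: str           # department + person
    classification: str       # PUBLIC / INTERNAL / CONFIDENTIAL / STRICTLY CONFIDENTIAL
    location: str             # system, path/table
    data_format: str
    created: date
    last_modified: date
    retention_until: date
    contains_pii: bool        # GDPR-relevant?
    recipients: list[str] = field(default_factory=list)
    quality_score: float = 0.0

# Example entry: the sales department's CRM customer data
crm_customers = DataAsset(
    asset_id="DA-0042",
    name="CRM customer master data",
    description="Customer contact and contract data",
    data_owner="Sales / Sales Manager",
    classification="CONFIDENTIAL",
    location="Salesforce / Contacts",
    data_format="SQL",
    created=date(2020, 1, 15),
    last_modified=date(2025, 6, 1),
    retention_until=date(2035, 6, 1),
    contains_pii=True,
    recipients=["ERP", "Tableau"],
)
```

Keeping entries in a typed schema like this makes the later steps (retention checks, RPA derivation, access reviews) queryable instead of manual.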

Open-source data catalog tools:
  Apache Atlas:      Enterprise-grade, Hadoop integration
  OpenMetadata:      Modern, REST API, extensive connectors
  DataHub (LinkedIn): Open source, LinkedIn-proven, active community
  Amundsen (Lyft):   Good for analytics teams

  OpenMetadata Connector for PostgreSQL:
  # openmetadata-connector.yaml
  source:
    type: postgres
    serviceName: production-db
    serviceConnection:
      config:
        hostPort: postgresql.company.com:5432
        username: openmetadata_user
        password: <vault-secret>
        database: production
    sourceConfig:
      config:
        markDeletedTables: true
        includeTables: true
        includeViews: true

---

Data Lineage:

Why is it important?
  "Where does this data come from?" → Compliance verification
  "Which systems use this table?" → Impact analysis during changes
  "Who transformed the data?" → Audit trail

Lineage representation:
  Source system  →  ETL process  →  Data warehouse  →  Report
  CRM (Salesforce) → Fivetran → Snowflake.customers → Tableau dashboard

  Automatic lineage via:
  → OpenLineage API standard (Marquez as server)
  → dbt: automatically generates lineage during transformations
  → Apache Airflow: Lineage plugin for DAG-based pipelines

  dbt Lineage (example):
  -- models/customers_enriched.sql
  -- Lineage: raw.salesforce_contacts → int.customers → customers_enriched
  SELECT
    c.id,
    c.email,
    o.order_count
  FROM {{ ref('int_customers') }} c
  LEFT JOIN {{ ref('int_orders') }} o ON c.id = o.customer_id

Data Quality and Retention

Data Quality Dimensions (DAMA):

  1. Completeness: Are required fields filled in?
     SQL Check: SELECT COUNT(*) FROM customers WHERE email IS NULL;

  2. Accuracy: Does the data match reality?
     Validation: Email format, ZIP code pattern, phone number format

  3. Consistency: Inconsistencies between systems?
     CRM: Customer "Mustermann, Max" | ERP: Customer "Max Mustermann GmbH"
     → Golden Record process required

  4. Timeliness: Is the data up to date?
     Outdated supplier addresses, deleted employees still in systems

  5. Uniqueness: Duplicates?
     SELECT email, COUNT(*) FROM customers GROUP BY email HAVING COUNT(*) > 1

  6. Compliance: Does the data comply with defined standards?
     ISO country codes, IBAN format, internal nomenclatures
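Several of these dimensions can be checked in code before problems reach a dashboard. A Python sketch over sample records covering completeness, accuracy, and uniqueness (the sample data and the email pattern are simplified assumptions):

```python
import re

# Sample records; row 2 is incomplete, row 3 duplicates row 1, row 4 is malformed
customers = [
    {"id": 1, "email": "max@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "max@example.com"},
    {"id": 4, "email": "not-an-email"},
]

def completeness(rows, fld):
    """Share of rows where the field is filled (dimension 1)."""
    return sum(r[fld] is not None for r in rows) / len(rows)

def accuracy(rows, fld, pattern=re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")):
    """Share of filled values matching the expected format (dimension 2)."""
    filled = [r[fld] for r in rows if r[fld] is not None]
    return sum(bool(pattern.match(v)) for v in filled) / len(filled)

def duplicates(rows, fld):
    """Values occurring more than once (dimension 5)."""
    seen, dupes = set(), set()
    for r in rows:
        v = r[fld]
        if v is not None:
            (dupes if v in seen else seen).add(v)
    return dupes
```

Wired into a scheduler, checks like these feed the data quality score in the catalog and alert the Data Steward when a threshold is breached.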

---

Data Retention (Retention Periods):

Legal Retention Periods (Germany):
  Commercial documents (HGB §257):                     10 years
  Accounting documents, annual financial statements:   10 years
  Business correspondence (incoming and outgoing):      6 years
  Payroll records:                                      6 years
  Job application documents (rejected):                 6 months (GDPR)
  Personnel files (after termination):                  3 years (statute of limitations)
  Credit card data (PCI DSS):                          no longer than necessary
  CCTV recordings:                                     72 hours (BSI recommendation)
  IP addresses (security logs):                         7 days (ECJ guidelines)
  ISMS audit logs:                                      1–3 years
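Deletion due dates follow mechanically from this table: for commercial and accounting documents, the period runs from the end of the calendar year in which the document was created (HGB §257(5) / AO §147(4)). A minimal Python sketch (the document-type keys are illustrative):

```python
from datetime import date

# Statutory retention in years, from the table above (year-based types only)
RETENTION_YEARS = {
    "commercial_documents": 10,
    "accounting_documents": 10,
    "business_correspondence": 6,
    "payroll_records": 6,
}

def deletion_due(doc_type: str, creation_year: int) -> date:
    """The retention period starts at the end of the calendar year of
    creation, so deletion is due on 31 Dec of the final retention year."""
    return date(creation_year + RETENTION_YEARS[doc_type], 12, 31)
```

For example, an invoice created in 2025 must be retained through 31 December 2035; stored next to each catalog entry, this date drives automated retention-policy assignment.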

Retention Policy in SharePoint (PowerShell):
  New-RetentionCompliancePolicy `
    -Name "Financial Data 10 Years" `
    -SharePointLocation "https://firma.sharepoint.com/sites/finanzen"

  New-RetentionComplianceRule `
    -Policy "Financial Data 10 Years" `
    -RetentionDuration 3650 `
    -RetentionComplianceAction KeepAndDelete

  After 3650 days, the document moves to the "Preservation Hold Library" → after review: deletion

GDPR Integration

Data Governance + GDPR (Practical):

Record of Processing Activities (Art. 30):

Data catalog entry becomes an RPA entry:
  □ Name of processing: "Customer Invoices"
  □ Purpose: Accounting, Tax Law
  □ Categories of data subjects: Private customers
  □ Categories of personal data: Name, address, bank details
  □ Recipients: Tax advisor, tax office
  □ Transfer to third countries: no (or: yes, AWS eu-central-1 Frankfurt)
  □ Retention period: 10 years (German Commercial Code)
  □ Technical measures: AES-256 encryption, access via IAM
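Because the catalog already holds name, purpose, recipients, and retention, an Art. 30 entry is largely a projection of catalog fields. A hypothetical mapping in Python (all field names are invented for illustration):

```python
def catalog_to_rpa(asset: dict) -> dict:
    """Derive an Art. 30 RPA entry from a data catalog entry."""
    return {
        "processing_name": asset["name"],
        "purpose": asset["purpose"],
        "data_subjects": asset["data_subjects"],
        "data_categories": asset["data_categories"],
        "recipients": asset["recipients"],
        "third_country_transfer": asset.get("third_country_transfer", "no"),
        "retention": asset["retention"],
        "technical_measures": asset["technical_measures"],
    }

# Example: the "Customer Invoices" entry from the checklist above
rpa = catalog_to_rpa({
    "name": "Customer Invoices",
    "purpose": "Accounting, tax law",
    "data_subjects": ["Private customers"],
    "data_categories": ["Name", "Address", "Bank details"],
    "recipients": ["Tax advisor", "Tax office"],
    "retention": "10 years (HGB)",
    "technical_measures": ["AES-256 at rest", "IAM-based access"],
})
```

Generating the RPA from the catalog (rather than maintaining it separately) keeps both in sync: whenever a catalog entry changes, the RPA entry is regenerated.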

Data subject rights (Art. 15–22) – Data governance makes it possible:
  Right of access (Art. 15): Without a data catalog: hours of searching. With a catalog: minutes
  Right to erasure (Art. 17): Where is the data located? The catalog provides the answer
  Right to data portability (Art. 20): Structured export thanks to data model documentation

Tools for GDPR Data Governance Integration:
  OneTrust:         Market leader, covers RPA (records of processing) + GDPR + cookies
  DataGrail:        Specializes in data subject rights automation
  Privacera:        Open-source-based (Apache Ranger), cloud-native
  Collibra:         Enterprise Data Catalog + Privacy Module

---

Practical Implementation Roadmap:

Phase 1 (Months 1–2): Assessment
  □ High-level inventory of all data storage locations (80/20 rule: 80% in the main system)
  □ Identify critical data sets (customer data, financial data)
  □ Designate data owners for each department
  □ Define and communicate classification scheme

Phase 2 (Months 3–4): Foundation
  □ Implement data catalog tool (OpenMetadata, DataHub, or Microsoft Purview)
  □ Classify critical data sets
  □ Configure retention policies in the main systems
  □ Derive GDPR retention periods from the data catalog

Phase 3 (Months 5–6): Automation
  □ Auto-classification via Microsoft Purview / DLP rules
  □ Data lineage for critical business processes
  □ Data quality monitoring (alerts for quality issues)
  □ Self-service access process (data owner must approve access)

Phase 4: Ongoing
  □ Quarterly: Review classifications
  □ Annually: Update data governance policy
  □ Upon system changes: Update data catalog


About the Author

Oskar Braun

Head of Information Security Consulting

Dipl.-Math. (WWU Münster) and doctoral candidate at the Promotionskolleg NRW (Hochschule Rhein-Waal), with a research focus on phishing awareness, behavioral security, and nudging in IT security. Responsible for building and maintaining ISMS, leads internal audits per ISO/IEC 27001:2022, and advises as an external information security officer (ISB) in KRITIS sectors. Lecturer in Communication Security at Hochschule Rhein-Waal and NIS2 training lead at isits AG.

ISO 27001 Lead Auditor (IRCA) ISB (TÜV)
This article was last edited on 04.03.2026. Responsible: Oskar Braun, Head of Information Security Consulting at AWARE7 GmbH. License: CC BY 4.0 - free use with attribution: "AWARE7 GmbH, https://a7.de"
