Building A Canadian Cultural Data Catalogue
Client
OCAD University Visual Communication Lab
Location
Toronto, Canada
Industry
Arts & Culture, Digital Humanities, Archiving
Services
Research, Strategy, Design
Year
2023-2026
Project members

Sara Diamond (Principal Investigator)
Juliette Dennis (Advisor)
Silvana N. Sari (Design Researcher & Graphic Designer)
Michael Li (Design Researcher & UI Designer)
Juan Sulca (Developer)

The Canadian Cultural Data Catalogue (CCDC) is a national discovery platform designed to standardize and surface cultural datasets across Canada. Developed from the ground up, the project involved designing a scalable metadata schema, building a flexible cultural taxonomy, and translating both into a searchable, production-ready digital platform.

project-image
project-image
project-image

Cultural data in Canada is distributed across government agencies, research institutions, arts councils, and independent organizations. However, the ecosystem lacked a centralized mechanism to:

  • Discover datasets across disciplines
  • Compare methodologies and structures
  • Assess accessibility and governance
  • Navigate fragmented classification systems

For researchers, policymakers, and arts organizations, locating relevant datasets required extensive manual searching across disconnected sources.

This presented a structural problem: not an absence of data, but an absence of organized, searchable infrastructure. Thus, the objective of this project was to design and launch a national catalogue that would make cultural datasets discoverable, structured, and scalable.

Defining the Product

The objectives of Building a Data Fluent Canadian Cultural Sector are to map and strengthen Canada’s current cultural analytics capacity: laying the groundwork for unified initiatives in data collection and analysis across the Canadian cultural sector, identifying the academic, public sector, and industry researchers undertaking cultural analytics, and coordinating sustained engagement that leads to future research and a shared database. The project aims to:

  • Identify existing data collection efforts in Canada
  • Describe data types, collection methods and standards
  • Set the ground for a common data language and governance framework
  • Establish who is currently undertaking cultural analytics research and what data they are collecting and producing
  • Connect researchers and their data

Project 1: Develop a Database of Cultural Databases

The first part of this project consists of the following:

  • Identify existing efforts in Canada to collect and organize cultural data, and describe data types, collection methods, and standards
  • Identify and describe existing cultural databases, both contemporary and historical
  • Design and prototype an aggregated “database of databases” that indexes current cultural databases

Environmental scan of industry best practices & available cultural data in Canada

Empathy Mapping and Early User Needs Definition

After the initial landscape scan and the prototype “database of databases” concept, the next step was to clarify who the platform was for and what a “successful search experience” would look like in practice. We moved from inventory-building to experience definition by consolidating signals from three sources:

  • User personas derived from our research and early discovery work
  • Stakeholder and partner conversations across government, academic, and nonprofit data holders
  • Early platform testing signals from prospective users exploring how they currently discover and assess datasets
Empathy Map

The empathy mapping directly informed platform requirements, including:

  • Metadata clarity: surface provenance, ownership, dates, and access steps early
  • Comparability: standardize dataset “cards” so users can scan and compare quickly
  • Decision support: make licensing, usage notes, and limitations easier to interpret
  • Reduced friction: shorten the path from “found it” to “I understand how to use it”

This became the bridge between the research inventory work and experience design decisions, shaping how datasets are represented, filtered, and understood in the catalogue.

Designing the Metadata Foundation

The first iteration of the product was not a digital interface but a structured metadata framework. Using the School of Cities Metadata Maturity paper by our colleagues at the University of Toronto as the foundation, we developed a standardized schema defining:

  • Dataset title and description
  • Source organization
  • Geographic and temporal coverage
  • Accessibility status (open, restricted, physical)
  • Cultural domains
  • Cross-domain classifications
  • Governance and ownership
  • Update frequency

Metadata schema
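To make the schema concrete, the fields above can be sketched as a single catalogue record. This is an illustrative sketch only; the field names and example values are hypothetical, not the production schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of one catalogue record under the CCDC metadata
# schema; field names and values are illustrative, not the real schema.
@dataclass
class DatasetRecord:
    title: str
    description: str
    source_organization: str
    geographic_coverage: str            # e.g. "Ontario", "National"
    temporal_coverage: str              # e.g. "2015-2023"
    accessibility: str                  # "open" | "restricted" | "physical"
    cultural_domains: List[str] = field(default_factory=list)
    cross_domain_tags: List[str] = field(default_factory=list)
    governance: str = ""                # ownership / stewardship notes
    update_frequency: str = ""          # e.g. "annual"

# Example entry (invented for illustration)
record = DatasetRecord(
    title="Ontario Arts Grants, 2015-2023",
    description="Grant allocations by discipline and region.",
    source_organization="Example Arts Council",
    geographic_coverage="Ontario",
    temporal_coverage="2015-2023",
    accessibility="open",
    cultural_domains=["Performing Arts"],
    cross_domain_tags=["Funding", "Policy"],
)
```

Keeping every dataset in this uniform shape is what later made the standardized, comparable “cards” in search results possible.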

To validate the model, we began by manually cataloguing real datasets in a structured Excel workbook. This approach functioned as:

  • A validation tool for schema clarity
  • A testing ground for classification edge cases
  • A controlled environment for iterative refinement
Initial Excel Database

Developing a Scalable Taxonomy

As the number of catalogued datasets grew, classification complexity increased significantly. Cultural practices frequently overlap across domains, and disciplinary boundaries are rarely fixed. We referenced established Canadian cultural taxonomies as a foundation, given that many data-producing organizations and funding bodies align with these standards. This ensured interoperability across institutions.

Established Canadian cultural taxonomies as foundation

However, we recognized that existing taxonomies are not exhaustive and may not fully reflect the diversity and evolution of cultural practices in Canada. A rigid, single-hierarchy model would have constrained discoverability and reduced accuracy. To address this, we designed a flexible, polyhierarchical structure complemented by a thematic layer that functions similarly to a controlled folksonomy. This allows for more granular tagging and the inclusion of emerging or cross-disciplinary practices, while maintaining governance oversight to ensure consistency and scalability.

  • Primary Cultural Categories/Domains
  • Transversal (cross-cutting) Categories/Domains
  • Related Categories/Domains
  • A granular Theme layer for flexible tagging
CCDC Cultural Taxonomy
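The four layers above can be sketched as a small classification helper. The domain and theme names here are hypothetical examples, not the CCDC’s actual lists; the point is the shape: governed primary and transversal layers, plus a flexible theme layer.

```python
# Illustrative sketch of the polyhierarchical classification model;
# domain and theme names are invented examples.
PRIMARY_DOMAINS = {"Visual Arts", "Performing Arts", "Heritage", "Media Arts"}
TRANSVERSAL_DOMAINS = {"Cultural Policy", "Digital Culture"}

def classify(primary, transversal=(), themes=()):
    """Attach multi-domain classifications to a single dataset record."""
    unknown = set(primary) - PRIMARY_DOMAINS
    if unknown:
        # Governance oversight: primary domains come from a controlled list
        raise ValueError(f"Ungoverned primary domain(s): {unknown}")
    return {
        "primary": sorted(set(primary)),        # polyhierarchy: >1 allowed
        "transversal": sorted(set(transversal) & TRANSVERSAL_DOMAINS),
        "themes": sorted(set(themes)),          # flexible, folksonomy-like layer
    }

entry = classify(
    primary=["Performing Arts", "Media Arts"],
    transversal=["Cultural Policy"],
    themes=["touring", "live streaming"],
)
```

Because `primary` is a set rather than a single value, a dataset spanning two disciplines appears under both without being duplicated in the catalogue.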

This structure enabled:

  • Multi-domain categorization without duplication
  • Cross-disciplinary discoverability
  • Future extensibility

The taxonomy directly informed:

  • Filter architecture
  • Search indexing logic
  • Database relationships
  • UI navigation patterns

Information Architecture and Service Design

Before interface design, we conducted qualitative research across comparable data discovery platforms to evaluate information hierarchy, search behavior, and metadata prioritization patterns. These findings were combined with insights from interviews with interest holders, including researchers, arts administrators, and policymakers, as well as sector expertise from our Principal Investigator, Sara Diamond.

This synthesis directly informed the platform’s information architecture and service structure.

Information Hierarchy

We defined clear prioritization rules for:

  • Critical content on the homepage
  • High-signal metadata fields in search results
  • Structured hierarchy within dataset detail pages
  • Feature scope for initial launch versus future iterations

Information Hierarchy: Detail Page & Landing Page

Information Hierarchy: Search Results

Search results were designed to surface essential metadata such as provider, geographic scope, thematic classification, temporal coverage, and access status, allowing rapid assessment. Detail pages were organized into structured categories (content, provenance, governance, temporal/geospatial) to support deeper evaluation.

Basic Platform Site Map


Platform Content Structure

Service Blueprint and Governance

Personas were developed from stakeholder research to reflect varying levels of data literacy and discovery intent. The service blueprint mapped:

  • Entry points and user pathways
  • Frontstage interactions (search, filter, contribute)
  • Backstage processes (review, taxonomy governance, validation)

Service Blueprints Draft

Touchpoints Scenarios

This clarified operational dependencies early, reducing implementation risk and preventing governance bottlenecks.

Interface Principles

The interface was intentionally search-forward and structurally disciplined, featuring:

  • Persistent filters
  • Multi-select categorization
  • Hierarchical metadata chunking
  • Clear access and governance indicators

Design emphasis was placed on clarity, comparability, and trust, ensuring the data remained central to the experience.

Engineering & Delivery

Once the information hierarchy and high-fidelity prototype were finalized, we transitioned from conceptual design to technical execution.

Translating Architecture into Database Logic

Transitioning Excel into SQL Schema

The first critical step was converting the metadata schema into a relational SQL database model. SQL was selected to ensure:

  • Structured relationships between datasets and classification layers
  • Scalable indexing for search queries
  • Data integrity across multi-domain categorization

Because our engineer joined mid-project, we worked closely to align taxonomy concepts with database constraints. This collaboration surfaced early misconceptions in our conceptual model and allowed us to refine field relationships, reduce redundancy, and optimize for search performance before full implementation.
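A minimal sketch of that relational model, using an in-memory SQLite database for illustration: a junction table links datasets to taxonomy domains many-to-many, which is what allows multi-domain categorization without duplicating records. Table and column names are illustrative assumptions, not the production schema.

```python
import sqlite3

# Minimal sketch of the relational model; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datasets (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    accessibility TEXT CHECK (accessibility IN ('open','restricted','physical'))
);
CREATE TABLE domains (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
-- Junction table: one dataset, many domains (and vice versa)
CREATE TABLE dataset_domains (
    dataset_id INTEGER REFERENCES datasets(id),
    domain_id  INTEGER REFERENCES domains(id),
    PRIMARY KEY (dataset_id, domain_id)
);
""")
conn.execute("INSERT INTO datasets VALUES (1, 'Museum Attendance Survey', 'open')")
conn.executemany("INSERT INTO domains (name) VALUES (?)",
                 [("Heritage",), ("Visual Arts",)])
conn.executemany("INSERT INTO dataset_domains VALUES (?, ?)", [(1, 1), (1, 2)])

# The same dataset surfaces under either domain without duplication.
rows = conn.execute("""
    SELECT d.title, dom.name
    FROM datasets d
    JOIN dataset_domains dd ON dd.dataset_id = d.id
    JOIN domains dom ON dom.id = dd.domain_id
    ORDER BY dom.name
""").fetchall()
```

The composite primary key on the junction table also enforces the “no duplicate categorization” constraint at the database level rather than in application code.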

Search Logic and Indexing

The taxonomy, transversal domains, and thematic layers were translated into queryable logic. This required iterative cycles of:

  • Uploading structured test datasets
  • Refining theme standardization
  • Adjusting filter logic and indexing behavior
  • Improving result accuracy and consistency
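The filter behavior we converged on through those cycles can be sketched as follows: within a single facet, selected values combine with OR; across facets, they combine with AND. The data and function below are illustrative assumptions, not the production query logic.

```python
# Hedged sketch of multi-select filter logic: OR within a facet,
# AND across facets. Dataset entries are invented examples.
datasets = [
    {"title": "A", "domains": {"Heritage"}, "access": "open"},
    {"title": "B", "domains": {"Media Arts", "Heritage"}, "access": "restricted"},
    {"title": "C", "domains": {"Media Arts"}, "access": "open"},
]

def apply_filters(items, domains=None, access=None):
    def keep(d):
        if domains and not (d["domains"] & set(domains)):  # OR within facet
            return False
        if access and d["access"] not in access:           # AND across facets
            return False
        return True
    return [d["title"] for d in items if keep(d)]

apply_filters(datasets, domains=["Media Arts"], access=["open"])  # -> ["C"]
```

Making this rule explicit and consistent across every facet was one of the main outcomes of the regression cycles: users could then predict what adding or removing a filter would do to the result set.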


Rounds of Regression Test for Search Logic Functionality

Scope Definition and Milestone Prioritization

Milestone

A phased roadmap defined feature sequencing and launch readiness criteria. For the initial release, we prioritized:

  • Stable and reliable search functionality
  • Clean, validated dataset entries
  • Fully operational filter logic
  • Contact and submission forms for stakeholder engagement

Advanced features were intentionally deferred to maintain delivery discipline.

Quality Assurance and Live Readiness

Before launch, we conducted structured testing cycles to validate:

  • Indexing consistency and ranking behavior
  • Multi-select filter interactions
  • Metadata rendering across categories
  • Link integrity and accessibility compliance

The high-fidelity prototype was translated into production through close collaboration between design and engineering, with multiple pre-launch test runs conducted in preparation for live demonstration.

Launch and Validation

The platform was publicly launched at the Mass Culture Digital National Assembly (DNA) Expo.

MassCulture DNA Expo 2026

OCAD University's Coverage

Attendance exceeded expectations, with close to 100 participants joining the session. Following a brief presentation outlining the project’s objectives, methodology, and roadmap, participants were invited to actively explore the platform in real time. We were transparent that this was the first large-scale public use of the online product and positioned the session as a live testing environment.

While the search logic did not perform flawlessly in every instance, participants were informed in advance that they were contributing to an early-stage validation process. This transparency fostered constructive engagement rather than hesitation.

The session yielded several tangible outcomes:

  • Direct usability feedback on search behavior and filter logic
  • Identification of edge cases in indexing and thematic categorization
  • Incoming submissions and inquiries through the platform’s contact forms
  • Multiple collaboration opportunities with sector partners

The launch served not only as a release milestone but as a live stress test under real user conditions. It validated the platform’s core architecture while providing actionable insights for refinement and iteration.

Key Product Decisions & Trade-offs

Throughout development, several strategic trade-offs shaped the platform’s direction.

1. Standardization vs. Flexibility
We aligned with established Canadian cultural taxonomies to ensure interoperability with funders and data-producing institutions. However, we introduced a thematic layer to accommodate emerging or cross-disciplinary practices. This balanced structural consistency with adaptability.

2. Feature Depth vs. Launch Readiness
Advanced capabilities, such as user accounts and expanded contribution workflows, were intentionally deferred. For launch, we prioritized a stable, high-performing search system and validated dataset entries over feature breadth.

3. Ideal Classification vs. Technical Feasibility
Early taxonomy models required refinement once translated into relational SQL logic. Rather than forcing conceptual purity, we adjusted structures to ensure database efficiency, scalable indexing, and predictable filter behavior.

4. Visual Complexity vs. Cognitive Clarity
We deliberately avoided dense dashboards or decorative elements. The interface prioritized metadata hierarchy, comparability, and trust signals to support analytical users.

project-image
project-image