Taxonomy Development Principles
Taxonomy Development Principles in AI Discoverability Architecture represent the systematic methodologies and foundational guidelines for creating structured classification systems that enable effective organization, retrieval, and navigation of AI models, datasets, and capabilities [1][2]. These principles serve as the architectural foundation for making AI systems discoverable, interpretable, and accessible to both human users and automated systems. In the rapidly expanding landscape of machine learning models and AI applications, robust taxonomic structures are essential for managing complexity, facilitating knowledge transfer, and enabling efficient resource allocation [3]. The importance of these principles has grown exponentially as organizations deploy increasingly diverse AI portfolios requiring coherent organizational frameworks that support search, recommendation, and governance functions.
Overview
The emergence of Taxonomy Development Principles in AI Discoverability Architecture stems from the exponential growth in machine learning models and AI applications over the past decade. As organizations transitioned from maintaining a handful of experimental models to managing hundreds or thousands of production AI systems, the need for systematic classification and organization became critical [1][2]. The fundamental challenge these principles address is the inherent complexity and multidimensional nature of AI artifacts—models can be categorized by architecture type, task domain, data modality, performance characteristics, deployment context, and numerous other attributes simultaneously [3].
Early AI development efforts relied on ad-hoc naming conventions and informal documentation practices that proved inadequate as portfolios scaled. The practice has evolved from simple hierarchical categorizations borrowed from traditional software engineering toward sophisticated multi-faceted classification systems that recognize the unique characteristics of AI systems [4][5]. Modern taxonomy development incorporates insights from information science, knowledge organization theory, and ontology engineering, while addressing AI-specific considerations such as model lineage tracking, performance benchmarking, and ethical classification [6]. This evolution reflects a maturation of the field, moving from purely technical classifications toward comprehensive frameworks that support governance, compliance, and strategic decision-making across the AI lifecycle.
Key Concepts
Faceted Classification
Faceted classification represents an approach that organizes AI systems along multiple independent dimensions or "facets," allowing items to be categorized simultaneously across different attributes [3][7]. Unlike traditional hierarchical taxonomies that force items into single branches, faceted systems recognize that AI models possess multiple orthogonal characteristics that users may want to filter and search across independently.
Example: A computer vision model for medical imaging might be classified across multiple facets: Task Type (object detection), Data Modality (medical imaging), Architectural Family (convolutional neural network), Application Domain (healthcare/radiology), Deployment Context (edge device), and Regulatory Classification (medical device software). A radiologist searching for appropriate models could filter by "medical imaging" and "object detection" while a compliance officer could search across all models classified as "medical device software" regardless of their technical architecture.
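The scenario above can be sketched in code. This is a minimal illustration of faceted filtering, not a production registry: the model entries, facet names, and values are hypothetical, mirroring the medical-imaging example.

```python
# Each model carries values across several independent facets.
models = [
    {
        "name": "chest-xray-detector",  # hypothetical model
        "facets": {
            "task_type": "object detection",
            "data_modality": "medical imaging",
            "architecture": "convolutional neural network",
            "domain": "healthcare/radiology",
            "deployment": "edge device",
            "regulatory": "medical device software",
        },
    },
    {
        "name": "invoice-classifier",  # hypothetical model
        "facets": {
            "task_type": "image classification",
            "data_modality": "document images",
            "architecture": "convolutional neural network",
            "domain": "finance",
            "deployment": "cloud",
            "regulatory": "none",
        },
    },
]

def filter_by_facets(catalog, **criteria):
    """Return models whose facets match every supplied criterion."""
    return [
        m for m in catalog
        if all(m["facets"].get(k) == v for k, v in criteria.items())
    ]

# The radiologist and the compliance officer query different facets
# of the same catalog.
radiology_hits = filter_by_facets(
    models, data_modality="medical imaging", task_type="object detection")
regulated_hits = filter_by_facets(
    models, regulatory="medical device software")
```

Because the facets are independent, any combination of criteria works without restructuring the catalog—which is the practical advantage over a single-path hierarchy.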
Controlled Vocabularies
Controlled vocabularies establish standardized, authoritative terminology for describing AI concepts, ensuring consistency in how models, datasets, and capabilities are named and referenced across an organization [2][4]. These vocabularies prevent the proliferation of synonyms, ambiguous terms, and inconsistent naming that impede discovery and create confusion.
Example: An enterprise AI platform might establish that "natural language processing," "NLP," "text analytics," and "language understanding" all map to the single controlled term "Natural Language Processing" in their taxonomy. When a data scientist uploads a sentiment analysis model and tags it with "text analytics," the system automatically normalizes this to the controlled term, ensuring that users searching for "NLP" will discover this model even though the creator used different terminology.
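The normalization step described above can be sketched as a simple synonym map. The mapping table is illustrative, assuming the single controlled term from the example.

```python
# Synonym-to-controlled-term map; entries are illustrative.
SYNONYMS = {
    "nlp": "Natural Language Processing",
    "natural language processing": "Natural Language Processing",
    "text analytics": "Natural Language Processing",
    "language understanding": "Natural Language Processing",
}

def normalize_tag(raw_tag):
    """Map a free-form tag to its controlled term.

    Unknown tags pass through unchanged so a curator can review them
    and extend the vocabulary.
    """
    return SYNONYMS.get(raw_tag.strip().lower(), raw_tag)
```

A tag of "Text Analytics" on upload would be stored as "Natural Language Processing", so a later search for "NLP" (normalized the same way) finds the model.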
Polyhierarchy
Polyhierarchy allows taxonomic items to exist in multiple locations within a hierarchical structure, reflecting the reality that AI systems often legitimately belong to several categories simultaneously [5][6]. This principle acknowledges that forcing items into single classification paths creates artificial constraints that reduce discoverability.
Example: A transformer-based model for code generation could legitimately appear in multiple hierarchical paths: under "Generative Models > Text Generation > Code Generation," under "Software Engineering Tools > Automated Programming," and under "Transformer Architectures > Encoder-Decoder Models." A software engineer browsing the "Software Engineering Tools" category and a machine learning researcher exploring "Transformer Architectures" would both discover this model through their respective navigation paths.
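A minimal sketch of polyhierarchy: each model holds a list of hierarchy paths rather than a single one, so browsing any of its parent categories surfaces it. The model name and paths mirror the code-generation example and are hypothetical.

```python
# One model, three legitimate hierarchy paths.
paths = {
    "codegen-model": [
        ("Generative Models", "Text Generation", "Code Generation"),
        ("Software Engineering Tools", "Automated Programming"),
        ("Transformer Architectures", "Encoder-Decoder Models"),
    ],
}

def models_under(category, catalog):
    """Find every model reachable by browsing into `category` via any path."""
    return sorted(
        name for name, model_paths in catalog.items()
        if any(category in path for path in model_paths)
    )
```

Both the software engineer browsing "Software Engineering Tools" and the researcher exploring "Transformer Architectures" reach the same entry.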
Metadata Schemas
Metadata schemas define the structured attributes captured for each classified AI artifact, specifying what information is recorded, in what format, and with what level of granularity [1][4]. These schemas balance comprehensiveness with usability, capturing sufficient detail for meaningful differentiation without overwhelming users or creating excessive documentation burden.
Example: A model registry metadata schema might require: model name, version, architecture type (from controlled vocabulary), task category, training dataset identifier, parameter count, inference latency (measured on standard hardware), accuracy metrics (task-specific), framework and version, license type, creator/team, creation date, and deployment status. For a BERT-based question-answering model, this would capture: "BERT-QA-v2.1," "Transformer-Encoder," "Question Answering," "SQuAD 2.0," "110M parameters," "45ms," "F1: 88.5%," "PyTorch 1.12," "Apache 2.0," "NLP Research Team," "2024-03-15," "Production."
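One hedged way to encode such a schema is a typed record; the field names below simply mirror the example attributes and are not a standard.

```python
from dataclasses import dataclass

@dataclass
class ModelRecord:
    """Hypothetical registry schema mirroring the example attributes."""
    name: str
    version: str
    architecture: str          # from the controlled vocabulary
    task_category: str
    training_dataset: str
    parameter_count: str
    inference_latency_ms: float  # measured on standard hardware
    accuracy_metric: str         # task-specific
    framework: str
    license: str
    team: str
    created: str                 # ISO date
    status: str = "Development"

record = ModelRecord(
    name="BERT-QA", version="2.1", architecture="Transformer-Encoder",
    task_category="Question Answering", training_dataset="SQuAD 2.0",
    parameter_count="110M", inference_latency_ms=45.0,
    accuracy_metric="F1: 88.5%", framework="PyTorch 1.12",
    license="Apache 2.0", team="NLP Research Team",
    created="2024-03-15", status="Production",
)
```

Encoding the schema as a type (rather than free-form key-value pairs) makes missing fields fail loudly at submission time instead of silently degrading search later.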
Semantic Relationships
Semantic relationships define meaningful connections between taxonomic nodes beyond simple parent-child hierarchies, including relationships like "related-to," "evolved-from," "requires," and "applicable-to" [3][5]. These relationships enable graph-based navigation and support sophisticated discovery patterns and automated reasoning.
Example: In a model taxonomy, GPT-4 might have an "evolved-from" relationship to GPT-3, a "related-to" relationship to other large language models like PaLM and Claude, a "requires" relationship to specific tokenization datasets, and "applicable-to" relationships to use cases like content generation, code completion, and conversational AI. A user exploring GPT-3 could discover GPT-4 through the evolution relationship, while someone searching for content generation solutions would find GPT-4 through the applicability relationship.
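The graph navigation described above can be sketched with typed edges. The edge list is illustrative, using the relationships named in the example.

```python
# Typed relationships as (source, relation, target) edges.
edges = [
    ("GPT-4", "evolved-from", "GPT-3"),
    ("GPT-4", "related-to", "PaLM"),
    ("GPT-4", "related-to", "Claude"),
    ("GPT-4", "applicable-to", "content generation"),
    ("GPT-4", "applicable-to", "code completion"),
]

def related(node, relation, graph):
    """Follow outgoing edges of one relation type from a node."""
    return [dst for src, rel, dst in graph if src == node and rel == relation]

def successors_of(node, graph):
    """Walk 'evolved-from' edges backwards: what evolved from this node?"""
    return [src for src, rel, dst in graph
            if rel == "evolved-from" and dst == node]
```

A user viewing GPT-3 discovers GPT-4 via `successors_of`, while a search for "content generation" reaches GPT-4 by inverting the `applicable-to` edges the same way.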
Granularity Levels
Granularity levels determine the specificity of classification categories, balancing the need for precise differentiation against cognitive load and classification consistency [2][6]. Effective taxonomies provide appropriate granularity for their intended use cases, avoiding both overly broad categories that fail to distinguish meaningfully and excessively fine-grained divisions that create confusion.
Example: A task-type taxonomy might use three granularity levels: Level 1 (broad): "Perception," "Generation," "Reasoning"; Level 2 (intermediate): under Perception: "Computer Vision," "Speech Recognition," "Natural Language Understanding"; Level 3 (specific): under Computer Vision: "Image Classification," "Object Detection," "Semantic Segmentation," "Instance Segmentation." A business executive browsing at Level 1 gets a high-level portfolio view, while a technical implementer can drill down to Level 3 to find precisely the right model type.
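The three-level structure above can be represented as nested dictionaries, with a helper that exposes only the categories visible at a chosen level. This is a sketch; the category names come from the example and the empty branches are placeholders.

```python
# Three granularity levels as nested dicts (leaves are lists).
taxonomy = {
    "Perception": {
        "Computer Vision": [
            "Image Classification", "Object Detection",
            "Semantic Segmentation", "Instance Segmentation",
        ],
        "Speech Recognition": [],
        "Natural Language Understanding": [],
    },
    "Generation": {},
    "Reasoning": {},
}

def categories_at_level(tree, level):
    """List the category names visible at granularity level 1, 2, or 3."""
    if level == 1:
        return sorted(tree)
    if level == 2:
        return sorted(sub for subs in tree.values() for sub in subs)
    return sorted(leaf for subs in tree.values()
                  for leaves in subs.values() for leaf in leaves)
```

The executive's view is `categories_at_level(taxonomy, 1)`; the implementer drills to level 3 for the precise model type.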
Extensibility Mechanisms
Extensibility mechanisms enable taxonomies to accommodate new AI technologies, paradigms, and use cases without requiring complete restructuring [4][7]. These mechanisms include placeholder categories for emerging technologies, versioning systems that track taxonomic evolution, and processes for incorporating new classifications while maintaining backward compatibility.
Example: A taxonomy designed in 2020 might include an "Emerging Architectures" category alongside established categories like "Convolutional Networks" and "Recurrent Networks." When transformer architectures gained prominence, they could be promoted from "Emerging" to a first-class category without disrupting existing classifications. The taxonomy version would increment from 1.0 to 2.0, with migration documentation explaining how models previously classified under "Emerging Architectures > Attention-Based Models" should be reclassified under the new "Transformer Architectures" category.
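The promotion-with-migration pattern can be sketched as a version constant plus a remapping table; the paths and version numbers simply follow the example.

```python
# Taxonomy version after promoting the emerging category (illustrative).
TAXONOMY_VERSION = "2.0.0"

# Old classification path -> new first-class category, introduced in 2.0.
MIGRATIONS = {
    "Emerging Architectures > Attention-Based Models":
        "Transformer Architectures",
}

def migrate_classification(old_path):
    """Remap a pre-2.0 classification; unaffected paths pass through."""
    return MIGRATIONS.get(old_path, old_path)
```

Keeping migrations as data (rather than editing records in place) preserves backward compatibility: old classifications remain interpretable, and the same table doubles as the release-note documentation.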
Applications in AI Model Management and Discovery
Enterprise Model Registries
Organizations implement taxonomies within centralized model registries to manage their AI portfolios across development, staging, and production environments [1][2]. The taxonomy enables data scientists to discover existing models before building new ones, reducing redundant development efforts and promoting reuse. For example, a financial services company might maintain a registry with 500+ models classified by business function (fraud detection, credit scoring, customer service), data type (transaction data, customer demographics, market data), and deployment status. When a new team needs a fraud detection model, they can filter the registry by "fraud detection" and "production-ready" to find validated models rather than starting from scratch.
Research Publication and Benchmark Tracking
Academic and industry research communities use taxonomies to organize publications, code implementations, and benchmark results, facilitating reproducibility and comparative analysis [3][5]. Platforms like Papers with Code employ taxonomic structures that link research papers to their code repositories, datasets used, and performance benchmarks across standardized tasks. A researcher investigating state-of-the-art approaches for named entity recognition can navigate to the "Natural Language Processing > Information Extraction > Named Entity Recognition" category, view all papers classified there, compare their benchmark scores on standard datasets like CoNLL-2003, and access implementation code—all organized through the taxonomic structure.
AI Marketplace and Capability Discovery
Commercial AI marketplaces and cloud platforms leverage taxonomies to help customers discover pre-trained models and AI services matching their requirements [4][6]. AWS SageMaker JumpStart, Azure AI Gallery, and Hugging Face Model Hub implement faceted taxonomies allowing users to filter by task type, industry vertical, framework, and deployment target. A healthcare startup seeking a medical image analysis model can filter by "Computer Vision," "Healthcare," "Medical Imaging," and "Edge Deployment," narrowing thousands of available models to a handful meeting their specific criteria. The taxonomy transforms an overwhelming catalog into a navigable, decision-supporting resource.
Governance and Compliance Management
Organizations use taxonomies to classify AI systems by risk level, regulatory requirements, and ethical considerations, enabling systematic governance and compliance monitoring [2][7]. A taxonomy might include facets for "Data Sensitivity" (public, internal, confidential, regulated), "Decision Impact" (informational, low-stakes, high-stakes, critical), and "Regulatory Domain" (GDPR, HIPAA, financial services, none). Models classified as "regulated data" and "high-stakes decisions" automatically trigger enhanced review processes, documentation requirements, and ongoing monitoring. This taxonomic approach ensures that governance controls scale systematically across growing AI portfolios rather than relying on ad-hoc case-by-case assessments.
Best Practices
Employ Multi-Faceted Design for Complex Domains
AI systems possess multiple independent characteristics that cannot be adequately captured through single-path hierarchies, making multi-faceted taxonomies essential for effective discovery [3][7]. The rationale is that users approach AI discovery from diverse perspectives—some search by technical architecture, others by business use case, and still others by deployment constraints. A single hierarchical path forces an arbitrary primary classification that disadvantages users approaching from other perspectives.
Implementation Example: Design a taxonomy with at least four independent facets: Technical Architecture (transformer, CNN, RNN, ensemble), Task Category (classification, generation, prediction, optimization), Data Modality (text, image, audio, video, multimodal), and Application Domain (healthcare, finance, manufacturing, customer service). Implement a faceted search interface where users can select values from any combination of facets. A user might select "Healthcare" + "Image" + "Classification" to find medical image classification models, while another selects "Transformer" + "Text" + "Generation" to find language generation models—both served effectively by the same underlying taxonomic structure.
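One piece of a faceted search interface worth sketching is the facet-count sidebar—showing how many models carry each value of a facet so users know which filters are worth clicking. The model entries and facet names below are illustrative.

```python
from collections import Counter

# Models tagged across the four example facets (illustrative entries).
models = [
    {"architecture": "transformer", "task": "generation",
     "modality": "text", "domain": "customer service"},
    {"architecture": "CNN", "task": "classification",
     "modality": "image", "domain": "healthcare"},
    {"architecture": "transformer", "task": "classification",
     "modality": "text", "domain": "finance"},
]

def facet_counts(catalog, facet):
    """Count models per value of one facet (drives the filter sidebar)."""
    return Counter(m[facet] for m in catalog)
```

Combined with a filter function over the same entries, this serves the "Healthcare + Image + Classification" user and the "Transformer + Text + Generation" user from one underlying structure.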
Establish Clear Governance and Maintenance Processes
Taxonomies decay without ongoing maintenance, as new technologies emerge, terminology evolves, and classifications become outdated [2][4]. The rationale is that AI is a rapidly evolving field where new architectures, paradigms, and applications appear continuously. Without systematic governance, taxonomies become increasingly misaligned with reality, reducing their utility and user trust.
Implementation Example: Establish a taxonomy governance board with representatives from data science, engineering, business units, and compliance. Schedule quarterly review meetings to assess taxonomy effectiveness, propose new categories, deprecate obsolete classifications, and resolve ambiguities. Implement a change request process where any user can propose taxonomic modifications, with requests reviewed by domain experts before implementation. Maintain version control with clear migration paths—when adding a new category like "Diffusion Models" to the architecture facet, provide guidance on reclassifying models previously categorized as "Generative Models > Other" and communicate changes through release notes.
Balance Automation with Human Curation
Purely automated classification scales efficiently but produces inconsistent results, while purely manual curation ensures quality but doesn't scale [1][6]. The rationale is that some attributes (parameter count, framework, license) can be reliably extracted from model metadata, while others (appropriate use cases, ethical considerations, domain applicability) require human judgment and domain expertise.
Implementation Example: Implement a hybrid approach where automated systems extract and classify objective technical attributes from model cards, configuration files, and code repositories. For a new model submission, automatically populate: framework (parsed from requirements.txt), parameter count (extracted from model architecture), license (from LICENSE file), and training dataset (from documentation). Route the submission to domain expert curators for manual classification of: appropriate use cases, domain applicability, ethical considerations, and quality tier. Use active learning to improve automated classification over time—when curators correct automated classifications, feed these corrections back into the classification algorithms.
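The automated half of this pipeline can be sketched for one attribute: extracting the framework from a requirements.txt blob. This is a simplified illustration—real pipelines would parse repositories and model cards—and the set of recognized frameworks is an assumption.

```python
import re

# Frameworks the extractor recognizes (illustrative list).
KNOWN_FRAMEWORKS = ("torch", "tensorflow", "jax", "scikit-learn")

def extract_framework(requirements_text):
    """Return the first known ML framework pinned in requirements.txt.

    Returns None when nothing matches, signaling that the submission
    should be routed to a human curator instead.
    """
    for line in requirements_text.splitlines():
        m = re.match(r"^([A-Za-z0-9_.-]+)==([\w.]+)", line.strip())
        if m and m.group(1).lower() in KNOWN_FRAMEWORKS:
            return f"{m.group(1)} {m.group(2)}"
    return None

submission = "numpy==1.26.4\ntorch==2.1.0\n"
framework = extract_framework(submission)
needs_curation = framework is None
```

The `None` fallback is the key design point: automation fills what it can verify and explicitly hands off the rest, rather than guessing.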
Design for Progressive Disclosure and Multiple Granularity Levels
Different users need different levels of detail, from executives seeking portfolio overviews to technical implementers requiring precise specifications [5][6]. The rationale is that exposing all taxonomic detail to all users creates cognitive overload for non-technical users while hiding detail frustrates technical users seeking precise information.
Implementation Example: Implement a three-tier taxonomy with progressive disclosure: Tier 1 (Executive View) shows 5-7 broad categories like "Perception AI," "Generative AI," "Predictive AI," "Optimization AI." Tier 2 (Manager View) expands each into 3-5 subcategories—"Perception AI" expands to "Computer Vision," "Speech Recognition," "Natural Language Understanding." Tier 3 (Technical View) provides detailed classifications—"Computer Vision" expands to "Image Classification," "Object Detection," "Semantic Segmentation," "Instance Segmentation," "Image Generation." The user interface defaults to Tier 1 for business users and Tier 3 for technical users (based on role), but allows anyone to expand or collapse levels as needed.
Implementation Considerations
Tool and Format Selection
The choice of taxonomy management tools and representation formats significantly impacts maintainability, interoperability, and integration capabilities [2][4]. Organizations must weigh lightweight formats like JSON or CSV, which integrate easily with existing systems, against specialized taxonomy management platforms that provide sophisticated editing, validation, and governance features but require additional infrastructure.
Example: A startup with limited resources might implement their initial taxonomy as a structured JSON file stored in version control, with categories, facets, and controlled vocabularies defined in a human-readable format that developers can easily integrate into their model registry API. As the organization matures and the taxonomy grows more complex, they might migrate to a dedicated taxonomy management platform like PoolParty or a graph database like Neo4j that supports complex relationship modeling, provides visual editing interfaces, and enables sophisticated queries across semantic relationships. The migration path should preserve existing classifications while enabling enhanced capabilities.
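The version-controlled JSON starting point might look like the sketch below. The keys and structure are illustrative, not a standard interchange format; they combine the facets and controlled vocabulary discussed earlier.

```python
import json

# A minimal taxonomy file a startup might keep in version control.
taxonomy_json = """
{
  "version": "1.0.0",
  "facets": {
    "task_type": ["classification", "generation", "prediction"],
    "data_modality": ["text", "image", "audio"]
  },
  "controlled_vocabulary": {
    "nlp": "Natural Language Processing",
    "text analytics": "Natural Language Processing"
  }
}
"""

taxonomy = json.loads(taxonomy_json)
valid_tasks = set(taxonomy["facets"]["task_type"])

def is_valid_task(tag):
    """Validate a submitted task tag against the taxonomy file."""
    return tag in valid_tasks
```

Because it is plain JSON, a model registry API can load and validate against it directly; a later migration to a graph database would import the same structure as nodes and edges.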
Audience-Specific Customization
Different user communities require different taxonomic views and vocabularies, necessitating customization while maintaining underlying consistency [3][7]. Data scientists think in terms of architectures and algorithms, business users focus on use cases and outcomes, compliance officers care about regulatory classifications, and executives need portfolio-level categorizations.
Example: Implement a single canonical taxonomy with multiple "lenses" or views tailored to different audiences. The data science lens exposes detailed technical facets (architecture family, training paradigm, optimization algorithm) with technical terminology. The business lens presents the same models organized by business function (customer acquisition, risk management, operational efficiency) and outcome metrics (revenue impact, cost reduction, customer satisfaction). The compliance lens highlights regulatory classifications, data sensitivity levels, and decision impact categories. All lenses reference the same underlying models and metadata, but present different facets and use different terminology appropriate to each audience.
Organizational Maturity and Phased Rollout
Taxonomy sophistication should match organizational AI maturity—overly complex taxonomies overwhelm organizations early in their AI journey, while simplistic structures constrain mature AI-native organizations [1][5]. Implementation should follow a phased approach that grows with organizational capabilities.
Example: Phase 1 (AI Experimentation): Implement a simple two-level hierarchy with 5-7 top-level categories (Computer Vision, Natural Language Processing, Predictive Analytics, Recommendation Systems, Optimization) and basic metadata (model name, owner, status). Phase 2 (AI Scaling): Add faceted classification with 3-4 facets, controlled vocabularies, and richer metadata including performance metrics and deployment information. Phase 3 (AI Industrialization): Implement full multi-faceted taxonomy with semantic relationships, multiple granularity levels, audience-specific views, and integration with governance workflows. Each phase builds on the previous, with migration paths that preserve existing classifications while enabling enhanced capabilities.
Integration with Existing Systems
Taxonomies deliver value only when integrated with the systems where users actually discover and interact with AI assets [4][6]. Standalone taxonomies that exist separately from model registries, documentation systems, and development workflows see limited adoption regardless of their quality.
Example: Integrate the taxonomy with the organization's model registry API so that model search and filtering operations use taxonomic categories. Connect to the CI/CD pipeline so that model deployment workflows automatically capture taxonomic metadata. Link to the documentation system so that model cards display taxonomic classifications and users can navigate between related models through taxonomic relationships. Integrate with the data catalog so that datasets are classified using compatible taxonomic structures, enabling discovery of models trained on specific data types. Embed taxonomic navigation in the internal developer portal where data scientists begin their work, rather than requiring them to visit a separate taxonomy management system.
Common Challenges and Solutions
Challenge: Rapid Technological Evolution
The AI field evolves at an extraordinary pace, with new architectures, paradigms, and applications emerging continuously [3][5]. Taxonomies designed around current technologies quickly become outdated as novel approaches like diffusion models, retrieval-augmented generation, or multimodal foundation models gain prominence. Static taxonomic structures struggle to accommodate innovations that don't fit existing categories, leading to awkward classifications, proliferation of "other" categories, and user frustration.
Solution:
Design taxonomies with explicit extensibility mechanisms from the outset. Create "Emerging Technologies" or "Novel Approaches" categories at each level of the hierarchy where new innovations can be temporarily classified while their long-term taxonomic placement is determined [4][7]. Establish quarterly or biannual review cycles where the taxonomy governance board assesses new technologies, determines whether they warrant new first-class categories, and plans migration paths for items currently in "emerging" classifications. Implement versioning with clear semantic version numbers (major.minor.patch) where major versions indicate structural changes, minor versions add new categories, and patches correct errors. When promoting "Diffusion Models" from emerging to first-class status, increment the version, provide migration documentation, and communicate changes through release notes and user training.
Challenge: Multi-Stakeholder Alignment
Different organizational units and user communities have competing classification priorities and incompatible mental models [2][6]. Data scientists want detailed technical classifications, business units need use-case-oriented categories, compliance teams require regulatory classifications, and executives seek portfolio-level groupings. Attempting to satisfy all perspectives through a single hierarchical structure creates either excessive complexity or fails to serve any group well.
Solution:
Implement a multi-faceted taxonomy architecture that accommodates diverse perspectives simultaneously rather than forcing consensus on a single classification scheme [3][7]. Conduct stakeholder workshops with representatives from each major user community to identify their specific discovery needs and mental models. Design independent facets that address each perspective: technical architecture facet for data scientists, business function facet for business users, regulatory classification facet for compliance, and strategic category facet for executives. Ensure the underlying metadata schema captures attributes relevant to all facets. Provide audience-specific interfaces that emphasize relevant facets while de-emphasizing others—the data science portal highlights technical facets, the business dashboard emphasizes use cases and outcomes, and the compliance system foregrounds regulatory classifications. This approach transforms competing requirements from a zero-sum conflict into complementary views of the same underlying reality.
Challenge: Granularity Calibration
Determining appropriate classification specificity presents a persistent challenge—overly coarse taxonomies fail to differentiate meaningfully between AI systems, while excessive granularity creates cognitive overload and classification inconsistency [1][5]. A taxonomy with only "Computer Vision" and "Natural Language Processing" provides insufficient differentiation, but one with dozens of narrowly-defined subcategories like "Multi-Object Tracking in Crowded Scenes" becomes unwieldy and difficult to apply consistently.
Solution:
Implement a three-tier granularity structure with progressive disclosure, allowing users to navigate at their appropriate level of detail [6]. Conduct user testing with representative discovery scenarios to identify optimal specificity at each tier. Tier 1 should contain 5-7 broad categories that provide meaningful high-level differentiation. Tier 2 expands each to 3-5 subcategories that address common discovery needs. Tier 3 provides detailed classifications for technical specialists. For example: Tier 1: "Computer Vision"; Tier 2: "Object Detection," "Image Segmentation," "Image Generation," "Video Analysis"; Tier 3: under Object Detection: "Single-Stage Detectors," "Two-Stage Detectors," "Transformer-Based Detection." Provide interface controls that allow users to expand or collapse levels based on their needs, with defaults appropriate to user roles. Monitor usage analytics to identify categories that are too broad (users consistently drill down) or too granular (rarely used), and adjust in subsequent taxonomy versions.
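The analytics-driven check at the end can be sketched directly. The thresholds are illustrative assumptions, not calibrated values.

```python
def granularity_flags(stats, drill_rate_max=0.8, min_visits=10):
    """Flag categories whose usage suggests wrong granularity.

    stats: {category: {"visits": int, "drilldowns": int}}
    Thresholds are illustrative defaults.
    """
    flags = {}
    for cat, s in stats.items():
        if s["visits"] < min_visits:
            flags[cat] = "too granular (rarely used)"
        elif s["drilldowns"] / s["visits"] > drill_rate_max:
            flags[cat] = "too broad (users consistently drill down)"
    return flags

usage = {
    "Computer Vision": {"visits": 100, "drilldowns": 95},
    "Object Detection": {"visits": 60, "drilldowns": 20},
    "Multi-Object Tracking in Crowded Scenes": {"visits": 2, "drilldowns": 0},
}
flags = granularity_flags(usage)
```

Flags like these feed the next taxonomy version review rather than triggering automatic restructuring, since usage patterns need human interpretation.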
Challenge: Automation Accuracy vs. Curation Scale
Automated classification from model metadata and documentation enables scalability but produces inconsistent or inaccurate results, particularly for nuanced attributes like appropriate use cases or ethical considerations [2][4]. Manual curation by domain experts ensures quality but doesn't scale to portfolios of hundreds or thousands of models, creating bottlenecks that delay model deployment and frustrate users.
Solution:
Implement a tiered hybrid approach that applies automation where reliable and human curation where necessary, with active learning to continuously improve automated classification [1][6]. Classify attributes into three tiers: Tier 1 (Fully Automated): objective technical attributes reliably extractable from code and configuration—framework, parameter count, license, training dataset identifier. Implement automated extraction pipelines that parse model repositories, configuration files, and model cards to populate these attributes. Tier 2 (Automated with Human Validation): attributes that can be inferred with reasonable accuracy but benefit from expert review—task category, architecture family, deployment context. Use machine learning classifiers trained on previously curated examples to suggest classifications, but route to human reviewers for validation before finalizing. Tier 3 (Human Curated): nuanced attributes requiring domain expertise—appropriate use cases, ethical considerations, domain-specific applicability, quality tier. Route these directly to domain expert curators with relevant expertise. Implement active learning where curator corrections to automated classifications feed back into the classification models, continuously improving accuracy and gradually shifting attributes from Tier 2 to Tier 1 as automation reliability increases.
Challenge: Taxonomy Decay and Maintenance Burden
Without ongoing maintenance, taxonomies become increasingly misaligned with organizational reality as new models are added with inconsistent classifications, terminology evolves, categories become obsolete, and structural issues accumulate [4][7]. However, taxonomy maintenance competes with other priorities for limited resources, and organizations often lack clear ownership and processes for ongoing curation, leading to gradual degradation that undermines user trust and adoption.
Solution:
Establish explicit governance structures with defined roles, responsibilities, and regular maintenance cycles integrated into organizational workflows [2][5]. Designate a taxonomy owner (typically from data science leadership or enterprise architecture) accountable for overall taxonomy health. Form a taxonomy governance board with representatives from major stakeholder groups (data science, engineering, business units, compliance) that meets quarterly to review taxonomy effectiveness, approve structural changes, and prioritize improvements. Implement a lightweight change request process where any user can propose taxonomic modifications through a standard form, with requests triaged by the taxonomy owner and reviewed by the governance board. Establish automated quality monitoring that flags potential issues: models classified as "other" or "miscellaneous" (indicating missing categories), categories with zero or very few items (indicating over-granularity), categories with excessive items (indicating under-granularity), and inconsistent terminology. Schedule annual comprehensive reviews that assess alignment with organizational strategy, user satisfaction through surveys, and usage analytics. Allocate dedicated capacity (e.g., 20% of one information architect's time) for ongoing taxonomy maintenance rather than treating it as discretionary work that gets perpetually deferred.
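The automated quality checks described above can be sketched as a single report over per-category item counts. The thresholds are illustrative, not recommended values.

```python
def taxonomy_health_report(category_counts, min_items=2, max_items=200):
    """Flag categories whose item counts suggest structural problems.

    category_counts: {category: number of models classified there}
    Thresholds are illustrative defaults.
    """
    issues = []
    for cat, n in sorted(category_counts.items()):
        if cat.lower() in ("other", "miscellaneous"):
            issues.append(f"{cat}: {n} items in a catch-all (missing categories?)")
        elif n < min_items:
            issues.append(f"{cat}: only {n} item(s) (over-granular?)")
        elif n > max_items:
            issues.append(f"{cat}: {n} items (under-granular?)")
    return issues

report = taxonomy_health_report({
    "Computer Vision": 340,
    "Other": 57,
    "Two-Stage Detectors": 1,
    "Natural Language Processing": 120,
})
```

Run as a scheduled job, a report like this gives the governance board concrete agenda items for its quarterly review instead of anecdotal complaints.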
References
1. Gebru, T., et al. (2018). Datasheets for Datasets. https://arxiv.org/abs/1803.09010
2. Mitchell, M., et al. (2019). Model Cards for Model Reporting. https://arxiv.org/abs/1810.03993
3. Bender, E. M., & Friedman, B. (2018). Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. https://aclanthology.org/Q18-1041/
4. Google Research. (2019). Model Card Toolkit. https://research.google/pubs/pub48120/
5. Pushkarna, M., Zaldivar, A., & Kjartansson, O. (2022). Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI. https://arxiv.org/abs/2204.01075
6. Elsevier. (2021). Knowledge Organization and Information Retrieval in the Age of AI. https://www.sciencedirect.com/science/article/pii/S0306457321001035
7. IEEE. (2021). Taxonomy Development for AI Systems Classification. https://ieeexplore.ieee.org/document/9338283
8. Rostamzadeh, N., et al. (2022). Healthsheet: Development of a Transparency Artifact for Health Datasets. https://arxiv.org/abs/2202.13028
