Content Classification Methods

Content classification methods in AI discoverability architecture represent systematic approaches for automatically assigning digital content to predefined categories or generating taxonomic labels based on semantic, structural, or contextual features [1][2]. These methods employ machine learning algorithms and natural language processing techniques to transform unstructured information—ranging from text documents and images to multimedia assets and structured data—into organized, searchable, and semantically meaningful representations that facilitate efficient information retrieval and knowledge discovery. In an era where organizations generate petabytes of data daily, robust content classification methods are essential for building intelligent search systems, recommendation engines, and knowledge management platforms that enable users and AI agents to discover relevant information efficiently. The primary purpose of content classification in AI discoverability is to create semantic metadata that enhances content findability through search, filtering, and recommendation systems, thereby unlocking the full value of organizational knowledge assets.

Overview

The emergence of content classification methods in AI discoverability architecture stems from the exponential growth of digital information and the corresponding challenge of organizing and retrieving relevant content at scale. Historically, content organization relied on manual cataloging and rule-based systems, which proved inadequate as data volumes exceeded human processing capacity. The fundamental challenge these methods address is transforming unstructured information into structured, categorized representations that enable efficient discovery without requiring manual intervention for every content item.

The practice has evolved significantly over time, progressing from simple keyword-based categorization to sophisticated machine learning approaches. Early systems employed traditional classifiers like Naive Bayes and Support Vector Machines, which required extensive feature engineering to convert text into numerical representations [1]. The introduction of deep learning revolutionized the field, with neural network architectures capable of learning hierarchical feature representations directly from raw content. Most recently, transformer-based models like BERT and RoBERTa have achieved state-of-the-art performance by capturing contextual semantics through self-attention mechanisms [1][2]. These models can be fine-tuned on domain-specific data, enabling nuanced classification that understands semantic meaning rather than merely matching keywords. The evolution continues with zero-shot and few-shot learning frameworks that can classify content based on natural language descriptions of categories, dramatically reducing the labeled training data required [4].

Key Concepts

Supervised Learning Classification

Supervised learning classification refers to training algorithms on labeled datasets where each content item is associated with predefined category labels, enabling the model to learn patterns that map content features to categories [1][2]. This paradigm forms the foundation of most production classification systems, as it provides predictable performance when sufficient representative training data exists.

Example: A legal technology company implementing e-discovery software trains a supervised classification model on 50,000 manually labeled legal documents spanning categories like "contracts," "correspondence," "pleadings," and "discovery materials." Legal experts annotate each document, and the system learns to identify distinguishing features—such as specific legal terminology, document structure patterns, and citation formats—that characterize each category. When deployed, the model automatically classifies incoming documents during litigation, reducing manual review time from weeks to hours and enabling attorneys to quickly locate relevant materials through category-based filtering.
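The supervised workflow described above can be sketched end to end in a few dozen lines. The following is a deliberately tiny illustration: a multinomial Naive Bayes classifier built from scratch and trained on a handful of invented legal snippets (a production system would use a library such as scikit-learn or a fine-tuned transformer, and orders of magnitude more data).

```python
import math
from collections import Counter

class NaiveBayesClassifier:
    """Multinomial Naive Bayes over bag-of-words features, with add-one smoothing."""

    def fit(self, documents, labels):
        self.classes = set(labels)
        self.class_counts = Counter(labels)
        self.total_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(documents, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, document):
        words = document.lower().split()
        best_label, best_score = None, float("-inf")
        for c in self.classes:
            # Log prior: how common this category is in the training data.
            score = math.log(self.class_counts[c] / self.total_docs)
            total = sum(self.word_counts[c].values())
            for w in words:
                # Add-one (Laplace) smoothing avoids zero probabilities for unseen words.
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = c, score
        return best_label

# Invented training snippets standing in for manually labeled legal documents.
docs = [
    "this agreement is entered into by the parties",
    "the parties agree to the terms of this contract",
    "dear counsel please find attached our response",
    "thank you for your letter regarding the matter",
]
labels = ["contract", "contract", "correspondence", "correspondence"]

clf = NaiveBayesClassifier()
clf.fit(docs, labels)
print(clf.predict("the parties signed the agreement"))  # → contract
```

The same fit/predict interface scales up unchanged when the toy model is swapped for a stronger one, which is why supervised classification remains the default production paradigm.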

Transfer Learning and Fine-Tuning

Transfer learning involves leveraging pre-trained language models that have learned general linguistic patterns from massive text corpora, then fine-tuning these models on domain-specific classification tasks with relatively small labeled datasets [1][2][3]. This approach dramatically reduces the data and computational resources required compared to training models from scratch.

Example: A biomedical research institution needs to classify scientific abstracts into research methodology categories (clinical trials, systematic reviews, case studies, laboratory experiments). Rather than training a model from scratch requiring hundreds of thousands of labeled examples, they fine-tune BioBERT—a variant of BERT pre-trained on biomedical literature—using only 5,000 labeled abstracts [3]. The pre-trained model already understands biomedical terminology and linguistic structures, so fine-tuning adapts this knowledge to the specific classification task. The resulting system achieves 92% accuracy, enabling researchers to quickly filter literature searches by methodology type and accelerating systematic review processes.

Multi-Label Classification

Multi-label classification enables content items to be assigned to multiple categories simultaneously, reflecting the reality that content often legitimately belongs to several taxonomic categories rather than a single exclusive class [6]. This approach uses techniques like binary relevance (training independent classifiers for each label) or more sophisticated methods that model label dependencies.

Example: An enterprise content management system for a multinational corporation classifies internal documents across multiple dimensions: department (Marketing, Engineering, Legal, Finance), document type (Report, Presentation, Memo, Policy), and topic (Product Development, Customer Relations, Compliance, Strategy). A quarterly product roadmap presentation might be classified as [Marketing, Engineering], [Presentation], [Product Development, Strategy]. This multi-label approach enables employees to discover the document through multiple search facets—finding it when filtering for "Marketing presentations," "Engineering strategy documents," or "Product development materials"—significantly improving discoverability compared to single-category classification.
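The binary relevance strategy mentioned above (one independent classifier per label) can be illustrated with a minimal sketch. Here each "classifier" is just a keyword profile learned per label, and a document receives every label whose profile it matches; the documents, labels, and threshold are all invented for illustration.

```python
from collections import Counter

def train_binary_relevance(documents, label_sets):
    """Binary relevance: train one keyword profile per label, independently.

    For each label, count how often each word appears in documents carrying
    that label. This toy profile stands in for one binary classifier per label."""
    all_labels = set().union(*label_sets)
    profiles = {}
    for label in all_labels:
        counts = Counter()
        for doc, labels in zip(documents, label_sets):
            if label in labels:
                counts.update(doc.lower().split())
        profiles[label] = counts
    return profiles

def predict_labels(profiles, document, threshold=2):
    """Assign every label whose profile matches enough words in the document."""
    words = document.lower().split()
    predicted = set()
    for label, counts in profiles.items():
        # Score: number of document words previously seen with this label.
        score = sum(1 for w in words if counts[w] > 0)
        if score >= threshold:
            predicted.add(label)
    return predicted

docs = [
    "q3 product roadmap slides for the launch campaign",
    "engineering design review for the new product",
    "marketing campaign budget memo",
]
label_sets = [
    {"Marketing", "Presentation"},
    {"Engineering"},
    {"Marketing", "Memo"},
]
profiles = train_binary_relevance(docs, label_sets)
print(predict_labels(profiles, "product launch campaign slides"))
```

Because each label's decision is independent, a single document can come back with zero, one, or several labels, which is exactly what multi-faceted discovery requires.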

Active Learning

Active learning is a methodology that iteratively selects the most informative unlabeled samples for human annotation, optimizing the training process by focusing labeling effort on examples that most improve model performance [7]. This approach significantly reduces annotation costs while maintaining classification quality.

Example: A media company needs to classify millions of news articles into topical categories but has limited budget for manual annotation. They implement an active learning pipeline that begins with 1,000 randomly labeled articles to train an initial model. The system then analyzes 100,000 unlabeled articles, identifying 500 examples where the model has the highest uncertainty (predictions near 50% confidence for multiple categories). Human annotators label only these uncertain cases, and the model retrains. After five iterations totaling 3,500 labeled articles, the system achieves performance equivalent to training on 15,000 randomly selected examples—reducing annotation costs by 77% while enabling accurate classification of the entire article archive.
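The selection step in the pipeline above is commonly implemented as least-confidence uncertainty sampling: rank unlabeled items by the model's top predicted probability and send the least confident ones to annotators. A minimal sketch, with invented article ids and probabilities:

```python
def select_uncertain(predictions, k):
    """Uncertainty sampling: pick the k items whose top predicted probability
    is lowest, i.e. where the model is least confident.

    `predictions` maps item ids to {category: probability} dicts."""
    def confidence(item_id):
        return max(predictions[item_id].values())
    # Least-confident first: lowest top probability = highest uncertainty.
    ranked = sorted(predictions, key=confidence)
    return ranked[:k]

preds = {
    "article-1": {"politics": 0.95, "sports": 0.05},
    "article-2": {"politics": 0.52, "sports": 0.48},
    "article-3": {"politics": 0.70, "sports": 0.30},
}
print(select_uncertain(preds, 2))  # → ['article-2', 'article-3']
```

Alternatives such as margin sampling or entropy-based selection fit the same interface; only the `confidence` scoring function changes.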

Hierarchical Classification

Hierarchical classification leverages taxonomic structures organized in parent-child relationships, employing strategies that classify content at multiple levels of granularity—from broad categories to specific subcategories [6]. This approach can use top-down methods (classifying broad categories first, then refining) or flat methods (predicting all levels simultaneously).

Example: An e-commerce platform organizes products in a three-level taxonomy: Level 1 (Electronics, Clothing, Home & Garden), Level 2 (for Electronics: Computers, Audio, Cameras), Level 3 (for Computers: Laptops, Desktops, Tablets, Accessories). The hierarchical classifier first determines that a new product listing is "Electronics" (Level 1), then classifies it as "Computers" (Level 2), and finally as "Laptops" (Level 3). This enables customers to browse from general to specific categories, apply filters at any taxonomy level, and discover products through multiple navigation paths—significantly improving product discoverability compared to flat classification schemes.
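The top-down strategy described above can be sketched as a walk down the taxonomy tree, choosing the best-scoring child at each level. The scorer here is a hypothetical keyword count (a real system would use a trained model per level); the taxonomy mirrors the e-commerce example.

```python
def classify_top_down(item, taxonomy, score):
    """Top-down hierarchical classification: at each level, pick the child
    with the highest score, then recurse into that branch.

    `taxonomy` is a nested dict of category -> children; `score(item, cat)`
    returns a relevance score for the item under that category."""
    path = []
    level = taxonomy
    while level:
        best = max(level, key=lambda cat: score(item, cat))
        path.append(best)
        level = level[best]
    return path

taxonomy = {
    "Electronics": {
        "Computers": {"Laptops": {}, "Desktops": {}, "Tablets": {}},
        "Audio": {"Headphones": {}, "Speakers": {}},
    },
    "Clothing": {"Shirts": {}, "Shoes": {}},
}

# Hypothetical keyword hints per category, standing in for per-level models.
KEYWORDS = {
    "Electronics": {"laptop", "usb", "battery", "screen"},
    "Computers": {"laptop", "desktop", "cpu", "ram"},
    "Audio": {"headphones", "speaker", "bass"},
    "Clothing": {"cotton", "sleeve"},
    "Shirts": {"sleeve"}, "Shoes": {"sole"},
    "Laptops": {"laptop", "battery"},
    "Desktops": {"desktop", "tower"},
    "Tablets": {"tablet", "stylus"},
    "Headphones": {"headphones"}, "Speakers": {"speaker"},
}

def keyword_score(item, cat):
    return len(set(item.lower().split()) & KEYWORDS.get(cat, set()))

listing = "14-inch laptop with long battery life and bright screen"
print(classify_top_down(listing, taxonomy, keyword_score))
# → ['Electronics', 'Computers', 'Laptops']
```

A flat approach would instead predict all leaf categories in one shot; top-down keeps each decision small but lets an early error propagate, which is the standard trade-off between the two strategies.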

Concept Drift Detection

Concept drift refers to changes in content characteristics or category definitions over time, requiring continuous monitoring and model adaptation to maintain classification accuracy as the underlying data distribution evolves [7]. Detection mechanisms identify performance degradation and trigger retraining workflows.

Example: A social media platform's content moderation system classifies posts for policy violations. Initially trained in 2023, the model performs well for six months but gradually degrades as users adopt new slang terms, memes, and communication patterns to evade detection. The platform implements drift detection by continuously evaluating the model against a held-out test set and monitoring user appeal rates for moderation decisions. When accuracy drops from 94% to 87% over three months, the system automatically triggers a retraining workflow incorporating recent labeled examples, restoring performance to 93% and adapting to evolving content patterns.
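A simple drift detector like the one in this scenario can be expressed as a threshold check over recent held-out accuracy. The numbers and window size below are invented; production systems often use statistical tests (e.g. DDM or Page-Hinkley) instead of a fixed tolerance.

```python
def check_drift(accuracy_history, baseline, tolerance=0.05, window=3):
    """Flag concept drift when held-out accuracy stays below
    baseline - tolerance for `window` consecutive evaluations."""
    if len(accuracy_history) < window:
        return False
    recent = accuracy_history[-window:]
    return all(acc < baseline - tolerance for acc in recent)

# Monthly held-out accuracy since deployment (baseline 0.94).
history = [0.94, 0.93, 0.91, 0.88, 0.88, 0.87]
if check_drift(history, baseline=0.94):
    print("drift detected: trigger retraining workflow")
```

Requiring several consecutive low readings, rather than one, keeps a single noisy evaluation from triggering an unnecessary retrain.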

Model Explainability

Model explainability encompasses techniques that make classification decisions interpretable to human stakeholders, revealing which content features influenced category assignments [8]. Methods like LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) provide insights into model reasoning.

Example: A financial services company uses classification to categorize customer support tickets for routing to specialized teams. Regulatory compliance requires explaining automated decisions, so they implement SHAP analysis for their transformer-based classifier. When a ticket is classified as "fraud investigation" rather than "account inquiry," the system highlights the specific phrases that influenced the decision—such as "unauthorized transaction," "didn't recognize charge," and "card was stolen." This transparency enables quality assurance teams to validate classification logic, identify potential biases, and provide customers with explanations for how their inquiries are handled, building trust in the automated system.
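The core idea behind perturbation-based explanations like LIME can be sketched as leave-one-word-out importance: drop each word and measure how much the target label's score falls. The scorer below is a hypothetical keyword-weight stand-in for a real classifier, and the weights are invented.

```python
def word_importance(text, target_label, score_fn):
    """Leave-one-word-out importance, in the spirit of LIME: remove each
    word in turn and measure how much the target label's score drops.

    `score_fn(text, label)` returns the model's score for `label`."""
    words = text.split()
    base = score_fn(text, target_label)
    importances = {}
    for i, w in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        importances[w] = base - score_fn(perturbed, target_label)
    return importances

# Hypothetical scorer: sums keyword weights for the "fraud investigation" label.
FRAUD_WEIGHTS = {"unauthorized": 0.4, "transaction": 0.2, "stolen": 0.3}

def toy_score(text, label):
    return sum(FRAUD_WEIGHTS.get(w, 0.0) for w in text.lower().split())

ticket = "unauthorized transaction my card was stolen"
scores = word_importance(ticket, "fraud investigation", toy_score)
top = max(scores, key=scores.get)
print(top, round(scores[top], 2))  # → unauthorized 0.4
```

Real LIME and SHAP implementations perturb many words at once and fit a local surrogate model, but the output is the same shape: a per-feature contribution that can be highlighted in a review interface.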

Applications in Enterprise Information Management

Content classification methods find extensive application across enterprise information management scenarios, enabling organizations to harness their knowledge assets effectively. Digital asset management systems employ classification to organize multimedia content libraries, automatically tagging images, videos, and documents with descriptive categories that enable creative teams to quickly locate assets for campaigns [1][2]. A global advertising agency might classify 500,000 creative assets across dimensions like brand, campaign, media type, and visual style, reducing asset search time from hours to seconds and preventing duplicate asset creation.

Knowledge management platforms leverage classification to organize internal documentation, best practices, and institutional knowledge, making expertise discoverable across organizational silos [6]. A consulting firm with 10,000 employees might classify project deliverables, research reports, and methodology documents into practice areas, industries, and service types, enabling consultants to discover relevant prior work when starting new engagements—reducing redundant research and accelerating project delivery.

Regulatory compliance and governance systems use classification to identify and protect sensitive information, automatically categorizing content containing personally identifiable information, financial data, or intellectual property [8]. A healthcare organization might classify clinical documents to ensure proper access controls, automatically identifying records containing protected health information and applying appropriate security policies—ensuring HIPAA compliance while enabling authorized personnel to discover necessary patient information.

Enterprise search and discovery platforms employ classification metadata as ranking signals and faceted navigation dimensions, dramatically improving search relevance and user experience [1][2]. An employee searching a corporate intranet for "quarterly results" might filter results by department (Finance), document type (Report), and time period (Q4 2024), with classification metadata enabling precise retrieval of the specific document needed from among thousands of search results—transforming search from a frustrating keyword hunt into an efficient discovery experience.

Best Practices

Leverage Transfer Learning with Domain-Specific Fine-Tuning

Rather than training classification models from scratch, practitioners should leverage pre-trained transformer models and fine-tune them on domain-specific labeled data, significantly reducing data requirements and training time while achieving superior performance [1][2][3].

Rationale: Pre-trained models like BERT, RoBERTa, and domain-specific variants have already learned general linguistic patterns and semantic representations from massive text corpora. Fine-tuning adapts this foundational knowledge to specific classification tasks with relatively small labeled datasets—often achieving strong performance with hundreds or thousands of examples rather than the hundreds of thousands required for training from scratch.

Implementation Example: A pharmaceutical company needs to classify adverse event reports into severity categories. Instead of training a model from scratch requiring 100,000+ labeled reports, they fine-tune BioBERT using 3,000 annotated examples [3]. The implementation uses the Hugging Face Transformers library, loading the pre-trained BioBERT model, adding a classification head for severity categories, and training for 5 epochs with a learning rate of 2e-5. The fine-tuned model achieves 89% accuracy—comparable to models trained on 10x more data—and deploys within two weeks rather than the months required for traditional approaches.

Implement Continuous Monitoring and Retraining Workflows

Classification systems should include automated monitoring to detect performance degradation and trigger retraining workflows, ensuring accuracy remains aligned with evolving content distributions and category definitions [7][8].

Rationale: Content characteristics change over time due to evolving language, emerging topics, and shifting organizational priorities. Models trained on historical data gradually degrade as concept drift occurs. Continuous monitoring detects this degradation early, while automated retraining workflows maintain performance without manual intervention.

Implementation Example: An e-commerce platform implements MLOps infrastructure that continuously evaluates their product classification model against a held-out test set refreshed monthly with recent labeled data. When F1-score drops below 0.92 (from the baseline 0.95), the system automatically triggers a retraining pipeline that incorporates the past three months of production data with human-verified labels. The pipeline retrains the model, evaluates performance on validation data, and—if improvements are confirmed—deploys the updated model through a blue-green deployment strategy that enables instant rollback if issues arise. This automation maintains classification quality while reducing manual oversight from weekly reviews to monthly audits.

Design Taxonomies Collaboratively with Domain Experts and End Users

Effective classification requires taxonomies that reflect both domain expertise and actual user information-seeking behaviors, necessitating collaborative design processes that involve subject matter experts and end users [6][8].

Rationale: Taxonomies designed solely by data scientists may not align with domain-specific conceptual frameworks or user mental models, resulting in technically accurate classifications that fail to support effective discovery. Collaborative design ensures categories are meaningful, mutually exclusive where appropriate, and aligned with how users actually search for and think about content.

Implementation Example: A legal research platform redesigning their case law classification taxonomy convenes a working group including law librarians, practicing attorneys, legal scholars, and UX researchers. Through card sorting exercises, they discover that attorneys search by legal issue (contract disputes, tort claims) rather than traditional legal classifications (civil procedure, substantive law). The team designs a hybrid taxonomy with primary categories aligned to legal issues and secondary tags for traditional classifications. User testing with 50 attorneys shows 40% faster task completion for legal research scenarios compared to the previous taxonomy, validating the collaborative design approach before implementing the classification system.

Address Class Imbalance Through Targeted Techniques

Classification datasets frequently exhibit class imbalance where some categories contain far more examples than others, requiring specific techniques to prevent models from biasing toward majority classes [7].

Rationale: Standard training procedures optimize overall accuracy, which can be achieved by simply predicting majority classes while ignoring minority classes. This produces poor performance for underrepresented categories that may be critically important for discoverability. Targeted techniques ensure balanced performance across all categories.

Implementation Example: A content moderation system classifies user posts into "safe," "spam," "harassment," and "hate speech," but the training data contains 90% "safe" posts, 7% "spam," 2% "harassment," and 1% "hate speech." Without intervention, the model achieves 90% accuracy by classifying everything as "safe"—completely failing to detect harmful content. The team implements class-weighted loss functions that penalize misclassification of minority classes more heavily, combined with oversampling of minority classes using SMOTE (Synthetic Minority Over-sampling Technique). The balanced approach achieves 88% overall accuracy with 85% recall for "hate speech"—successfully detecting harmful content while maintaining acceptable false positive rates.
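The class-weighted loss mentioned above starts from inverse-frequency weights. A short sketch computing them for the moderation distribution in the example, using the same heuristic as scikit-learn's `class_weight="balanced"` (weight = n_samples / (n_classes × count)):

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights for a class-weighted loss:
    weight_c = n_samples / (n_classes * count_c), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}

# Imbalanced moderation dataset: 90% safe, 7% spam, 2% harassment, 1% hate speech.
labels = ["safe"] * 90 + ["spam"] * 7 + ["harassment"] * 2 + ["hate speech"] * 1
weights = class_weights(labels)
for c, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{c}: {w:.2f}")
```

With these weights, misclassifying one "hate speech" example costs the model roughly 90 times as much as misclassifying one "safe" example, counteracting the majority-class bias; oversampling methods like SMOTE attack the same problem from the data side instead.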

Implementation Considerations

Tool and Framework Selection

Implementing content classification requires selecting appropriate tools and frameworks based on scale requirements, team expertise, and integration needs. For prototyping and small-scale deployments, scikit-learn provides accessible implementations of traditional classifiers with minimal infrastructure requirements [1]. Production systems handling large-scale classification typically leverage deep learning frameworks like PyTorch or TensorFlow, with the Hugging Face Transformers library providing pre-trained models and fine-tuning utilities that accelerate development [1][2][3]. Organizations with limited machine learning expertise might consider cloud-based services like Amazon Comprehend, Google Cloud Natural Language, or Azure Cognitive Services that provide pre-built classification capabilities through API interfaces, trading customization for reduced implementation complexity.

Example: A mid-sized publishing company with a small data science team needs to classify 100,000 articles into topical categories. They evaluate building a custom solution using PyTorch versus using AWS Comprehend Custom Classification. Analysis reveals that custom development would require 3 months of data scientist time plus ongoing infrastructure management, while AWS Comprehend requires 2 weeks for data preparation and model training with minimal ongoing maintenance. They select AWS Comprehend, achieving 87% accuracy—sufficient for their use case—while freeing data science resources for higher-value projects.

Annotation Strategy and Quality Control

The quality of training data fundamentally determines classification performance, requiring thoughtful annotation strategies and quality control processes. Organizations should develop detailed annotation guidelines that define category boundaries, provide examples of edge cases, and establish decision rules for ambiguous content [7][8]. Measuring inter-annotator agreement through metrics like Cohen's Kappa identifies inconsistencies requiring guideline refinement. For large-scale annotation projects, combining expert annotators for difficult cases with crowdsourced annotation for straightforward examples balances quality and cost.

Example: A healthcare organization annotating clinical notes for classification into diagnosis categories establishes a three-tier annotation process. Tier 1: Medical students annotate straightforward cases (clearly documented single diagnoses) at $15/hour. Tier 2: Registered nurses review ambiguous cases and all Tier 1 annotations with confidence scores below 0.8, at $45/hour. Tier 3: Physicians resolve disagreements and annotate complex cases with multiple comorbidities, at $120/hour. This stratified approach achieves 95% annotation accuracy while reducing costs by 60% compared to physician-only annotation, enabling the creation of a 50,000-example training dataset within budget constraints.
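The Cohen's Kappa metric mentioned above is straightforward to compute from two annotators' label sequences. A minimal sketch with invented diagnosis labels:

```python
def cohens_kappa(annotator_a, annotator_b):
    """Cohen's Kappa: chance-corrected agreement between two annotators.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    rate and p_e the agreement expected from each annotator's label
    frequencies alone."""
    n = len(annotator_a)
    p_o = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n
    categories = set(annotator_a) | set(annotator_b)
    p_e = sum(
        (annotator_a.count(c) / n) * (annotator_b.count(c) / n)
        for c in categories
    )
    return (p_o - p_e) / (1 - p_e)

a = ["cardio", "cardio", "neuro", "cardio", "neuro", "cardio"]
b = ["cardio", "neuro", "neuro", "cardio", "neuro", "cardio"]
print(round(cohens_kappa(a, b), 2))  # → 0.67
```

Values near 0 indicate agreement no better than chance; common rules of thumb treat values above roughly 0.8 as strong agreement, a typical bar before a labeled dataset is trusted for training.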

Evaluation Metrics Aligned with Business Objectives

Classification evaluation should employ metrics aligned with specific business objectives rather than defaulting to overall accuracy, which can be misleading for imbalanced datasets or when different error types have different costs [7][8]. Precision measures the proportion of predicted categories that are correct (critical when false positives are costly), while recall measures the proportion of actual category members that are identified (critical when false negatives are costly). F1-score balances precision and recall, while per-category metrics reveal performance variations across the taxonomy.

Example: A financial services company classifying customer support tickets prioritizes different metrics for different categories. For "fraud alert" classification, they optimize for recall (minimizing false negatives) even at the cost of precision, since missing a fraud case has severe consequences while false positives only result in unnecessary security reviews. For "general inquiry" classification, they optimize for precision to ensure customers aren't routed to incorrect departments. The evaluation framework tracks per-category precision and recall, with alerting thresholds customized to business impact—triggering immediate review if fraud recall drops below 95%, but tolerating general inquiry precision as low as 80%.
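Per-category precision and recall fall out directly from per-label true positive, false positive, and false negative counts. A small sketch with invented ticket labels (libraries like scikit-learn provide the same computation as `precision_recall_fscore_support`):

```python
from collections import Counter

def per_category_metrics(y_true, y_pred):
    """Per-category precision and recall from paired true/predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted category got a wrong item
            fn[t] += 1   # true category missed an item
    metrics = {}
    for c in set(y_true) | set(y_pred):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        metrics[c] = {"precision": precision, "recall": recall}
    return metrics

y_true = ["fraud", "fraud", "inquiry", "inquiry", "inquiry", "fraud"]
y_pred = ["fraud", "inquiry", "inquiry", "inquiry", "fraud", "fraud"]
m = per_category_metrics(y_true, y_pred)
print(round(m["fraud"]["recall"], 2))  # → 0.67
```

A business-aligned alerting rule then reduces to a comparison per category, e.g. flagging for review whenever `m["fraud"]["recall"]` drops below the 0.95 threshold described in the example.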

Integration with Discoverability Architecture

Classification systems must integrate seamlessly with broader discoverability infrastructure including search indices, recommendation engines, and content management systems [1][2]. This requires establishing data pipelines that enrich content with classification metadata, API interfaces that expose classification capabilities to downstream systems, and monitoring that tracks end-to-end discoverability metrics rather than just classification accuracy.

Example: A media company integrates their article classification system with their content delivery platform through a real-time enrichment pipeline. When journalists publish articles, the content management system sends article text to the classification API, which returns category predictions and confidence scores within 200ms. The CMS stores this metadata in Elasticsearch indices that power the website's faceted search interface, and sends category information to the recommendation engine that suggests related articles. The integration includes fallback logic that applies rule-based classification if the ML service is unavailable, ensuring content remains discoverable even during system outages. End-to-end monitoring tracks not just classification accuracy but also user engagement metrics like click-through rates on category-filtered searches, validating that classification improvements translate to better discoverability.

Common Challenges and Solutions

Challenge: Insufficient Labeled Training Data

Many organizations lack sufficient labeled training data to train accurate classification models, particularly for specialized domains or newly defined taxonomies. Manual annotation is expensive and time-consuming, creating barriers to implementation [7].

Solution:

Implement a multi-pronged approach combining transfer learning, active learning, and semi-supervised techniques to maximize the value of limited labeled data [1][3][4][7]. Start by fine-tuning pre-trained models that already understand general language patterns, requiring far fewer labeled examples than training from scratch. Deploy active learning workflows that prioritize annotation of the most informative examples, focusing human effort where it provides maximum model improvement. Consider zero-shot or few-shot learning approaches using large language models like GPT-3 that can classify content based on natural language category descriptions with minimal or no labeled examples [4]. For example, a specialized legal tech startup lacking labeled training data for classifying patent documents might fine-tune a pre-trained BERT model on 500 actively-selected examples, achieving 82% accuracy—sufficient for initial deployment—then continuously improve the model through active learning as users validate classifications during normal usage, reaching 91% accuracy after six months without dedicated annotation projects.

Challenge: Maintaining Classification Quality as Taxonomies Evolve

Organizations frequently need to add new categories, merge existing ones, or redefine category boundaries as business needs evolve, but these taxonomy changes can degrade classification performance and create inconsistencies in historical metadata [6][8].

Solution:

Implement versioned taxonomy management with migration strategies that handle transitions gracefully [6]. Maintain explicit taxonomy versions with timestamps, and store both the classification result and the taxonomy version used for each content item. When introducing taxonomy changes, develop migration rules that map old categories to new ones (one-to-one mappings for renames, one-to-many for splits, many-to-one for merges). Retrain classification models on data relabeled according to the new taxonomy, and implement a transition period where both old and new classifications are available. For example, an e-commerce platform splitting their "Electronics" category into "Consumer Electronics" and "Professional Electronics" implements a six-month transition: they retrain the classification model on relabeled data, automatically migrate historical product classifications using rule-based mapping (products under $500 → Consumer, products over $500 → Professional, with manual review for edge cases), and maintain both old and new category metadata during the transition to ensure existing customer searches continue working while new faceted navigation reflects the updated taxonomy.
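The migration rules described above (renames, merges, and item-dependent splits) map naturally onto a small rule table. A minimal sketch using the Electronics split from the example; the item schema and rule names are illustrative:

```python
def migrate_category(item, old_category, rules):
    """Apply versioned taxonomy migration rules.

    Each rule maps an old category to either a fixed new category (rename or
    merge) or a function of the item (a split decided by item attributes);
    categories without a rule pass through unchanged."""
    rule = rules.get(old_category)
    if rule is None:
        return old_category          # category unchanged in the new taxonomy
    if callable(rule):
        return rule(item)            # one-to-many split decided per item
    return rule                      # simple rename or merge

# v2 rules: "Electronics" splits by price; "Gadgets" merges into "Consumer Electronics".
rules_v2 = {
    "Electronics": lambda item: (
        "Professional Electronics" if item["price"] > 500 else "Consumer Electronics"
    ),
    "Gadgets": "Consumer Electronics",
}

camera = {"name": "Studio camera", "price": 1800}
print(migrate_category(camera, "Electronics", rules_v2))  # → Professional Electronics
```

Storing the taxonomy version alongside each item's classification lets the same rule table be replayed over historical metadata, keeping old and new classifications consistent during the transition period.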

Challenge: Explaining Classification Decisions to Stakeholders

Black-box classification models, particularly deep learning approaches, make decisions that are difficult to explain to non-technical stakeholders, creating trust issues and compliance challenges in regulated industries [8].

Solution:

Implement model explainability techniques that provide human-interpretable insights into classification decisions [8]. Use methods like LIME (Local Interpretable Model-agnostic Explanations) or SHAP (SHapley Additive exPlanations) to identify which content features most influenced specific classification decisions. For text classification, highlight the words or phrases that contributed most strongly to category assignments. Develop user interfaces that present explanations alongside classifications, enabling stakeholders to validate model reasoning. For example, a healthcare classification system categorizing clinical notes implements SHAP analysis that highlights the specific medical terms and phrases influencing each diagnosis category prediction. When a note is classified as "diabetes management," the system highlights phrases like "HbA1c levels," "insulin dosage," and "blood glucose monitoring" that drove the decision. Clinical staff can review these explanations to validate that classifications are based on medically relevant information rather than spurious correlations, building trust in the system and identifying cases where model retraining is needed.

Challenge: Handling Multi-Lingual and Cross-Cultural Content

Global organizations must classify content in multiple languages and cultural contexts, but training separate models for each language is resource-intensive and maintaining consistency across languages is challenging [1][2].

Solution:

Leverage multilingual pre-trained models like mBERT (multilingual BERT) or XLM-RoBERTa that have been trained on text from 100+ languages and can transfer classification knowledge across languages [2]. These models enable training on labeled data in one language (typically English, where labeled data is most abundant) and applying the model to content in other languages with reasonable accuracy. For improved performance, fine-tune on multilingual datasets that include examples from all target languages. Implement translation-based augmentation where training data in one language is machine-translated to other languages, expanding the multilingual training set. For example, a global customer support platform needs to classify support tickets in English, Spanish, Mandarin, and Arabic. They fine-tune XLM-RoBERTa on 10,000 labeled English tickets plus 2,000 labeled tickets in each other language (created through a combination of native speaker annotation and translation of English examples). The resulting model achieves 88% accuracy in English, 84% in Spanish, 81% in Mandarin, and 79% in Arabic—sufficient for routing tickets to appropriate support teams across all regions while requiring only 25% of the annotation effort compared to training separate models for each language.

Challenge: Balancing Classification Granularity with Usability

Highly granular taxonomies with many specific categories can improve precision but create usability challenges when users face overwhelming category choices, while coarse taxonomies are easier to navigate but provide less precise classification [6].

Solution:

Implement hierarchical taxonomies that support classification at multiple levels of granularity, enabling different use cases to leverage the appropriate level of detail [6]. Design user interfaces that progressively disclose detail, showing broad categories initially with options to drill down into subcategories. Use confidence scores to determine classification granularity—classifying to specific subcategories when confidence is high, but falling back to broader parent categories when confidence is lower. For example, a scientific literature database implements a three-level taxonomy (Discipline → Subdiscipline → Specific Topic) with 10 top-level disciplines, 80 subdisciplines, and 400 specific topics. The classification system predicts all three levels, but the search interface initially displays only discipline-level facets (Biology, Chemistry, Physics, etc.). When users select a discipline, subdiscipline facets appear (for Biology: Molecular Biology, Ecology, Genetics, etc.). Articles are classified to the most specific level where confidence exceeds 0.85, otherwise to the parent category—ensuring users can navigate from general to specific while maintaining classification accuracy. This approach reduces cognitive load while preserving the benefits of granular classification for users who need specific filtering.
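The confidence-based fallback rule described above is a short function over the predicted taxonomy path. The categories and confidence values below are invented; the 0.85 threshold matches the scenario in this section.

```python
def resolve_granularity(path_predictions, threshold=0.85):
    """Classify to the most specific taxonomy level whose confidence meets
    the threshold, falling back to the parent when confidence drops.

    `path_predictions` is a list of (category, confidence) pairs ordered
    from the broadest level to the most specific."""
    chosen = None
    for category, confidence in path_predictions:
        if confidence >= threshold:
            chosen = category
        else:
            break   # levels below an uncertain node are not trusted
    return chosen

prediction = [
    ("Biology", 0.98),
    ("Molecular Biology", 0.91),
    ("CRISPR Screening", 0.62),
]
print(resolve_granularity(prediction))  # → Molecular Biology
```

Here the article lands on the subdiscipline rather than the low-confidence specific topic, so faceted navigation stays accurate at every level it exposes.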

References

  1. Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
  2. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://arxiv.org/abs/1907.11692
  3. Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2019). BioBERT: a pre-trained biomedical language representation model. https://aclanthology.org/N19-1423/
  4. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., & Amodei, D. (2020). Language Models are Few-Shot Learners. https://arxiv.org/abs/2005.14165
  5. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. https://research.google/pubs/pub48842/
  6. Zhang, M. & Zhou, Z. (2014). A Survey on Multi-Label Learning. https://arxiv.org/abs/1506.01497
  7. Ren, P., Xiao, Y., Chang, X., Huang, P., Li, Z., Chen, X., & Wang, X. (2020). Active Learning for Deep Neural Networks: A Survey. https://arxiv.org/abs/1909.05858
  8. Danilevsky, M., Qian, K., Aharonov, R., Katsis, Y., Kawas, B., & Sen, P. (2020). Explainable AI for Text Classification: A Survey. https://arxiv.org/abs/2004.07780
  9. He, X., Pan, J., Jin, O., Xu, T., Liu, B., Xu, T., Shi, Y., Atallah, A., Herbrich, R., Bowers, S., & Candela, J. Q. (2014). Practical Lessons from Predicting Clicks on Ads at Facebook. https://research.google/pubs/pub43146/
  10. Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network. https://arxiv.org/abs/1503.02531