Geographic and Localization Factors

Geographic and localization factors in AI citation mechanics are the computational methods and algorithmic considerations that enable artificial intelligence systems to understand, process, and appropriately weight citations and content based on spatial, linguistic, and cultural contexts. These factors determine how AI systems prioritize and surface citations according to geographical relevance, language-specific patterns, and regional authority signals, so that users in different locations and language communities receive contextually appropriate and culturally sensitive results. In an increasingly globalized yet regionally diverse information ecosystem, integrating geographic and localization considerations into AI ranking algorithms helps build equitable, accessible, and effective information retrieval systems that serve diverse global populations while respecting local knowledge traditions and research practices.

Overview

The emergence of geographic and localization factors in AI citation mechanics stems from the recognition that traditional citation systems often reflected English-language, Western-centric publication patterns that inadequately served researchers in other regions and languages. As digital scholarship expanded globally, the limitations of location-agnostic citation systems became apparent: researchers in non-English-speaking countries struggled to discover relevant local research, while valuable regional scholarship remained invisible to global audiences. Research in multilingual natural language processing and cross-lingual information retrieval demonstrated that language-specific citation patterns reflect deeper epistemological and methodological differences across research communities, necessitating sophisticated localization approaches beyond simple translation 1, 2.

The fundamental challenge these factors address is balancing global knowledge accessibility with local contextual appropriateness. Citation practices vary significantly across academic traditions, languages, and regions—from author name ordering and date formatting to the relative emphasis on recent versus foundational citations and conventions for citing gray literature 3. Geographic proximity often correlates with citation relevance, particularly for region-specific research topics like local public health interventions, regional environmental studies, or country-specific legal scholarship. However, overly aggressive localization risks creating regional filter bubbles that limit exposure to global research frontiers.

The practice has evolved considerably with advances in multilingual AI models and geospatial data processing. Early approaches relied on simple language detection and IP-based location filtering, but modern systems employ sophisticated cross-lingual embeddings, cultural context models, and hybrid ranking frameworks that dynamically balance global and local relevance signals 1, 3. Contemporary implementations leverage transformer-based architectures adapted for spatial coordinates and multilingual citation corpora, enabling AI systems to learn nuanced location-dependent relevance patterns while maintaining awareness of cross-regional knowledge connections.

Key Concepts

Geospatial Relevance

Geospatial relevance refers to the physical proximity among content, cited sources, and users, and to how that proximity influences citation appropriateness and ranking 1. This concept recognizes that geographic distance often correlates with research relevance, particularly for location-dependent topics, and that users frequently benefit from prioritizing sources from their region or country.

Example: A public health researcher at the University of Nairobi searching for malaria prevention strategies would benefit from geospatial relevance factors that prioritize citations from East African research institutions and publications in regional journals like the East African Medical Journal. While global malaria research from institutions like the WHO remains relevant, studies conducted in similar geographic and climatic conditions—such as research from Kenya, Tanzania, and Uganda—provide more directly applicable insights about local vector species, seasonal transmission patterns, and community-specific intervention challenges. The AI system uses institutional affiliation metadata and publication venue locations to boost these regionally-relevant citations in ranking.
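One way to picture the boost described in this example is a distance-decay multiplier. The following is a minimal sketch, not a description of any particular system: the function names, the exponential decay, the 1000 km scale, and the 0.5 maximum boost are all illustrative assumptions, and the coordinates are approximate city locations.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def geo_boost(user, source, scale_km=1000.0, max_boost=0.5):
    """Hypothetical multiplier in (1, 1 + max_boost] that decays with distance."""
    d = haversine_km(*user, *source)
    return 1.0 + max_boost * math.exp(-d / scale_km)

# Approximate (lat, lon) pairs, for illustration only.
NAIROBI = (-1.29, 36.82)
DAR_ES_SALAAM = (-6.79, 39.21)
GENEVA = (46.20, 6.14)

print(geo_boost(NAIROBI, DAR_ES_SALAAM))  # nearby institution: larger boost
print(geo_boost(NAIROBI, GENEVA))         # distant institution: boost close to 1
```

A multiplicative boost of this kind leaves the underlying relevance score in charge, so a distant but highly relevant source is demoted only slightly rather than removed.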

Linguistic Localization

Linguistic localization encompasses language-specific processing and citation patterns, including language detection, cross-lingual entity recognition, and accommodation of language-specific citation format conventions 2, 3. This concept extends beyond translation to recognize that different linguistic communities have distinct citation practices, bibliographic structures, and scholarly communication norms.

Example: A researcher in Japan searching for earthquake engineering literature benefits from linguistic localization that recognizes Japanese citation conventions, where family names precede given names and dates follow different formatting patterns. The AI system processes citations from Japanese journals like 日本建築学会構造系論文集 (Journal of Structural and Construction Engineering), correctly parsing author names in their original order rather than forcing Western name conventions. Additionally, the system identifies semantically equivalent citations across Japanese and English publications, recognizing that the same research may be published in both languages with different citation formats, enabling comprehensive discovery across linguistic boundaries.

Regional Authority Signals

Regional authority signals are location-specific indicators of source credibility, including regional journal impact factors, country-specific institutional rankings, and local citation network analysis 1. This concept acknowledges that traditional global citation metrics may undervalue sources with high influence within specific geographic or linguistic communities but lower international visibility.

Example: A legal scholar in Brazil researching constitutional law would benefit from regional authority signals that recognize the Revista de Direito Administrativo as a highly authoritative source within Brazilian legal scholarship, even though it has lower global citation counts than English-language international law journals. The AI system analyzes citation patterns within the Brazilian legal research community, identifying that this journal is frequently cited by Supreme Court decisions and leading Brazilian legal scholars. By incorporating these regional authority signals, the system appropriately weights this source alongside or above globally-prominent journals that may be less relevant to Brazilian constitutional questions.

Cultural Contextualization

Cultural contextualization involves capturing region-specific knowledge frameworks, research priorities, and methodological preferences that influence citation relevance 3. This concept recognizes that different academic traditions emphasize different types of evidence, methodological approaches, and theoretical frameworks, requiring AI systems to understand these cultural variations.

Example: A medical researcher in Germany investigating homeopathic treatments encounters cultural contextualization that recognizes the distinct status of complementary medicine in German healthcare and research traditions. While Anglo-American medical databases might heavily weight randomized controlled trials and systematic reviews that are often skeptical of homeopathy, the German research context includes a substantial body of observational studies and practice-based evidence published in journals like Forschende Komplementärmedizin. The AI system recognizes these methodological differences and surfaces culturally-appropriate sources that reflect German medical practice patterns, while still providing access to international systematic reviews for comparative context.

Multilingual Citation Networks

Multilingual citation networks are interconnected citation graphs that span language boundaries, capturing how research in different languages cites and builds upon work across linguistic communities 2. This concept enables AI systems to identify cross-language citation relationships and facilitate knowledge transfer between linguistic research communities.

Example: A climate scientist in France researching Mediterranean drought patterns benefits from multilingual citation networks that connect French-language publications in La Météorologie with Spanish research in Revista de Climatología, Italian studies in Italian Journal of Agrometeorology, and English-language international journals. The AI system identifies that a seminal French study on North African weather patterns is cited by Spanish researchers studying Iberian drought, even though the citation crosses language boundaries. By mapping these multilingual citation relationships, the system surfaces relevant research that might be missed by monolingual search approaches, revealing how Mediterranean climate research builds cumulatively across linguistic communities.

Locale-Aware Ranking

Locale-aware ranking involves adjusting result ordering based on user location, language preferences, and regional context, dynamically balancing global relevance with local appropriateness 1, 3. This concept represents the algorithmic implementation of geographic and localization factors in the final presentation of citation results.

Example: Two researchers searching for "diabetes management guidelines"—one at the University of Toronto and another at the University of Melbourne—receive different citation rankings due to locale-aware ranking. The Canadian researcher sees Diabetes Canada's clinical practice guidelines and citations from the Canadian Journal of Diabetes ranked prominently, reflecting Canadian healthcare system specifics and medication availability. The Australian researcher receives prioritized citations from the Australian Diabetes Society and research published in Medical Journal of Australia, reflecting Australia's Pharmaceutical Benefits Scheme and primary care structure. Both researchers also see international guidelines from the American Diabetes Association and WHO, but with ranking adjustments that prioritize locally-applicable sources while maintaining global awareness.
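The Toronto and Melbourne scenario can be sketched as a weighted blend of a locale-independent score and a same-country signal. This is only an illustration under stated assumptions: the `Citation` class, the `local_weight` parameter, and the relevance scores are invented for the example, and a real system would use far richer locale signals than a country match.

```python
from dataclasses import dataclass

@dataclass
class Citation:
    title: str
    country: str         # ISO 3166-1 alpha-2 code of the issuing body
    global_score: float  # locale-independent relevance in [0, 1] (made-up values)

def locale_score(c, user_country, local_weight=0.3):
    """Blend global relevance with a simple same-country match signal."""
    local = 1.0 if c.country == user_country else 0.0
    return (1 - local_weight) * c.global_score + local_weight * local

def rank(citations, user_country, local_weight=0.3):
    """Sort citations by blended score, best first."""
    return sorted(citations,
                  key=lambda c: locale_score(c, user_country, local_weight),
                  reverse=True)

results = [
    Citation("American Diabetes Association guidelines", "US", 0.90),
    Citation("Diabetes Canada clinical practice guidelines", "CA", 0.80),
    Citation("Australian Diabetes Society guidance", "AU", 0.80),
]

for country in ("CA", "AU"):
    print(country, [c.title for c in rank(results, country)])
```

With `local_weight=0.0` the ordering falls back to purely global relevance, which is one simple way to expose a "global view" setting.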

Cross-Regional Citation Bridging

Cross-regional citation bridging identifies and appropriately weights citations that connect different geographic or linguistic research communities, facilitating knowledge transfer while respecting local contexts 2. This concept addresses the challenge of preventing regional filter bubbles while still providing localization benefits.

Example: A computer science researcher in India working on natural language processing for low-resource languages benefits from cross-regional citation bridging that connects Indian research on Hindi and Tamil language processing with similar work on other low-resource languages worldwide. The AI system identifies that techniques developed for Swahili in East Africa, Vietnamese in Southeast Asia, and indigenous languages in Latin America share methodological similarities and face comparable challenges. By surfacing these cross-regional connections, the system enables the researcher to learn from parallel efforts in other regions while still prioritizing locally-relevant work on Indian languages, creating productive knowledge exchange across geographic boundaries.

Applications in Academic Research and Information Retrieval

Geographic and localization factors find application across multiple phases of academic research and information retrieval, enhancing discovery, relevance, and accessibility for diverse global user populations.

Regional Literature Discovery: Researchers conducting systematic reviews or literature surveys benefit from geographic factors that ensure comprehensive coverage across regions and languages. A systematic review of COVID-19 vaccine hesitancy conducted by a European research team uses localization factors to identify relevant studies from Africa, Asia, and Latin America that might be published in regional journals with lower international visibility 1. The AI system surfaces Portuguese-language studies from Brazil published in Cadernos de Saúde Pública, Chinese research from regional health departments, and African studies from country-specific medical journals, ensuring the review captures regional variations in vaccine attitudes rather than reflecting only English-language, Western-centric research.

Cross-Linguistic Citation Recommendation: Multilingual researchers benefit from systems that recommend citations across language boundaries while respecting linguistic preferences. A bilingual researcher in Quebec working on educational policy receives citation recommendations that include both French-language research from France, Belgium, and francophone Africa, and English-language international studies 2. The AI system uses cross-lingual embeddings to identify semantically similar research across languages, recommending a French study on competency-based education from the Revue française de pédagogie alongside an English study with similar findings from Educational Researcher, enabling the researcher to cite the most appropriate source for their intended publication venue and audience.

Location-Specific Research Applications: Researchers addressing location-specific problems benefit from geographic ranking that prioritizes locally-applicable sources. An agricultural scientist in Thailand researching rice pest management receives prioritized citations from Southeast Asian agricultural research institutions, Thai-language publications from Kasetsart University, and studies conducted in similar tropical monsoon climates 3. While the system also surfaces relevant research from the International Rice Research Institute and global entomology journals, the geographic ranking ensures that studies reflecting Thailand's specific pest species, rice varieties, and farming practices appear prominently, providing immediately actionable insights.

Global Knowledge Synthesis with Local Grounding: International organizations and policy makers benefit from systems that synthesize global research while maintaining awareness of regional variations. A WHO team developing global tuberculosis treatment guidelines uses an AI citation system that identifies both globally-influential research and region-specific studies from high-burden countries 1, 2. The system surfaces research from India, China, South Africa, and other high-TB-burden countries published in regional journals, ensuring that global guidelines reflect diverse epidemiological contexts, drug resistance patterns, and healthcare delivery systems rather than being dominated by research from low-burden Western countries.

Best Practices

Prioritize Explicit User Preferences Over Inferred Signals

AI systems should prioritize explicit user-specified language settings and location preferences over inferred signals from IP addresses or browser locales 3. This approach respects user agency, accommodates researchers working across regions or languages, and reduces privacy concerns associated with location tracking.

Rationale: Explicit preferences are more reliable and respectful of user intent than inferred signals. A researcher traveling internationally, accessing resources through VPNs, or working on cross-regional research may have location preferences that differ from their current IP address. Additionally, explicit preference collection provides transparency about how localization affects results.

Implementation Example: An institutional repository system at a multinational university implements a preference dashboard where researchers explicitly select their primary research regions (e.g., "Southeast Asia," "European Union"), preferred citation languages (e.g., "English, Mandarin"), and localization intensity (e.g., "balanced," "strongly local," "global focus"). These preferences persist across sessions and override IP-based location detection. The system also provides a temporary "search as if from" feature that allows researchers to explore how results would appear for users in different regions, facilitating cross-regional research awareness.
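The precedence described above can be expressed as a small resolution function. This is a sketch under assumptions: the function name, the region labels, and the choice to let the temporary "search as if from" setting take priority over the saved preference are illustrative, not taken from a specific product.

```python
def resolve_region(explicit_pref=None, session_override=None,
                   ip_country=None, default="GLOBAL"):
    """Return (region, source) using the assumed precedence:
    temporary 'search as if from' override > saved explicit preference
    > IP-based inference > global default."""
    candidates = (
        (session_override, "session_override"),
        (explicit_pref, "explicit"),
        (ip_country, "ip_inferred"),
    )
    for value, source in candidates:
        if value:
            return value, source
    return default, "default"

# A researcher with a saved preference who is travelling in Germany:
print(resolve_region(explicit_pref="Southeast Asia", ip_country="DE"))
# No stored preference, so the IP-based guess is used and labelled as inferred:
print(resolve_region(ip_country="DE"))
```

Returning the source of the decision alongside the region makes it straightforward to tell the user why a given localization was applied.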

Maintain Geographic Diversity in Result Sets

Systems should ensure that localization boosts don't completely suppress globally-relevant sources or create regional filter bubbles 1, 3. Best practice involves setting minimum thresholds for geographic diversity in top-ranked results and providing transparency about localization effects.

Rationale: While localization enhances relevance, excessive regional filtering can limit researchers' exposure to important global research, reinforce existing geographic inequalities in citation networks, and disadvantage researchers in regions with smaller research outputs. Maintaining diversity ensures access to both local and global knowledge.

Implementation Example: A citation recommendation system implements a "diversity-aware ranking" algorithm that ensures the top 20 results include citations from at least three different geographic regions and two different language communities, even when strong localization preferences are set. The system displays subtle geographic indicators (small flag icons or region labels) next to citations and provides a "geographic distribution" visualization showing the regional composition of results. Users can adjust a slider to control the balance between local relevance and geographic diversity, with the default setting ensuring meaningful representation of both local and global sources.
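A minimum-diversity guarantee of this kind could be approximated with a greedy swap over an already-ranked list. The sketch below is one possible approach rather than the algorithm of any named system; `diversify`, its parameters, and the region labels are hypothetical, and it handles only the region constraint, not the language constraint.

```python
from collections import Counter

def diversify(ranked, k=20, min_regions=3):
    """ranked: list of (doc_id, region) sorted by relevance, best first.
    Returns a top-k list covering at least min_regions regions whenever
    the candidate pool makes that possible."""
    top, rest = list(ranked[:k]), list(ranked[k:])
    counts = Counter(region for _, region in top)
    while len(counts) < min_regions:
        # Best-ranked leftover candidate from a region not yet represented.
        newcomer = next((c for c in rest if c[1] not in counts), None)
        # Lowest-ranked selected item whose region appears more than once.
        victim_idx = next((i for i in range(len(top) - 1, -1, -1)
                           if counts[top[i][1]] > 1), None)
        if newcomer is None or victim_idx is None:
            break  # the pool cannot satisfy the constraint
        victim = top.pop(victim_idx)
        counts[victim[1]] -= 1
        top.append(newcomer)
        counts[newcomer[1]] += 1
        rest.remove(newcomer)
    return top

ranked = [("a", "EU"), ("b", "EU"), ("c", "EU"),
          ("d", "NA"), ("e", "AF"), ("f", "AS")]
print(diversify(ranked, k=3, min_regions=3))
```

Because only over-represented regions lose items, the swap never reduces the number of regions already present, and the highest-ranked results are the last to be displaced.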

Conduct Region-Specific Evaluation and Bias Auditing

Organizations should regularly audit geographic citation distributions and conduct region-specific user studies to validate that localization improves rather than degrades user experience 1. Evaluation methodologies must extend beyond traditional relevance metrics to assess localization quality and detect geographic biases.

Rationale: Geographic and localization factors can inadvertently amplify existing inequalities or introduce new biases. Regular auditing identifies whether localization benefits all user populations equitably or disadvantages researchers from certain regions. Region-specific evaluation ensures that system improvements validated in one geographic context actually benefit users globally.

Implementation Example: A research institution implements quarterly "geographic equity audits" of their AI citation system, analyzing citation distributions by region, language, and institutional prestige. The audit measures whether researchers from Africa, Latin America, and Asia receive comparably relevant results to those from North America and Europe, whether non-English language sources appear appropriately in multilingual searches, and whether localization features inadvertently reduce citation diversity. The institution conducts parallel user studies in five different countries, recruiting researchers to evaluate result quality using region-specific relevance criteria, ensuring that system refinements benefit diverse global user populations.

Provide Transparency and User Control Over Localization

Systems should clearly communicate how geographic factors influence rankings and provide users with controls to adjust localization intensity 3. Transparency helps users understand and contextualize results, while control accommodates diverse research needs and preferences.

Rationale: Researchers have varying needs for localization depending on their research questions, publication goals, and disciplinary norms. Transparency about localization effects enables informed interpretation of results, while user controls accommodate researchers conducting purely local research, global comparative studies, or anything in between.

Implementation Example: A scholarly search engine implements a "localization transparency panel" that appears alongside search results, explaining: "Results are personalized for researchers in Germany and prioritize German-language sources and European research. 15 of the top 20 results are from European institutions." The panel includes an adjustable "localization strength" slider with settings from "local focus" to "global view," and a "show me why" feature that highlights which results were boosted or demoted due to geographic factors. Users can save different localization profiles for different research projects, such as "German constitutional law" (high localization) versus "international climate science" (low localization).

Implementation Considerations

Geospatial Data Infrastructure and Quality

Implementing geographic factors requires robust geospatial data infrastructure, including geographic metadata extraction, institutional affiliation mapping, and spatial indexing systems 1. Organizations must address data quality challenges, as geographic metadata is often incomplete, inconsistent, or absent, particularly for older publications or sources from regions with less developed digital infrastructure.

Example: A university library implementing geographic citation features develops a multi-tiered approach to geographic metadata: (1) extracting explicit location tags from publisher-provided metadata; (2) inferring locations from author institutional affiliations using a curated database mapping institutions to coordinates; (3) analyzing publication venue locations for journals and conferences; (4) applying natural language processing to extract geographic entities from abstracts and titles. The system uses hierarchical geographic representations (city → region → country → continent) stored in spatial indexing structures like R-trees, enabling efficient location-based queries. For sources lacking geographic metadata, the system displays uncertainty indicators and allows community contributions to improve coverage.
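The four tiers can be read as a fallback chain in which each step is tried only when the more reliable ones yield nothing. The sketch below assumes hypothetical record fields and lookup tables, assigns made-up confidence values, and substitutes a naive substring match against a tiny gazetteer for the natural language processing step.

```python
# Hypothetical lookup tables; a real deployment would use curated databases.
AFFILIATION_DB = {"University of Nairobi": "KE"}
VENUE_DB = {"East African Medical Journal": "KE"}
GAZETTEER = {"mombasa": "KE", "kampala": "UG"}

def resolve_location(record):
    """Return (country, method, confidence), trying sources from most to
    least reliable. Confidence values are illustrative assumptions."""
    if record.get("publisher_country"):
        return record["publisher_country"], "explicit", 1.0
    for aff in record.get("affiliations", []):
        if aff in AFFILIATION_DB:
            return AFFILIATION_DB[aff], "affiliation", 0.8
    if record.get("venue") in VENUE_DB:
        return VENUE_DB[record["venue"]], "venue", 0.6
    abstract = record.get("abstract", "").lower()
    for place, country in GAZETTEER.items():
        if place in abstract:  # stand-in for real geographic entity extraction
            return country, "text", 0.4
    return None, "unknown", 0.0

print(resolve_location({"affiliations": ["University of Nairobi"]}))
print(resolve_location({"abstract": "A cohort study in Kampala clinics."}))
print(resolve_location({"title": "Untitled preprint"}))
```

Carrying the method and confidence with each result is what allows the interface to show uncertainty indicators for inferred locations.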

Multilingual Processing Pipeline Architecture

Supporting linguistic localization requires multilingual processing pipelines that handle language detection, cross-lingual entity recognition, and language-specific citation parsing 2. Organizations must balance comprehensive language support with computational efficiency and maintenance overhead.

Example: A research institution implements a tiered multilingual architecture supporting three levels: (1) full processing for 10 high-priority languages (English, Mandarin, Spanish, Arabic, French, German, Portuguese, Russian, Japanese, Hindi) with language-specific tokenizers, named entity recognizers, and citation parsers; (2) basic processing for 40 additional languages using multilingual BERT models and rule-based citation parsing; (3) fallback processing for all other languages using language-agnostic features like citation network structure. The system employs caching strategies to avoid re-processing multilingual content and uses transfer learning to efficiently add new languages by fine-tuning existing multilingual models rather than training from scratch.
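The tier selection and caching in this example might be wired together roughly as follows. This is a sketch only: the language codes are ISO 639-1 codes chosen to match the ten languages listed, the tier-2 set is a small placeholder for the 40 languages mentioned, and `process` returns a label instead of running real tokenizers or parsers.

```python
from functools import lru_cache

# Tier 1: the ten high-priority languages ("zh" stands in for Mandarin).
FULL_SUPPORT = frozenset({"en", "zh", "es", "ar", "fr", "de", "pt", "ru", "ja", "hi"})
# Tier 2: placeholder subset of the 40 additional languages.
BASIC_SUPPORT = frozenset({"sw", "vi", "th", "id"})

def select_tier(lang_code):
    """Map a language code to the processing tier it should receive."""
    if lang_code in FULL_SUPPORT:
        return "full"
    if lang_code in BASIC_SUPPORT:
        return "basic"
    return "fallback"

@lru_cache(maxsize=10_000)
def process(doc_id, lang_code):
    """Cached per (doc_id, lang_code) so a document is not re-processed."""
    return (doc_id, select_tier(lang_code))

print(process("doc-1", "ja"))
print(process("doc-2", "sw"))
print(process("doc-3", "qu"))  # a language outside both lists uses the fallback path
```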

Privacy-Preserving Location Detection

Implementing geographic personalization while respecting user privacy requires careful design of location detection mechanisms and data handling practices 3. Organizations must comply with regional regulations like GDPR while still providing localization benefits.

Example: A scholarly platform implements privacy-preserving localization through multiple mechanisms: (1) prioritizing explicit user preferences stored locally in browser settings rather than server-side tracking; (2) using coarse-grained location detection (country or region level) rather than precise coordinates; (3) implementing differential privacy techniques that add noise to location-based query logs; (4) providing clear opt-out mechanisms and "private browsing" modes that disable all location-based personalization; (5) storing location data separately from user identifiers with automatic deletion after 90 days. The system's privacy policy clearly explains what location data is collected, how it's used, and provides users with data export and deletion tools.
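Mechanisms (2) and (3) can be illustrated with a coarsening step followed by the Laplace mechanism on per-country query counts. The sketch assumes each user contributes at most one query to the log (sensitivity 1), which is a simplification; the field names and the epsilon value are also assumptions.

```python
import random

def coarsen(event):
    """Keep only the country code; drop coordinates and user identifiers."""
    return {"country": event["country"]}

def noisy_counts(counts, epsilon=1.0, rng=None):
    """Laplace mechanism for count queries, assuming sensitivity 1.
    The difference of two exponential draws with mean `scale` is a
    Laplace(0, scale) sample."""
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    return {
        key: value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        for key, value in counts.items()
    }

event = {"user_id": "u42", "lat": 52.52, "lon": 13.40, "country": "DE"}
print(coarsen(event))
print(noisy_counts({"DE": 120, "KE": 35}, epsilon=0.5))
```

Smaller values of `epsilon` add more noise and give stronger privacy at the cost of less accurate aggregate statistics.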

Evaluation Frameworks for Localization Quality

Assessing the effectiveness of geographic and localization factors requires evaluation frameworks that extend beyond traditional relevance metrics 1, 3. Organizations need methodologies for measuring geographic diversity, cross-lingual discovery effectiveness, and region-specific user satisfaction.

Example: A research organization develops a comprehensive localization evaluation framework including: (1) geographic diversity metrics measuring the distribution of result sets across regions and languages; (2) cross-lingual retrieval effectiveness tests using parallel citation corpora in multiple languages; (3) region-stratified A/B testing that separately analyzes localization effects for users in different countries; (4) qualitative user studies with researchers from diverse geographic and linguistic backgrounds evaluating result appropriateness; (5) bias detection analyses identifying whether localization disadvantages certain regions or languages. The framework includes both automated metrics computed continuously and periodic human evaluation studies, with results informing iterative system refinements.
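For the geographic diversity metrics in item (1), one common choice is normalized Shannon entropy over the region labels of a result set. The sketch below assumes a fixed number of regions in the taxonomy (`n_regions`, a hypothetical parameter) so that scores are comparable across queries; it is one possible metric, not the one any particular framework prescribes.

```python
import math
from collections import Counter

def region_entropy(regions, n_regions):
    """Normalized Shannon entropy of region labels in a result set.
    0.0 means every result comes from one region; 1.0 means results are
    spread uniformly over all n_regions regions (n_regions >= 2)."""
    counts = Counter(regions)
    total = len(regions)
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n_regions)

print(region_entropy(["EU", "EU", "EU", "EU"], n_regions=4))  # single region
print(region_entropy(["EU", "EU", "NA", "NA"], n_regions=4))  # two of four regions
print(region_entropy(["EU", "NA", "AF", "AS"], n_regions=4))  # uniform spread
```

Tracking this value over time, or comparing it between localization settings, gives an automated signal that can be checked between the periodic human evaluations.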

Common Challenges and Solutions

Challenge: Incomplete or Inconsistent Geographic Metadata

Geographic metadata for academic publications is frequently incomplete, inconsistent, or entirely absent, particularly for older publications, preprints, and sources from regions with less developed digital infrastructure. Author affiliations may be ambiguous (multiple institutions, historical name changes), publication venues may lack clear geographic associations, and explicit location tags are rare. This metadata scarcity undermines the effectiveness of geographic ranking factors and creates coverage gaps that may systematically disadvantage certain regions.

Solution:

Implement multi-source metadata enrichment strategies that combine explicit metadata with inferred geographic signals 1. Develop institutional affiliation databases that map universities and research organizations to geographic coordinates, handling name variations and historical changes. Use natural language processing to extract geographic entities from article content, abstracts, and acknowledgments. Employ citation network analysis to infer likely geographic associations based on citation patterns—if a paper is heavily cited by researchers from a particular region, it likely has regional relevance. Create community contribution mechanisms allowing researchers to add or correct geographic metadata. Importantly, represent uncertainty explicitly: when geographic associations are inferred rather than explicit, display confidence indicators to users and weight these signals appropriately in ranking algorithms. For sources lacking any geographic metadata, ensure they remain discoverable through non-geographic relevance signals rather than being systematically suppressed.

Challenge: Balancing Local Relevance with Global Knowledge Access

Overly aggressive localization can create regional filter bubbles that limit researchers' exposure to important global research, potentially reinforcing existing geographic inequalities in citation networks and disadvantaging researchers in regions with smaller research outputs 3. Conversely, insufficient localization fails to surface valuable region-specific sources, forcing users to manually filter irrelevant results. Finding the appropriate balance is particularly challenging because optimal localization intensity varies by discipline, research question, and user preference.

Solution:

Implement adaptive, context-aware localization that adjusts based on query characteristics, user preferences, and content availability 1, 3. For queries with clear local intent (e.g., "German tax law," "Tokyo earthquake preparedness"), apply stronger localization. For queries with inherently global scope (e.g., "quantum computing algorithms," "dark matter detection"), reduce localization intensity. Provide user controls allowing researchers to adjust localization strength for different research projects. Ensure minimum geographic diversity thresholds in top results, for example by guaranteeing that the top 20 results include sources from at least three different regions. Implement "cross-regional bridging" features that explicitly highlight valuable connections between local and global research, such as "This local study builds on international research from..." or "Global researchers have cited this regional source for..." Display geographic distribution visualizations helping users understand the regional composition of their results and identify potential gaps.
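The query-dependent adjustment can be sketched as a function that returns a localization weight. This toy version detects local intent by matching query tokens against a tiny hand-written term list, which stands in for a real intent classifier or gazetteer; the term list, the function name, and the 0.6 and 0.1 weights are assumptions.

```python
import re

# Toy list of place-related terms; purely illustrative.
LOCAL_TERMS = {"german", "germany", "tokyo", "japan", "brazil", "brazilian"}

def localization_strength(query, user_setting=None, strong=0.6, weak=0.1):
    """Return a localization weight in [0, 1].
    An explicit user setting always wins; otherwise queries that mention
    a known place get the strong weight and all others the weak one."""
    if user_setting is not None:
        return user_setting
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    return strong if tokens & LOCAL_TERMS else weak

print(localization_strength("German tax law"))
print(localization_strength("quantum computing algorithms"))
print(localization_strength("German tax law", user_setting=0.0))
```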

Challenge: Cross-Lingual Citation Discovery and Equivalence

Identifying semantically equivalent or related citations across language boundaries presents significant technical challenges 2. The same research may be published in multiple languages with different citation formats, author name conventions, and bibliographic structures. Cross-lingual entity recognition must handle script variations, transliteration inconsistencies, and cultural naming conventions. Simple translation-based approaches often fail to capture nuanced semantic relationships or domain-specific terminology across languages.

Solution:

Develop multilingual citation embeddings that represent citations in a unified vector space where semantically similar citations in different languages occupy proximate positions 2. Train these embeddings on parallel citation corpora, multilingual abstract collections, and cross-lingual citation networks. Use transformer-based multilingual models (like mBERT or XLM-R) fine-tuned on citation data to capture semantic relationships across languages. Implement cross-lingual entity linking that maps author names, institutions, and concepts to language-independent identifiers, handling transliteration variations and cultural naming conventions. Create language-specific citation parsers that correctly handle bibliographic format variations across linguistic communities. Build cross-lingual citation networks that explicitly represent citation relationships spanning language boundaries, enabling discovery of research that cites or is cited by work in other languages. Provide users with "find similar citations in other languages" features and display cross-lingual citation relationships transparently.
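Once citations live in a shared vector space, a "find similar citations in other languages" feature reduces to a nearest-neighbour search restricted to other languages. The sketch below uses hand-written three-dimensional vectors in place of embeddings from a multilingual model, so the identifiers and numbers are purely illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similar_in_other_languages(query_id, index, top_n=3):
    """index maps doc_id -> (language, vector). Returns the ids of the
    most similar documents written in a language other than the query's."""
    q_lang, q_vec = index[query_id]
    hits = [(cosine(q_vec, vec), doc_id)
            for doc_id, (lang, vec) in index.items() if lang != q_lang]
    return [doc_id for _, doc_id in sorted(hits, reverse=True)[:top_n]]

# Toy vectors standing in for model-produced embeddings.
index = {
    "fr-drought": ("fr", [0.90, 0.10, 0.00]),
    "es-drought": ("es", [0.80, 0.20, 0.10]),
    "en-drought": ("en", [0.85, 0.15, 0.05]),
    "en-protein": ("en", [0.00, 0.10, 0.90]),
}
print(similar_in_other_languages("fr-drought", index, top_n=2))
```

In practice the linear scan would be replaced by an approximate nearest-neighbour index, but the language filter and similarity measure stay the same.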

Challenge: Regional Authority Assessment Without Bias

Traditional citation metrics like journal impact factors and h-indices often reflect English-language, Western-centric publication patterns, potentially undervaluing sources with high influence within specific geographic or linguistic communities but lower international visibility 1. However, implementing regional authority signals risks introducing new biases, such as over-weighting low-quality sources simply because they're local, or creating fragmented regional citation ecosystems with limited cross-regional knowledge exchange.

Solution:

Develop multi-dimensional authority assessment that combines global and regional signals while maintaining quality standards 1, 3. Compute region-specific citation metrics that measure influence within particular geographic or linguistic communities, for example a journal's impact factor among Latin American researchers or within Portuguese-language publications. Use network analysis to identify regionally-authoritative sources based on citation patterns within specific communities. Implement quality thresholds that prevent low-quality sources from being boosted solely due to local origin: regional authority signals should elevate high-quality regional sources, not lower quality standards. Provide transparency about authority assessment, displaying both global and regional metrics to users. Conduct regular bias audits examining whether regional authority signals appropriately recognize diverse sources or inadvertently introduce new inequalities. Create cross-regional authority bridges that recognize when regionally-prominent sources gain international influence or when globally-prominent sources are particularly relevant to specific regions.
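A region-specific metric combined with a quality gate might look like the following sketch. The edge format, the function names, the 0.5 quality threshold, and the 0.4 regional weight are all assumptions made for illustration, and the journal labels are placeholders.

```python
def regional_share(edges, source, region):
    """edges: list of (citing_region, cited_source) pairs.
    Returns the fraction of citations made from `region` that point to `source`."""
    from_region = [cited for citing_region, cited in edges if citing_region == region]
    return from_region.count(source) / len(from_region) if from_region else 0.0

def authority_score(global_score, regional_score, quality,
                    min_quality=0.5, regional_weight=0.4):
    """Blend global and regional authority, but only for sources that pass
    the quality threshold; others keep their global score unchanged."""
    if quality < min_quality:
        return global_score
    return (1 - regional_weight) * global_score + regional_weight * regional_score

edges = [("BR", "journal-A"), ("BR", "journal-A"), ("BR", "journal-B"), ("US", "journal-B")]
share = regional_share(edges, "journal-A", "BR")
print(share)
print(authority_score(global_score=0.2, regional_score=share, quality=0.8))
print(authority_score(global_score=0.2, regional_score=share, quality=0.3))
```

Keeping the quality check separate from the regional signal makes the threshold easy to audit: a source is never boosted unless it clears the same bar applied everywhere else.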

Challenge: Geopolitical Sensitivity and Content Restrictions

Geographic localization must navigate geopolitical sensitivities including disputed territories, country naming conventions, and content restrictions in certain jurisdictions 3. Different regions may have conflicting claims about territorial boundaries, preferred country names, or the legitimacy of certain sources. Some jurisdictions impose legal restrictions on accessible content. Handling these sensitivities poorly can alienate users, create legal liabilities, or compromise the system's perceived neutrality and credibility.

Solution:

Implement region-aware content policies that respect local sensitivities while maintaining scholarly integrity. For disputed territories, use neutral geographic descriptors or display multiple naming conventions based on user location (similar to how mapping services handle disputed borders). Maintain clear editorial policies about source inclusion based on scholarly criteria rather than political considerations, while complying with legal requirements in operating jurisdictions. Provide transparency about any content restrictions, clearly indicating when results are filtered due to legal requirements. For country naming, use internationally-recognized standards (like ISO country codes) as canonical identifiers while displaying culturally-appropriate names based on user locale. Establish advisory boards with international representation to guide policies on geopolitically-sensitive issues. Implement user controls allowing researchers to access alternative geographic perspectives when conducting cross-regional research. Document all geopolitical policy decisions transparently and review them regularly to ensure they serve scholarly communication rather than political agendas.
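The country-naming recommendation amounts to storing an ISO 3166-1 code as the canonical identifier and resolving the display name at render time. The sketch below uses a small hand-written table; the table contents, the function name, and the fallback to English are assumptions, and a production system would draw on a maintained locale database.

```python
# Canonical key: ISO 3166-1 alpha-2 code. Values: display names per UI language.
DISPLAY_NAMES = {
    "DE": {"en": "Germany", "de": "Deutschland", "fr": "Allemagne"},
    "JP": {"en": "Japan", "ja": "日本", "fr": "Japon"},
}

def display_name(iso_code, locale, fallback_locale="en"):
    """Return the name for `iso_code` in `locale`, falling back to the
    fallback locale and finally to the bare code."""
    code = iso_code.upper()
    names = DISPLAY_NAMES.get(code, {})
    return names.get(locale) or names.get(fallback_locale) or code

print(display_name("DE", "de"))
print(display_name("JP", "pt"))  # no Portuguese entry, so English is used
print(display_name("xx", "en"))  # unknown code is shown as-is
```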

References

  1. arXiv. (2020). Cross-lingual Retrieval for Iterative Self-Supervised Training. https://arxiv.org/abs/2004.14516
  2. arXiv. (2020). Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment. https://arxiv.org/abs/2010.11934
  3. arXiv. (2019). Cross-lingual Language Model Pretraining. https://arxiv.org/abs/1911.02116
  4. ACL Anthology. (2020). Emerging Cross-lingual Structure in Pretrained Language Models. https://aclanthology.org/2020.acl-main.747/
  5. ACL Anthology. (2021). Multilingual Transfer Learning for QA Using Translation as Data Augmentation. https://aclanthology.org/2021.findings-acl.144/
  6. Google Research. (2020). Massively Multilingual Neural Machine Translation in the Wild. https://research.google/pubs/pub46808/
  7. arXiv. (2021). Cross-lingual Retrieval Augmented Prompt for Low-Resource Languages. https://arxiv.org/abs/2103.10360
  8. arXiv. (2020). XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization. https://arxiv.org/abs/2005.00052
  9. ACL Anthology. (2020). From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers. https://aclanthology.org/2020.emnlp-main.363/
  10. NeurIPS. (2020). Unsupervised Cross-lingual Representation Learning at Scale. https://proceedings.neurips.cc/paper/2020/hash/92d1e1eb1cd6f9fba3227870bb6d7f07-Abstract.html
  11. arXiv. (2021). Cross-lingual Retrieval for Iterative Self-Supervised Training. https://arxiv.org/abs/2112.10668