Index of /monthly_releases/2025/pombase-2025-11-01/training_data_for_ML_and_AI
Name Last modified Size Description
alleles.tsv 2025-11-01 01:00 1.2M
canto_fypo_annotations_with_comments.tsv 2025-11-01 01:00 31M
canto_go_annotations_with_comments.tsv 2025-11-01 01:00 5.5M
canto_publication_classification.tsv 2025-11-01 01:00 615K
curated_publications.tsv 2025-11-01 01:00 584K
publications_with_annotations.txt 2025-11-01 01:00 77K
This directory contains data files intended for training language
models and other artificial intelligence models for curation
purposes.
- alleles.tsv
Information about curated alleles:
- gene_systematic_id
- gene_name
- allele_current_internal_id - the internal PomBase ID for the allele,
which changes each release
- allele_name
- allele_type
- allele_description
- allele_synonyms
See this recent PomBase publication for details about allele
nomenclature:
https://doi.org/10.1093/genetics/iyad143
- canto_publication_classification.tsv
A TSV file mapping each PubMed ID to its Canto publication triage status
- canto_go_annotations_with_comments.tsv
Manual GO annotations with comments, GAF 2.2 TSV format with a
"comments" column
- canto_fypo_annotations_with_comments.tsv
Manual haploid single-locus phenotype annotations with comments,
PHAF TSV format with a "comments" column
- curated_publications.tsv
PubMed ID and title of publications that have been curated by PomBase
- publications_with_annotations.txt
A list of the PubMed identifiers (PMIDs) of all publications with
annotations in PomBase
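The file descriptions above can be turned into a small loading sketch. The Python below reads allele rows using the column names listed for alleles.tsv and collects the identifiers from a PMID list. It makes several assumptions not stated in this README: that alleles.tsv starts with a header row, that the columns appear in the documented order, and that publications_with_annotations.txt holds one identifier per line. The function names and the sample data are hypothetical and for illustration only.

```python
import csv
import io

# Column names as listed in this README; the order in the real file
# is assumed to match the order documented here.
ALLELE_COLUMNS = [
    "gene_systematic_id",
    "gene_name",
    "allele_current_internal_id",
    "allele_name",
    "allele_type",
    "allele_description",
    "allele_synonyms",
]


def read_alleles(handle, has_header=True):
    """Parse tab-separated allele rows into dicts keyed by column name."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header row (assumed to exist)
    return [dict(zip(ALLELE_COLUMNS, row)) for row in reader if row]


def read_pmids(handle):
    """Collect non-empty lines as a set of publication identifiers."""
    return {line.strip() for line in handle if line.strip()}


# Made-up sample data, not taken from the real files.
alleles_text = (
    "\t".join(ALLELE_COLUMNS) + "\n"
    "GENE_A\tabc1\tid-1\tabc1-1\ttype_x\tdescription one\t\n"
    "GENE_A\tabc1\tid-2\tabc1-2\ttype_y\tdescription two\tsyn1\n"
)
pmid_text = "PMID:0000001\nPMID:0000002\n\nPMID:0000001\n"

alleles = read_alleles(io.StringIO(alleles_text))
pmids = read_pmids(io.StringIO(pmid_text))
print(len(alleles), "alleles;", len(pmids), "distinct identifiers")
```

For the real files, the same functions can be given a handle opened with open("alleles.tsv", newline="", encoding="utf-8"). Because allele_current_internal_id changes with each release, it is safer not to use it as a stable key across releases.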
These files are part of PomBase release 2025-11-01.
If you use this dataset, please cite:
Kim Rutherford, Manuel Lera-Ramírez, Valerie Wood
PomBase: a Global Core Biodata Resource - growth, collaboration, and sustainability
Genetics, February 2024
https://doi.org/10.1093/genetics/iyae007