Index of /monthly_releases/2026/pombase-2026-02-01/training_data_for_ML_and_AI

Name                                          Last modified     Size
alleles.tsv                                   2026-02-04 18:15  916K
canto_fypo_annotations_with_comments.parquet  2026-02-01 01:00  1.5M
canto_fypo_annotations_with_comments.tsv      2026-02-01 01:00   31M
canto_go_annotations_with_comments.parquet    2026-02-01 01:00  400K
canto_go_annotations_with_comments.tsv        2026-02-01 01:00  5.4M
canto_publication_classification.tsv          2026-02-01 01:00  618K
curated_publications.tsv                      2026-02-01 01:00  585K
publications_with_annotations.txt             2026-02-01 01:00   77K

This directory contains data files compiled for training language
models and other artificial intelligence models for curation
purposes.

 - alleles.tsv
   Information about curated alleles:
    - gene_systematic_id
    - gene_name
    - allele_current_internal_id - the internal PomBase ID for the allele,
                                   which changes each release
    - allele_name
    - allele_type
    - allele_description
    - allele_synonyms
   See this recent PomBase publication for details about allele
   nomenclature:
     https://doi.org/10.1093/genetics/iyad143
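
A minimal sketch of loading alleles.tsv with Python's standard library. It assumes the file starts with a tab-separated header row using the column names listed above, and, as a hypothetical choice, that multiple allele_synonyms are comma-separated; check both against the actual file. Because allele_current_internal_id changes each release, the sketch keys alleles on gene_systematic_id plus allele_name instead.

```python
import csv
from collections import defaultdict


def load_alleles(lines):
    """Group alleles.tsv records by gene_systematic_id.

    Assumes a tab-separated header row using the README column names and
    (hypothetically) comma-separated values in allele_synonyms.
    """
    by_gene = defaultdict(list)
    # QUOTE_NONE keeps literal quote characters inside descriptions intact.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        raw = row.get("allele_synonyms") or ""
        row["allele_synonyms"] = [s.strip() for s in raw.split(",") if s.strip()]
        by_gene[row["gene_systematic_id"]].append(row)
    return dict(by_gene)


def allele_key(row):
    """Release-independent key: allele_current_internal_id changes each release."""
    return (row["gene_systematic_id"], row["allele_name"])
```

Usage: `with open("alleles.tsv", newline="") as fh: alleles = load_alleles(fh)`.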

 - canto_publication_classification.tsv
   A TSV file mapping PubMed IDs to Canto publication triage status
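
A minimal sketch for tallying how many publications fall into each triage status. It assumes two tab-separated columns (PubMed ID, then triage status) and accepts PMIDs either bare or with a "PMID:" prefix; the column order and ID format are assumptions, not documented here. Rows whose first field is not a PMID, such as a possible header line, are skipped.

```python
import csv
from collections import Counter


def triage_counts(lines):
    """Count publications per Canto triage status.

    Assumes two tab-separated columns (PubMed ID, triage status); rows
    whose first field is not a PMID (e.g. a header) are skipped.
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue
        pmid = row[0].strip()
        if pmid.upper().startswith("PMID:"):
            pmid = pmid[5:]
        if not pmid.isdigit():
            continue  # header or malformed row
        counts[row[1].strip()] += 1
    return counts
```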

 - canto_go_annotations_with_comments.tsv
   Manual GO annotations with comments, in GAF 2.2 TSV format with an
   additional "comments" column
 - canto_go_annotations_with_comments.parquet
   The same information in Parquet format
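
A minimal sketch of reading the TSV version. It assumes header lines begin with "!" (as in standard GAF files) and that the extra comments column is appended after the 17 standard GAF 2.2 columns; if the file instead has a named header row, that row would need to be skipped as well. The field names below are illustrative labels for the GAF 2.2 columns, not names taken from the file.

```python
import csv

# Illustrative names for the 17 standard GAF 2.2 columns, in order.
GAF_2_2_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def read_gaf_with_comments(lines):
    """Yield one dict per annotation from a GAF 2.2 TSV with comments.

    Assumes '!'-prefixed header lines and a comments column appended
    after the 17 GAF 2.2 columns (index 17).
    """
    rows = (line for line in lines if not line.startswith("!"))
    for row in csv.reader(rows, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # blank line
        record = dict(zip(GAF_2_2_COLUMNS, row))
        record["comments"] = row[17] if len(row) > 17 else ""
        yield record
```

The Parquet copy can alternatively be loaded with `pandas.read_parquet` (which needs pyarrow or fastparquet installed); its column names should be checked before use.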

 - canto_fypo_annotations_with_comments.tsv
   Manual haploid single-locus phenotype annotations with comments, in
   PHAF TSV format with an additional "comments" column
 - canto_fypo_annotations_with_comments.parquet
   The same information in Parquet format
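
For training on curator explanations, it can help to keep only annotations whose comments field is filled in. A minimal sketch, assuming the TSV starts with a tab-separated header row; the comments column is matched case-insensitively because its exact header spelling is an assumption here.

```python
import csv


def annotations_with_comments(lines, comments_column="comments"):
    """Return PHAF rows whose comments field is non-empty.

    Assumes a tab-separated header row; the comments column name is
    matched case-insensitively since its exact spelling is assumed.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = reader.fieldnames or []
    wanted = comments_column.lower()
    matches = [name for name in header if name.strip().lower() == wanted]
    if not matches:
        raise ValueError(f"no {comments_column!r} column in header: {header}")
    key = matches[0]
    # Keep rows where the comments cell has non-whitespace content.
    return [row for row in reader if (row.get(key) or "").strip()]
```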

 - curated_publications.tsv
   PubMed ID and title of publications that have been curated by PomBase

 - publications_with_annotations.txt
   A list of the PubMed identifiers (PMIDs) of all publications with
   annotations in PomBase
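
A minimal sketch for cross-referencing the two publication files. It assumes curated_publications.tsv has the PMID in the first column and the title in the second, that publications_with_annotations.txt has one PMID per line, and that PMIDs may appear bare or with a "PMID:" prefix; all three are assumptions about the layout. It only reports the set differences and makes no claim about why a PMID appears in one file but not the other.

```python
import csv


def normalise_pmid(value):
    """Return a bare PMID string, accepting '12345' or 'PMID:12345'."""
    value = value.strip()
    if value.upper().startswith("PMID:"):
        value = value[5:]
    return value


def load_titles(lines):
    """Map PMID -> title from curated_publications.tsv (assumed: PMID, title)."""
    titles = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and normalise_pmid(row[0]).isdigit():
            titles[normalise_pmid(row[0])] = row[1].strip()
    return titles


def load_pmids(lines):
    """Read one PMID per line, ignoring blank or non-numeric lines."""
    return {normalise_pmid(line) for line in lines if normalise_pmid(line).isdigit()}


def compare_publication_lists(title_lines, pmid_lines):
    """Return (only in curated list, only in annotated list), numerically sorted."""
    curated = set(load_titles(title_lines))
    annotated = load_pmids(pmid_lines)
    return (sorted(curated - annotated, key=int),
            sorted(annotated - curated, key=int))
```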

These files are part of PomBase release 2026-02-01

If you use this dataset, please cite:
  Pascal Carme, Kim Rutherford, Jürg Bähler, Juan Mata, Valerie Wood
  PomBase in 2026: Expanding Knowledge, Modelling Connections
  Genetics, January 2026
  https://doi.org/10.1093/genetics/iyag001