cheesechaser.datapool.fancaps

This module provides a data pool implementation for the Fancaps dataset.

It defines the FancapsDataPool class, which extends the IncrementIDDataPool base class and provides access to the Fancaps dataset hosted on the Hugging Face Hub.

The constant _FANCAPS_REPO defines the repository ID for the Fancaps dataset.

Note

The dataset deepghs/fancaps_full is gated; you must be granted access to it on the Hugging Face Hub before using this module.
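Because the dataset is gated, a token usually has to reach the pool somehow. The following is a minimal sketch, not part of this module: resolve_hf_token and make_fancaps_pool are hypothetical helper names, and the sketch assumes the token is kept in the HF_TOKEN environment variable, which huggingface_hub conventionally reads.

```python
import os
from typing import Optional


def resolve_hf_token(explicit: Optional[str] = None) -> Optional[str]:
    """Return the explicit token if given, else the HF_TOKEN environment variable."""
    if explicit:
        return explicit
    # An unset or empty HF_TOKEN is treated as "no token".
    return os.environ.get("HF_TOKEN") or None


def make_fancaps_pool(hf_token: Optional[str] = None, revision: str = "main"):
    """Hypothetical convenience wrapper around the documented constructor."""
    # Imported lazily so this sketch can be loaded without cheesechaser installed.
    from cheesechaser.datapool.fancaps import FancapsDataPool

    return FancapsDataPool(revision=revision, hf_token=resolve_hf_token(hf_token))
```

Keeping the token in the environment avoids hard-coding it in scripts; passing hf_token explicitly still takes precedence in this sketch.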

FancapsDataPool

class cheesechaser.datapool.fancaps.FancapsDataPool(revision: str = 'main', hf_token: str | None = None)[source]

A data pool class for accessing and managing the Fancaps dataset.

This class extends the IncrementIDDataPool base class and is specifically tailored for the Fancaps dataset. It provides an interface to access the dataset stored in the Hugging Face repository.

Parameters:
  • revision (str) – The specific revision of the dataset to use, defaults to 'main'.

  • hf_token (Optional[str]) – Hugging Face authentication token for accessing private or gated repositories, defaults to None.

Usage:
>>> from cheesechaser.datapool.fancaps import FancapsDataPool
>>> fancaps_pool = FancapsDataPool()
>>> fancaps_pool_with_token = FancapsDataPool(hf_token='your_hf_token_here')

Note

The Fancaps dataset is stored in the repository defined by _FANCAPS_REPO. Both the data and index are stored in the same repository.
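As a rough illustration of how such a pool might be used, the sketch below downloads resources in bounded batches. It rests on assumptions not stated in this page: that resources are addressed by integer IDs, as the IncrementIDDataPool name suggests, and that the pool exposes a batch_download_to_directory(resource_ids=..., dst_dir=...) method like other cheesechaser data pools. The helper names chunked_ids and download_fancaps are hypothetical, and any ID range passed in is up to the caller.

```python
from typing import Iterator, Optional


def chunked_ids(start: int, stop: int, chunk_size: int) -> Iterator[range]:
    """Split the half-open ID interval [start, stop) into consecutive ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for lo in range(start, stop, chunk_size):
        yield range(lo, min(lo + chunk_size, stop))


def download_fancaps(start: int, stop: int, dst_dir: str,
                     hf_token: Optional[str] = None, chunk_size: int = 100) -> None:
    """Download resources with IDs in [start, stop) into dst_dir, chunk by chunk."""
    # Imported lazily so chunked_ids works without cheesechaser installed.
    from cheesechaser.datapool.fancaps import FancapsDataPool

    pool = FancapsDataPool(hf_token=hf_token)
    for ids in chunked_ids(start, stop, chunk_size):
        # Assumption: this method exists on the pool with these keyword arguments.
        pool.batch_download_to_directory(resource_ids=ids, dst_dir=dst_dir)
```

Splitting the ID interval keeps each download call bounded, so an interrupted run can be resumed from the last completed chunk.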

__init__(revision: str = 'main', hf_token: str | None = None)[source]

Initialize the FancapsDataPool.

Parameters:
  • revision (str) – The specific revision of the dataset to use, defaults to 'main'.

  • hf_token (Optional[str]) – Hugging Face authentication token for accessing private or gated repositories, defaults to None.