cheesechaser.datapool.danbooru

This module provides data pool classes for managing and accessing Danbooru image datasets.

It includes classes for handling both original and WebP versions of Danbooru images, as well as classes for accessing the newest additions to the dataset. The module utilizes Hugging Face’s file system for data storage and retrieval.

Classes:

  • DanbooruDataPool: Main class for accessing Danbooru image data.

  • DanbooruStableDataPool: Class for accessing a stable version of Danbooru data.

  • DanbooruNewestDataPool: Class for accessing both stable and newest Danbooru data.

  • DanbooruWebpDataPool: Class for accessing WebP versions of Danbooru images.

  • DanbooruNewestWebpDataPool: Class for accessing both stable and newest WebP Danbooru data.

  • Danbooru2024SfwDataPool: Class for accessing the SFW version of Danbooru2024 dataset.

  • Danbooru2024DataPool: Class for accessing the full Danbooru2024 dataset.

  • Danbooru2024WebpDataPool: Class for accessing WebP versions of Danbooru2024 images.

The module uses various utility functions and classes from the hfutils package for interacting with the Hugging Face file system.

Warning

The original Danbooru series datasets are no longer publicly available. Unless you are an official member of DeepGHS, you will not be able to continue using these DataPools. Please use the Danbooru2024 series datasets instead.

Note

The deepghs/danbooru2024-sfw dataset is fully public, while the deepghs/danbooru2024 and deepghs/danbooru2024-webp-4Mpixel datasets require granted access. Make sure you have been granted access before attempting to use these datasets.
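One way to act on this note is to run a small pre-flight check before constructing a pool for a gated dataset. The helpers below are hypothetical and are not part of cheesechaser. When no token is passed, they read the `HF_TOKEN` environment variable, which is the variable `huggingface_hub` conventionally uses. They only verify that some token is present. Whether that token's account has actually been granted access is still decided by the Hugging Face Hub at download time.

```python
import os
from typing import Optional

# Gated repositories as listed in the note above (hypothetical helper, not cheesechaser API).
GATED_REPOS = {"deepghs/danbooru2024", "deepghs/danbooru2024-webp-4Mpixel"}


def resolve_hf_token(explicit: Optional[str] = None) -> Optional[str]:
    """Return the explicit token if given, else HF_TOKEN from the environment, else None."""
    return explicit or os.environ.get("HF_TOKEN") or None


def check_access(repo_id: str, hf_token: Optional[str]) -> None:
    """Raise early if a gated repository is requested without any token.

    A present token is necessary but not sufficient: the Hub still rejects
    tokens whose account has not been granted access.
    """
    if repo_id in GATED_REPOS and not hf_token:
        raise PermissionError(
            f"{repo_id} requires granted access; pass hf_token or set HF_TOKEN."
        )


if __name__ == "__main__":
    token = resolve_hf_token()
    check_access("deepghs/danbooru2024-sfw", token)  # public repository: always passes
    print("token found" if token else "no token; only public datasets are usable")
```

The resolved token can then be passed as the `hf_token` argument of the pool classes documented below.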

DanbooruDataPool

class cheesechaser.datapool.danbooru.DanbooruDataPool(data_revision: str = 'main', idx_revision: str = 'main', hf_token: str | None = None)

Main class for accessing Danbooru image data.

This class provides access to the Danbooru dataset using default repository IDs.

Parameters:
  • data_revision (str) – The revision of the data repository to use. Defaults to ‘main’.

  • idx_revision (str) – The revision of the index repository to use. Defaults to ‘main’.

  • hf_token (Optional[str]) – Optional Hugging Face token for authentication.

__init__(data_revision: str = 'main', idx_revision: str = 'main', hf_token: str | None = None)

DanbooruStableDataPool

class cheesechaser.datapool.danbooru.DanbooruStableDataPool(hf_token: str | None = None)

Class for accessing a stable version of Danbooru data.

This class uses specific revisions of the data and index repositories to provide access to a stable version of the Danbooru dataset.

Parameters:

hf_token (Optional[str]) – Optional Hugging Face token for authentication.

__init__(hf_token: str | None = None)

DanbooruNewestDataPool

class cheesechaser.datapool.danbooru.DanbooruNewestDataPool(hf_token: str | None = None)

Class for accessing both stable and newest Danbooru data.

This class combines access to both the stable Danbooru dataset and the newest additions, providing a comprehensive view of the data.

Parameters:

hf_token (Optional[str]) – Optional Hugging Face token for authentication.

__init__(hf_token: str | None = None)

mock_resource(resource_id, resource_info, silent: bool = False) → AbstractContextManager[Tuple[str, Any]]

Provide a context manager for accessing a resource.

This method attempts to retrieve the resource from both the stable and newest data pools, returning the first successful match.

Parameters:
  • resource_id (Any) – The ID of the resource to retrieve.

  • resource_info (Any) – Additional information about the resource.

  • silent (bool) – If True, suppresses the progress bar for each individual file during the mocking process.

Returns:

A context manager yielding a tuple of (temporary directory, resource info).

Return type:

ContextManager[Tuple[str, Any]]

Raises:

ResourceNotFoundError – If the resource is not found in either pool.
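The lookup across the stable and newest pools can be pictured with the following simplified sketch. It is not the library's implementation, and several of its pieces are assumptions:

- `ResourceNotFoundError` is redefined locally as a stand-in.
- Each pool is modelled as a plain callable with the shape of `mock_resource`.
- `stub_pool` is a hypothetical in-memory helper.
- The `silent` flag is omitted.
- Trying the pools in list order (stable first) is assumed, since the documentation above does not specify the order.

```python
from contextlib import ExitStack, contextmanager
from typing import Any, Callable, ContextManager, Iterable, Iterator, Tuple


class ResourceNotFoundError(Exception):
    """Local stand-in for the library's exception of the same name."""


# A "pool" is modelled as a callable shaped like mock_resource(resource_id, resource_info).
MockFn = Callable[[Any, Any], ContextManager[Tuple[str, Any]]]


@contextmanager
def mock_from_first_pool(
    pools: Iterable[MockFn], resource_id: Any, resource_info: Any
) -> Iterator[Tuple[str, Any]]:
    """Yield (directory, info) from the first pool that has the resource."""
    for mock in pools:
        with ExitStack() as stack:
            try:
                result = stack.enter_context(mock(resource_id, resource_info))
            except ResourceNotFoundError:
                continue  # not in this pool; try the next one
            yield result  # caller's with-block runs here; the stack cleans up afterwards
            return
    raise ResourceNotFoundError(f"resource {resource_id!r} not found in any pool")


def stub_pool(name: str, ids: set) -> MockFn:
    """Hypothetical in-memory pool that 'contains' only the given IDs."""
    @contextmanager
    def mock(resource_id: Any, resource_info: Any) -> Iterator[Tuple[str, Any]]:
        if resource_id not in ids:
            raise ResourceNotFoundError(resource_id)
        yield f"/tmp/{name}/{resource_id}", resource_info
    return mock


if __name__ == "__main__":
    stable, newest = stub_pool("stable", {1, 2}), stub_pool("newest", {3})
    with mock_from_first_pool([stable, newest], 3, {"id": 3}) as (directory, info):
        print(directory, info)  # served by the newest pool
```

A "not found" error from one pool is only caught while that pool's context manager is being entered. Errors raised inside the caller's own `with` block therefore propagate unchanged instead of triggering a fallback.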

DanbooruWebpDataPool

class cheesechaser.datapool.danbooru.DanbooruWebpDataPool(data_revision: str = 'main', idx_revision: str = 'main', hf_token: str | None = None)

Class for accessing WebP versions of Danbooru images.

This class provides access to WebP-formatted Danbooru images, which are typically smaller in file size while maintaining good quality.

Parameters:
  • data_revision (str) – The revision of the data repository to use. Defaults to ‘main’.

  • idx_revision (str) – The revision of the index repository to use. Defaults to ‘main’.

  • hf_token (Optional[str]) – Optional Hugging Face token for authentication.

__init__(data_revision: str = 'main', idx_revision: str = 'main', hf_token: str | None = None)

DanbooruNewestWebpDataPool

class cheesechaser.datapool.danbooru.DanbooruNewestWebpDataPool(hf_token: str | None = None)

Class for accessing both stable and newest WebP Danbooru data.

This class combines access to both the stable WebP-formatted Danbooru dataset and the newest WebP additions, providing a comprehensive view of the WebP data.

Parameters:

hf_token (Optional[str]) – Optional Hugging Face token for authentication.

__init__(hf_token: str | None = None)

mock_resource(resource_id, resource_info, silent: bool = False) → AbstractContextManager[Tuple[str, Any]]

Provide a context manager for accessing a WebP resource.

This method attempts to retrieve the WebP resource from both the stable and newest WebP data pools, returning the first successful match.

Parameters:
  • resource_id (Any) – The ID of the resource to retrieve.

  • resource_info (Any) – Additional information about the resource.

  • silent (bool) – If True, suppresses the progress bar for each individual file during the mocking process.

Returns:

A context manager yielding a tuple of (temporary directory, resource info).

Return type:

ContextManager[Tuple[str, Any]]

Raises:

ResourceNotFoundError – If the resource is not found in either WebP pool.
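`mock_resource` yields a temporary directory, so any file you want to keep should presumably be copied out before the `with` block ends. The sketch below assumes the directory is removed on exit, as its name suggests. `fake_mock_resource` is a hypothetical stand-in built on `tempfile.TemporaryDirectory`, and its one-file layout is illustrative rather than taken from the library. `copy_resource_files` walks the whole directory, so it does not depend on a particular layout.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager
from typing import Any, ContextManager, Iterator, List, Tuple


@contextmanager
def fake_mock_resource(resource_id: Any, resource_info: Any) -> Iterator[Tuple[str, Any]]:
    """Hypothetical stand-in for mock_resource: a temp dir holding one placeholder file."""
    with tempfile.TemporaryDirectory() as td:
        with open(os.path.join(td, f"{resource_id}.webp"), "wb") as f:
            f.write(b"placeholder")  # not real image data
        yield td, resource_info
    # td is deleted when the TemporaryDirectory block exits, i.e. after the caller's with-block


def copy_resource_files(mock_cm: ContextManager[Tuple[str, Any]], dst_dir: str) -> List[str]:
    """Copy every file out of the yielded directory while it still exists."""
    copied = []
    with mock_cm as (td, _info):
        for root, _dirs, files in os.walk(td):
            for name in files:
                src = os.path.join(root, name)
                rel = os.path.relpath(src, td)
                dst = os.path.join(dst_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
                copied.append(rel)
    return sorted(copied)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    print(copy_resource_files(fake_mock_resource(123, {"id": 123}), out_dir))
```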

Danbooru2024DataPool

class cheesechaser.datapool.danbooru.Danbooru2024DataPool(revision: str = 'main', hf_token: str | None = None)

A data pool for accessing the full version of the Danbooru2024 dataset.

This class provides an interface to interact with the complete Danbooru2024 dataset hosted on Hugging Face. It uses incremental IDs for efficient data access.

Parameters:
  • revision (str) – The specific revision of the dataset to use, defaults to ‘main’.

  • hf_token (Optional[str]) – The Hugging Face API token for authentication, defaults to None.

Usage:
>>> full_pool = Danbooru2024DataPool(revision='v2.0', hf_token='your_token_here')
>>> # Now you can use full_pool to access the complete Danbooru2024 dataset

__init__(revision: str = 'main', hf_token: str | None = None)

Danbooru2024SfwDataPool

class cheesechaser.datapool.danbooru.Danbooru2024SfwDataPool(revision: str = 'main', hf_token: str | None = None)

A data pool for accessing the Safe For Work (SFW) version of the Danbooru2024 dataset.

This class provides an interface to interact with the SFW Danbooru2024 dataset hosted on Hugging Face. It uses incremental IDs for efficient data access.

Parameters:
  • revision (str) – The specific revision of the dataset to use, defaults to ‘main’.

  • hf_token (Optional[str]) – The Hugging Face API token for authentication, defaults to None.

Usage:
>>> sfw_pool = Danbooru2024SfwDataPool(revision='v1.2', hf_token='your_token_here')
>>> # Now you can use sfw_pool to access the SFW Danbooru2024 dataset

__init__(revision: str = 'main', hf_token: str | None = None)

Danbooru2024WebpDataPool

class cheesechaser.datapool.danbooru.Danbooru2024WebpDataPool(revision: str = 'main', hf_token: str | None = None)

A data pool for accessing the WebP-formatted version of the Danbooru2024 dataset.

This class provides an interface to interact with the WebP-formatted Danbooru2024 dataset hosted on Hugging Face. It uses incremental IDs for efficient data access. The WebP format offers improved compression and potentially faster loading times for images.

Parameters:
  • revision (str) – The specific revision of the dataset to use, defaults to ‘main’.

  • hf_token (Optional[str]) – The Hugging Face API token for authentication, defaults to None.

Usage:
>>> webp_pool = Danbooru2024WebpDataPool(hf_token='your_token_here')
>>> # Now you can use webp_pool to access the WebP-formatted Danbooru2024 dataset

__init__(revision: str = 'main', hf_token: str | None = None)