Dataset Card for PaintBerri

Dataset containing PaintBerri artwork archives.

  • image-classification
  • image-to-text
  • artwork
  • English

Dataset Summary

This dataset contains hand-drawn artwork collected from PaintBerri. The dataset includes images along with associated metadata such as publication dates, titles, descriptions, and dimensions.

Languages

The dataset is treated as monolingual: titles, descriptions, and other text metadata are primarily in English.

Dataset Structure

Data Files

The dataset consists of:

  • image files distributed across multiple ZIP archives,
  • corresponding metadata in JSONL format, and
  • an archive index CSV file that maps each image ID to the ZIP archive containing it.
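Because images are spread across several ZIP archives, reading one image means first looking up its archive in the index CSV. The sketch below shows that two-step lookup using only the Python standard library. The column names `id` and `archive`, and the convention that a ZIP member's file name (minus extension) equals the image ID, are assumptions for illustration; check them against the actual files.

```python
import csv
import zipfile
from pathlib import Path


def load_archive_index(csv_path):
    """Build a dict mapping image ID -> ZIP archive file name.

    ASSUMPTION: the index CSV has a header row with columns named
    "id" and "archive"; the real column names may differ.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["id"]: row["archive"] for row in csv.DictReader(f)}


def read_image_bytes(image_id, index, archive_dir):
    """Return the raw bytes of one image from the archive listed in the index.

    ASSUMPTION: inside each archive, the image is stored as a member whose
    file name without extension equals the image ID.
    """
    archive_path = Path(archive_dir) / index[image_id]  # KeyError if ID is unknown
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            if Path(name).stem == image_id:
                return zf.read(name)
    raise KeyError(f"{image_id} not found in {archive_path}")
```

Opening each archive per lookup keeps the sketch simple; when reading many images, grouping IDs by archive and opening each ZIP once is cheaper.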

Data Fields

Each metadata record includes the following fields:

  • image URL
  • publication timestamp
  • modification timestamp
  • description
  • title
  • not-safe-for-work (NSFW) flag
  • thumbnail URL
  • height
  • width
  • creator ID
  • short ID
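The metadata is JSONL, so each line is one JSON object describing one image. The following sketch parses those lines into a typed record with the fields listed above. The JSON key names (`short_id`, `image_url`, `nsfw`, and so on) are hypothetical placeholders chosen to match the field descriptions; the real keys may be named differently. Timestamps are kept as strings because their exact format is not documented here.

```python
import json
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ArtworkRecord:
    # HYPOTHETICAL key names; verify against the actual JSONL before use.
    short_id: str
    creator_id: str
    title: str
    description: str
    image_url: str
    thumbnail_url: str
    published_at: Optional[str]  # raw timestamp string, format not assumed
    modified_at: Optional[str]
    nsfw: bool
    height: int
    width: int


def parse_record(line: str) -> ArtworkRecord:
    """Parse one JSONL line into an ArtworkRecord, defaulting missing text to ""."""
    raw = json.loads(line)
    return ArtworkRecord(
        short_id=str(raw["short_id"]),
        creator_id=str(raw["creator_id"]),
        title=raw.get("title") or "",
        description=raw.get("description") or "",
        image_url=raw["image_url"],
        thumbnail_url=raw.get("thumbnail_url") or "",
        published_at=raw.get("published_at"),
        modified_at=raw.get("modified_at"),
        nsfw=bool(raw.get("nsfw", False)),
        height=int(raw["height"]),
        width=int(raw["width"]),
    )


def iter_records(path) -> Iterator[ArtworkRecord]:
    """Yield one ArtworkRecord per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_record(line)
```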

Data Splits

All images and metadata are in a single split with 68,860 entries.
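Since only one split is provided, users who need held-out data have to derive their own. One common approach, sketched below, hashes a stable identifier so the same entry always lands in the same split regardless of file order or machine. The function name `assign_split` and the 10% test fraction are arbitrary choices for illustration, not part of the dataset.

```python
import hashlib


def assign_split(key: str, test_fraction: float = 0.1) -> str:
    """Deterministically assign an entry to "train" or "test" by hashing its key.

    The first 8 bytes of the SHA-256 digest are mapped to a float in [0, 1);
    entries whose value falls below test_fraction go to "test".
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"
```

Keying on the short ID splits individual images; keying on the creator ID instead keeps all works by one creator in the same split, which avoids style leakage between train and test.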