GiantMIDI-Piano is a large-scale symbolic dataset of classical piano music, built by applying the piano_transcription system to a large collection of piano performance recordings. It contains thousands of piano works spanning many composers and styles, each transcribed into a high-resolution MIDI file that captures note events, velocities, and pedal usage. The dataset is a resource for music information retrieval (MIR), symbolic music modeling, composer classification, music generation, analysis of the classical piano repertoire, and data-driven research in musicology or AI-based composition. Because the data is machine-generated by an automated transcription pipeline, it offers a consistency, scale, and accessibility that would be difficult to achieve by hand. Researchers can work with a large corpus of piano music in symbolic form, subject to the dataset's license and disclaimer.
Features
- Large-scale dataset: thousands of piano works by many composers, covering the classical repertoire and including live recordings
- High-resolution MIDI transcription with note onsets and offsets, velocities, and pedal usage, capturing expressive performance details
- Ready-to-use symbolic piano data, suited to research in music information retrieval, analysis, machine learning, composer classification, and generation tasks
- Curated subset available for users who need pieces with more reliable transcriptions or more consistent metadata
- Free and open symbolic data (subject to the dataset's license and disclaimer), enabling broad reuse in academic, creative, or commercial contexts
- Useful baseline and benchmark for symbolic music modeling tasks (training neural networks, style transfer, composer classification, data-driven composition), thanks to the size, diversity, and quality of its transcriptions
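
To make the transcribed content more concrete, here is a minimal Python sketch of how note onsets, offsets, velocities, and pedal usage could be recovered from a stream of MIDI events. It is an illustration only, not part of the dataset's tooling: the `(time, kind, number, value)` tuple format, the `Note` class, and the functions `pair_notes` and `pedal_intervals` are hypothetical names chosen for this example. The sketch assumes the events have already been read from a MIDI file (for instance with a MIDI library) and converted to absolute times in seconds, and it assumes the pedal is encoded as control change 64, the standard MIDI sustain controller.

```python
from dataclasses import dataclass

SUSTAIN_CC = 64  # standard MIDI controller number for the sustain pedal


@dataclass
class Note:
    pitch: int      # MIDI pitch number (60 = middle C)
    velocity: int   # key velocity, 1-127
    onset: float    # seconds
    offset: float   # seconds


def pair_notes(events):
    """Pair note-on and note-off events into notes.

    `events` is a time-sorted iterable of (time, kind, number, value) tuples,
    a hypothetical format used only for this sketch.
    """
    active = {}  # pitch -> (onset time, velocity) of the sounding note
    notes = []
    for time, kind, number, value in events:
        # A note_on with velocity 0 is equivalent to a note_off.
        is_off = kind == "note_off" or (kind == "note_on" and value == 0)
        if kind == "note_on" and value > 0:
            if number in active:
                # Key struck again before release: close the earlier note.
                onset, velocity = active.pop(number)
                notes.append(Note(number, velocity, onset, time))
            active[number] = (time, value)
        elif is_off and number in active:
            onset, velocity = active.pop(number)
            notes.append(Note(number, velocity, onset, time))
    return sorted(notes, key=lambda n: (n.onset, n.pitch))


def pedal_intervals(events, threshold=64):
    """Return (down, up) time pairs during which the sustain pedal is held."""
    intervals = []
    down_at = None
    for time, kind, number, value in events:
        if kind != "control_change" or number != SUSTAIN_CC:
            continue
        if value >= threshold and down_at is None:
            down_at = time
        elif value < threshold and down_at is not None:
            intervals.append((down_at, time))
            down_at = None
    return intervals


# Hand-written example events, not taken from the dataset.
events = [
    (0.00, "control_change", 64, 127),  # pedal down
    (0.10, "note_on", 60, 72),
    (0.12, "note_on", 64, 65),
    (0.60, "note_off", 60, 0),
    (0.75, "note_on", 64, 0),           # velocity 0 acts as note_off
    (1.20, "control_change", 64, 0),    # pedal up
]

notes = pair_notes(events)
pedals = pedal_intervals(events)
for note in notes:
    print(note)
print(pedals)
```

A flat list of notes plus pedal intervals like this is a common starting point for the modeling and analysis tasks listed above, since it separates what was played (pitch, velocity, timing) from how long it was allowed to ring.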