Methodology
Every line count, speaker ranking, and scene tally on Mitarai Digital Folio comes from a single dialogue corpus drawn from the public-domain canonical text of Shakespeare’s plays.
Data Sources
Primary corpus
A complete dialogue corpus covering 37 canonical plays. Each line is attributed to a named speaker, with act and scene numbers preserved from the First Folio and subsequent standard editions. Source: Folger Shakespeare Library.
Metadata (year, genre)
Composition years and canonical genre classifications are merged from Open Source Shakespeare (OSS), using its WorkID-indexed metadata table. Where OSS gives a date range, we use the midpoint.
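As a rough illustration of the merge and midpoint rule, here is a minimal Python sketch. The field names ("date", "genre"), the range format ("1599-1601"), and the sample rows are assumptions made for the example, not the actual OSS table layout. The sketch also leaves a fractional midpoint (for example 1599.5) unrounded, since rounding is not specified here.

```python
def midpoint_year(date_field: str) -> float:
    """Return a single year as-is, or the midpoint of a 'start-end' range."""
    # Normalise an en dash to a hyphen so "1599-1601" and "1599–1601" both parse.
    parts = [p.strip() for p in date_field.replace("–", "-").split("-")]
    start, end = int(parts[0]), int(parts[-1])
    return (start + end) / 2


def merge_metadata(plays: dict, metadata: dict) -> dict:
    """Attach year and genre to each play, joining on a shared work ID."""
    merged = {}
    for work_id, title in plays.items():
        meta = metadata[work_id]  # raises KeyError if a play has no metadata row
        merged[work_id] = {
            "title": title,
            "year": midpoint_year(meta["date"]),
            "genre": meta["genre"],
        }
    return merged


# Illustrative rows only: the IDs, dates, and genres are placeholders, not OSS data.
plays = {"play_a": "Play A", "play_b": "Play B"}
metadata = {
    "play_a": {"date": "1599-1601", "genre": "Tragedy"},
    "play_b": {"date": "1604", "genre": "Comedy"},
}
merged = merge_metadata(plays, metadata)
```

Joining on the work ID rather than the title avoids mismatches caused by variant spellings of play titles.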
Period bucketing
Plays are bucketed into Early (pre-1595), Middle (1595–1603), and Late (1604–1613) based on the composition year merged above.
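The bucketing rule can be sketched in a few lines of Python. The function name is hypothetical, and two details are assumptions: a fractional midpoint between 1603 and 1604 is treated as Middle, and the 1613 upper bound is not enforced.

```python
def period(year: float) -> str:
    """Map a composition year to Early, Middle, or Late."""
    if year < 1595:
        return "Early"   # pre-1595
    if year < 1604:
        return "Middle"  # 1595 through 1603 (assumption: 1603.5 also lands here)
    return "Late"        # 1604 onward; the 1613 upper bound is not checked
```

For example, period(1594) gives "Early", while period(1595) gives "Middle".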
How We Count
Lines
We count each distinct dialogue row in the corpus as one line. Stage directions and scene headings are excluded.
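A minimal sketch of this counting rule in Python follows. The row layout is an assumption: each row is taken to carry a hypothetical "kind" field ("dialogue", "stage_direction", or "scene_heading") and a hypothetical "row_id" used to decide whether two rows are distinct.

```python
def count_lines(rows) -> int:
    """Count distinct dialogue rows, skipping stage directions and scene headings."""
    seen = set()
    for row in rows:
        if row["kind"] != "dialogue":
            continue  # stage directions and scene headings are excluded
        seen.add(row["row_id"])  # a repeated row_id is counted once
    return len(seen)


# Illustrative rows only; the field names and values are placeholders.
rows = [
    {"row_id": 1, "kind": "scene_heading"},
    {"row_id": 2, "kind": "dialogue"},
    {"row_id": 3, "kind": "stage_direction"},
    {"row_id": 4, "kind": "dialogue"},
    {"row_id": 4, "kind": "dialogue"},  # duplicate of the previous row
]
total = count_lines(rows)
```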
Characters
Named speakers are indexed per play, so the Hamlet in Hamlet is counted separately from any same-named speaker in another play. A character gets a dedicated page once they have spoken at least 20 lines.
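The per-play indexing and the 20-line threshold might look like the Python sketch below. The row fields ("play", "speaker") and the sample counts are assumptions made for illustration. Keying on the (play, speaker) pair is what keeps same-named speakers in different plays apart.

```python
from collections import Counter

MIN_LINES = 20  # threshold for a dedicated character page


def characters_with_pages(rows) -> dict:
    """Return {(play, speaker): line_count} for speakers at or above the threshold."""
    counts = Counter((row["play"], row["speaker"]) for row in rows)
    return {key: n for key, n in counts.items() if n >= MIN_LINES}


# Illustrative rows only; these line counts are made up for the example.
rows = (
    [{"play": "Hamlet", "speaker": "Hamlet"}] * 25
    + [{"play": "Play B", "speaker": "Hamlet"}] * 3
    + [{"play": "Play B", "speaker": "Speaker X"}] * 20
)
pages = characters_with_pages(rows)
```

In this sample, the Hamlet of "Play B" has only 3 lines and gets no page, even though the Hamlet of "Hamlet" does.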
Scenes
Every unique Act+Scene pair in the corpus gets a dedicated page with its line count and speaker list.
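Scene pages can be sketched as a simple grouping step in Python. The row fields are hypothetical, and the sketch assumes Act+Scene pairs are unique within a play, so the grouping key also includes the play.

```python
from collections import defaultdict


def scene_pages(rows) -> dict:
    """Group dialogue rows by (play, act, scene) with a line count and speaker list."""
    scenes = defaultdict(lambda: {"line_count": 0, "speakers": []})
    for row in rows:
        page = scenes[(row["play"], row["act"], row["scene"])]
        page["line_count"] += 1
        if row["speaker"] not in page["speakers"]:
            page["speakers"].append(row["speaker"])  # keep first-appearance order
    return dict(scenes)


# Illustrative rows only; play and speaker names are placeholders.
rows = [
    {"play": "Play A", "act": 1, "scene": 1, "speaker": "Speaker X"},
    {"play": "Play A", "act": 1, "scene": 1, "speaker": "Speaker Y"},
    {"play": "Play A", "act": 1, "scene": 1, "speaker": "Speaker X"},
    {"play": "Play A", "act": 1, "scene": 2, "speaker": "Speaker Y"},
]
pages = scene_pages(rows)
```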