Mitarai Digital Folio

Methodology

Every line count, speaker ranking, and scene tally on Mitarai Digital Folio comes from a single dialogue corpus drawn from the public-domain canonical text of Shakespeare’s plays.

Data Sources

Primary corpus

The corpus contains the complete dialogue of 37 canonical plays. Each line is attributed to a named speaker, with act and scene numbers preserved from the First Folio and subsequent standard editions. Source: Folger Shakespeare Library.

Metadata (year, genre)

Composition years and canonical genre classifications are merged from Open Source Shakespeare (OSS), using its WorkID-indexed metadata table. Where OSS gives a date range, we use the midpoint.
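
As a rough sketch of the midpoint rule, the snippet below reduces a date value to a single year. The input format (a string such as "1599" or "1596-1598"), the function name, and rounding down when the midpoint is not a whole year are all assumptions made for illustration; they are not taken from the OSS table or from our pipeline.

```python
def composition_year(date_value: str) -> int:
    """Return one year from 'YYYY' or 'YYYY-YYYY' (hypothetical input format)."""
    # Accept either a hyphen or an en dash between the two years.
    parts = [int(p) for p in date_value.replace("–", "-").split("-")]
    if len(parts) == 1:
        return parts[0]
    start, end = parts
    # Assumption: a fractional midpoint is rounded down.
    return (start + end) // 2


print(composition_year("1599"))       # 1599
print(composition_year("1596-1598"))  # 1597
```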

Period bucketing

Plays are bucketed into Early (pre-1595), Middle (1595–1603), and Late (1604–1613) based on the composition year from the merged metadata.
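
The bucket boundaries can be expressed as a small function. This is a minimal sketch: the function name is hypothetical, and treating every year from 1604 onward as Late (without enforcing the 1613 upper bound) is an assumption.

```python
def period(year: int) -> str:
    """Map a composition year to its period bucket."""
    if year < 1595:
        return "Early"
    if year <= 1603:
        return "Middle"
    # Assumption: no upper bound is checked; the stated Late range ends at 1613.
    return "Late"


print(period(1594))  # Early
print(period(1595))  # Middle
print(period(1604))  # Late
```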

How We Count

Lines

We count each distinct dialogue row in the corpus as one line. Stage directions and scene headings are excluded.
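
To illustrate the counting rule, the sketch below assumes each corpus row carries a "kind" field that distinguishes dialogue from stage directions and scene headings. That field, its values, and the row layout are hypothetical; the real corpus may mark these differently.

```python
# Hypothetical rows: only those with kind == "dialogue" count as lines.
rows = [
    {"kind": "heading", "text": "Act 1, Scene 1"},
    {"kind": "direction", "text": "Enter Barnardo and Francisco"},
    {"kind": "dialogue", "speaker": "BARNARDO", "text": "Who's there?"},
    {"kind": "dialogue", "speaker": "FRANCISCO", "text": "Nay, answer me."},
]


def count_lines(rows):
    """Count dialogue rows, skipping stage directions and scene headings."""
    return sum(1 for row in rows if row["kind"] == "dialogue")


print(count_lines(rows))  # 2
```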

Characters

Named speakers are indexed per play, so the Hamlet in Hamlet is counted separately from any same-named speaker in another play. A character gets a dedicated page once they speak at least 20 lines.
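
One way to picture the per-play index is to key each speaker by a (play, speaker) pair and apply the 20-line threshold to the resulting counts. The row fields "play" and "speaker" and the names below are assumptions for illustration only.

```python
from collections import Counter

MIN_LINES_FOR_PAGE = 20  # threshold stated above


def characters_with_pages(rows):
    """Return {(play, speaker): line_count} for speakers who earn a page."""
    # Keying on (play, speaker) keeps same-named speakers in different plays apart.
    counts = Counter((row["play"], row["speaker"]) for row in rows)
    return {key: n for key, n in counts.items() if n >= MIN_LINES_FOR_PAGE}
```

With this keying, a speaker who has 19 lines in one play and 25 in another gets a page only for the second play, because the two counts are never combined.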

Scenes

Every unique Act+Scene pair within each play in the corpus gets a dedicated page with its line count and speaker list.
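
A minimal sketch of the scene grouping follows. It assumes dialogue rows with "play", "act", "scene", and "speaker" fields (hypothetical names), includes the play in the key so that Act 1, Scene 1 of different plays stay separate, and lists speakers in order of first appearance, which is also an assumption.

```python
from collections import defaultdict


def scene_pages(rows):
    """Group dialogue rows by (play, act, scene) into line counts and speakers."""
    pages = defaultdict(lambda: {"lines": 0, "speakers": []})
    for row in rows:
        page = pages[(row["play"], row["act"], row["scene"])]
        page["lines"] += 1
        # Record each speaker once, in order of first appearance.
        if row["speaker"] not in page["speakers"]:
            page["speakers"].append(row["speaker"])
    return dict(pages)
```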

Frequently Asked

Why might my edition show different counts?

Modern editions differ on where lines break and on whether partial speeches count as lines. Our numbers are internally consistent and reproducible from our corpus.

Do you count apocrypha?

Only plays in the standard accepted canon are included. Apocryphal works (e.g. Edward III) are excluded unless added in a future update.

How do you handle Henry IV / Henry VI parts?

Each part is counted as its own play, matching the First Folio ordering.