Turn a Harry Potter Book into a Knowledge Graph

Company

Neo4j

Date Published

July 20, 2021

Author

Tomaž Bratanič

Word count

2065

Language

English

Hacker News points

None

URL

neo4j.com/blog/developer/turn-a-harry-potter-book-into-a-knowledge-graph

Summary

The post describes building a knowledge graph from the book "Harry Potter and the Philosopher's Stone" using Neo4j, spaCy, and Selenium. The author scraped the list of characters from the book's Fandom page and preprocessed the text to resolve co-references. They then used spaCy's rule-based pattern matching to extract character mentions, preferring longer multi-word matches over single-word ones to reduce ambiguous single-word matches and help disambiguate characters. The extracted interactions between characters were stored in a Neo4j graph database and visualized to examine the results. The author concludes that the approach worked well, but notes that entity disambiguation might need fine-tuning for subsequent books.
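
The idea of preferring longer matches can be illustrated with a minimal sketch. It is a plain-Python stand-in that uses regular expressions rather than spaCy's matcher, so it is not the author's code. The alias table, the function names `find_mentions` and `count_interactions`, and the character-distance `window` are all hypothetical choices made for illustration; in particular, counting two characters as interacting when their mentions fall within a fixed distance is an assumption, not something the summary states.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical alias table: surface form -> canonical character name.
ALIASES = {
    "Harry Potter": "Harry Potter",
    "Harry": "Harry Potter",
    "Ron Weasley": "Ron Weasley",
    "Ron": "Ron Weasley",
    "Hermione Granger": "Hermione Granger",
    "Hermione": "Hermione Granger",
}


def find_mentions(text):
    """Return (start, end, name) spans in text order, preferring longer matches."""
    candidates = []
    for alias, name in ALIASES.items():
        pattern = r"\b" + re.escape(alias) + r"\b"
        for match in re.finditer(pattern, text):
            candidates.append((match.start(), match.end(), name))

    # Consider the longest spans first so "Harry Potter" beats "Harry".
    candidates.sort(key=lambda span: (-(span[1] - span[0]), span[0]))

    kept = []
    for start, end, name in candidates:
        # Keep a span only if it overlaps none of the spans already kept.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, name))
    return sorted(kept)


def count_interactions(text, window=80):
    """Count pairs of distinct characters mentioned within `window` characters.

    The window size is an assumed parameter, chosen only for this sketch.
    """
    mentions = find_mentions(text)
    pairs = Counter()
    for (start_a, _, name_a), (start_b, _, name_b) in combinations(mentions, 2):
        if name_a != name_b and abs(start_b - start_a) <= window:
            pairs[tuple(sorted((name_a, name_b)))] += 1
    return pairs


if __name__ == "__main__":
    sample = "Harry Potter met Ron on the train. Hermione Granger joined Harry later."
    for pair, weight in count_interactions(sample).items():
        print(pair, weight)
```

Each resulting pair and its count could then be written to a graph database as a weighted relationship between two character nodes, which is the role Neo4j plays in the post.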