Company
Date Published
Author
Tomaž Bratanič
Word count
2065
Language
English
Hacker News points
None

Summary

The article describes how to build a knowledge graph from the book "Harry Potter and the Philosopher's Stone" using Neo4j, spaCy, and Selenium. The author scraped the list of characters from the book's Fandom page and preprocessed the text to resolve co-references. They then used spaCy's rule-based pattern matching to extract character mentions as entities, giving priority to longer, multi-word matches to work around the ambiguity of single-word matches and to help disambiguate characters. The extracted interactions between characters were stored in a Neo4j graph database and visualized to inspect the results. The author concludes that the approach worked well, but notes that entity disambiguation may need fine-tuning for subsequent books.
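
The idea of preferring longer matches can be illustrated with a minimal sketch in plain Python. This is not the author's code and does not use spaCy's matcher; it uses regular expressions from the standard library instead. The alias table and the function name `find_characters` are hypothetical, chosen only for illustration. When "Harry" and "Harry Potter" both match at the same position, only the longer span is kept.

```python
import re

# Hypothetical alias table: surface form -> canonical character name.
ALIASES = {
    "Harry Potter": "Harry Potter",
    "Harry": "Harry Potter",
    "Ron Weasley": "Ron Weasley",
    "Ron": "Ron Weasley",
    "Hermione Granger": "Hermione Granger",
    "Hermione": "Hermione Granger",
}


def find_characters(text, aliases):
    """Return sorted (start, end, canonical_name) tuples, preferring longer matches."""
    candidates = []
    for alias, name in aliases.items():
        # Whole-word matches only, so "Ron" does not match inside another word.
        for m in re.finditer(r"\b" + re.escape(alias) + r"\b", text):
            candidates.append((m.start(), m.end(), name))

    # Consider the longest spans first; break ties by earliest start.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))

    kept = []
    for start, end, name in candidates:
        # Keep a span only if it does not overlap one that was already kept.
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, name))
    return sorted(kept)


if __name__ == "__main__":
    sample = "Harry Potter met Ron on the train. Hermione Granger joined Harry later."
    for start, end, name in find_characters(sample, ALIASES):
        print(start, end, name)
```

In this sketch, a bare "Harry" is mapped to "Harry Potter" by assumption. A cast with several characters sharing a name would need extra rules, which is the kind of fine-tuning the author expects for later books.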