In September, Tim Hitchcock and I had a chance to meet with Adam Farquhar at the British Library to talk about potential collaborative research projects. Adam suggested that we might do something with a collection of about 25,000 e-books. Although I haven’t had much time yet to work with the sources, one of the things that I am interested in is using techniques from image processing and computer vision to supplement text mining. As an initial project, I decided to see if I could find a way to automatically extract images from the collection.

My first thought was that I might be able to identify text based on its horizontal and vertical correlation. (One way to do this in Mathematica is to use the ImageCooccurrence function.) Parts of the image that were not text would then be whitespace, illustration or marginalia. As I was trying to figure out the details, however, I realized that a much simpler approach might work. Since the method seems promising, I decided to share it so that other people might get some use out of it (or suggest improvements).