jeresig/matchengine-data-analysis

MatchEngine Data Analysis

This project is a collection of Shell and Cypher scripts for Neo4j that consume MatchEngine image similarity data and generate a queryable graph for further analysis.

At the moment, the code and scripts in this repository exist mainly to replicate the research and results produced from the Frick Photoarchive's anonymous Italian photo archive and the Zeri Foundation's Italian art photo archive. More information about this research, and the results, can be found here:

Importing Data into Neo4j

To start, make sure that you have a copy of Neo4j installed on your computer. Once it's installed, start it and make sure that it's running locally and is available on the default port.
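A quick way to confirm that the server is reachable before importing is to request the browser URL from the shell. This is only a minimal sketch: the function name `neo4j_ready` and the five-second timeout are illustrative choices rather than part of this repository, and it assumes `curl` is installed.

```shell
# neo4j_ready [URL]: succeed if an HTTP server answers at URL.
# The function name and the 5-second timeout are illustrative choices;
# the default URL is the one given in this README.
neo4j_ready() {
  curl --silent --fail --output /dev/null --max-time 5 \
    "${1:-http://localhost:7474/}"
}

# Usage: only start the import when the server responds.
# neo4j_ready && ./import.sh
```

If the check fails, Neo4j is probably not running or is listening on a different port.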

Once you have done that you should be able to run the following command from your shell:

./import.sh

This will import all the existing data (found in the data/ directory) into your local copy of Neo4j. Once the import has completed, open your browser and visit:

http://localhost:7474/

And you'll be able to query the imported data using Neo4j's Cypher query language.

Generating Data

Currently tools and scripts are provided for generating data from sources at the Frick Photoarchive and the Zeri Foundation. You will need to generate your own data, likely using your own tools, if you wish to analyze your own archive of images.

That being said, this repository does contain all the data from the analysis done on the Frick and Zeri Italian art collections, and you can replicate those results by simply importing the data (as detailed above).

Artwork-Image Mapping

You'll need a list of image IDs with their corresponding artwork IDs. The exact format for this data is detailed here.

In the case of the Frick and Zeri's collections specific tools were needed to convert the data from their existing formats into the preferred format linked to above. Those utilities can be found in the utils/ directory.

The final data resides in data/artwork-image-map.csv.
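If you generate this file with your own tools, a small sanity check before importing can catch malformed rows. The sketch below assumes a plain two-column layout (image ID, artwork ID) with no quoted commas; that layout is an assumption made for illustration, so adjust it to the format referenced above. The function name `check_map` is hypothetical.

```shell
# check_map FILE: succeed only if every row has exactly two non-empty fields.
# The two-column layout (image ID, artwork ID) is an assumption, not the
# documented format.
check_map() {
  awk -F',' '
    # Report any row that does not have exactly two non-empty fields.
    NF != 2 || $1 == "" || $2 == "" {
      printf "line %d: expected 2 non-empty fields\n", NR
      bad = 1
    }
    # Exit non-zero if at least one bad row was seen.
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Usage:
# check_map data/artwork-image-map.csv && ./import.sh
```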

Known Mapping

Optionally, you can provide a hand-curated list of mappings between artworks in different collections. This was done for the Frick Photoarchive's anonymous Italian art archive and the Zeri Foundation's 15th-century Italian art archive. This data can be used to check the quality of the matches generated by MatchEngine against those made by a known expert.

The final data resides in data/known-map.csv.
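As a rough illustration of that comparison, the sketch below counts how many known pairs also appear among a set of generated matches. It assumes that both files hold one pair of comparable IDs in their first two comma-separated columns, with no header row; the real files may differ, and image-level matches may first need to be mapped to artwork IDs. The function name `count_confirmed` is hypothetical.

```shell
# count_confirmed KNOWN_CSV FOUND_CSV
# Prints "<confirmed> <total>": how many of the known pairs also occur in
# FOUND_CSV (in either order), and how many known pairs there are.
# Assumes the first two columns of each file form a pair of comparable IDs.
count_confirmed() {
  awk -F',' '
    # First file read (FOUND_CSV): remember every pair in both orders.
    NR == FNR { found[$1 FS $2] = 1; found[$2 FS $1] = 1; next }
    # Second file read (KNOWN_CSV): count pairs and those also found.
    { total++; if (($1 FS $2) in found) confirmed++ }
    END { printf "%d %d\n", confirmed, total }
  ' "$2" "$1"
}

# Usage:
# count_confirmed data/known-map.csv matches.csv
```

Dividing the first number by the second gives the share of expert matches that the generated data also found.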

Image Similarity Data

Finally, you'll need the image similarity data itself, as provided by MatchEngine. All image similarity data is generated using the MatchEngine tools, which produce a JSON file that can then be converted into a usable CSV file. The script to do this can be found in shared/gen-similarity.js.

The final data resides in data/similarity.csv.

Credits

Created by John Resig. Released under an MIT license.

Funding for this project was provided by a Digital Resources grant from the Kress Foundation, in cooperation with the Frick Photoarchive.

About

Tools for analyzing data from TinEye's MatchEngine service.
