Recently I have been asked a few questions about how the tag graph was implemented. So I figured it might be good to have a public space to answer them!
Some preliminary questions:
- If 2 tags belong to the same node, do they have an edge between them?
- Are the different colors for different types of node, like questions, notes, research-notes, etc.?
Please add any questions of your own and I'll hop back on to answer!
EDIT: I'll be posting more as soon as we can figure out where to place some files so they are publicly accessible.
<3<3<3
1) If 2 tags belong to the same node, do they have an edge between them?
The tags don't belong to nodes; the nodes are actually the tags themselves. Two "tag nodes", as it were, have an edge between them when they occur on the same page on the plots website. I believe that goes for any page, be it a research note or a wiki page. Take the following page (whoa meta) for example. Here you see the tags are:
The nodes are:
- website
- design
- tags
The undirected edges are:
- website <-> design
- website <-> tags
- design <-> tags
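To make that concrete, here is a minimal sketch of how the nodes and edges could be derived from per-page tag lists. It is not the code behind the actual graph: the `build_tag_graph` function, the `pages` mapping and the "this-page" id are made up for illustration, and counting co-occurrences per edge is my own assumption.

```python
from collections import Counter
from itertools import combinations


def build_tag_graph(pages):
    """Build an undirected tag co-occurrence graph.

    pages: hypothetical mapping of page id -> list of tag names.
    Returns (nodes, edges), where edges counts how many pages
    each unordered tag pair appears on together.
    """
    nodes = set()
    edges = Counter()
    for tags in pages.values():
        unique = sorted(set(tags))  # sort so (a, b) and (b, a) share one key
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return nodes, edges


# The page from the example above, with its three tags.
pages = {"this-page": ["website", "design", "tags"]}
nodes, edges = build_tag_graph(pages)

for (a, b), count in sorted(edges.items()):
    print(f"{a} <-> {b} (pages in common: {count})")
```

Three tags on one page give three undirected edges, matching the list above.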
2.) Are the different colors for different types of node, like questions, notes, research-notes, etc.?
The colors relate to the "community" the nodes (tags) belong to. Take a look at this image @cfastie posted:
Follow up: @sagarpreet, in exploring export options for a different project, I just discovered that you can extract the visual attributes (i.e. color, size, positions) by exporting to one of the file formats that support this (see matrix image here). Open the .gephi file, then go to File --> Export --> Graph File. When you choose a supported file format, the "Options" button should become clickable. Click on it, and make sure to check the boxes for any attributes you'd like.
Personally, since I have an interest in web visualizations but no interest in figuring out how to implement algorithms for things like community detection and calculating node sizes, I think this is a great way to quickly translate the static visualizations from gephi into something dynamic. The other plus is that I'd prefer to see changes I make update in real time without having to re-run the program.
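As a rough sketch of what that translation could look like, the snippet below reads node colors, sizes and positions out of an exported file and turns them into JSON for a web visualization. It assumes GEXF was the export format chosen (GEXF stores these in `viz:color`, `viz:size` and `viz:position` elements); the `SAMPLE` document and the `gexf_nodes_to_json` helper are made up for illustration, and a real export will contain more than this.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, hand-written stand-in for a small GEXF export.
SAMPLE = """<gexf xmlns="http://www.gexf.net/1.2draft"
      xmlns:viz="http://www.gexf.net/1.2draft/viz" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="0" label="website">
        <viz:color r="255" g="0" b="0"/>
        <viz:size value="10.0"/>
        <viz:position x="1.5" y="-2.0" z="0.0"/>
      </node>
    </nodes>
  </graph>
</gexf>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def gexf_nodes_to_json(gexf_text):
    """Return a JSON list of nodes with their visual attributes."""
    root = ET.fromstring(gexf_text)
    nodes = []
    for el in root.iter():
        if local(el.tag) != "node":
            continue
        node = {"id": el.get("id"), "label": el.get("label")}
        for child in el:
            name = local(child.tag)
            if name == "color":
                node["color"] = [int(child.get(c)) for c in ("r", "g", "b")]
            elif name == "size":
                node["size"] = float(child.get("value"))
            elif name == "position":
                node["x"] = float(child.get("x"))
                node["y"] = float(child.get("y"))
        nodes.append(node)
    return json.dumps(nodes)


print(gexf_nodes_to_json(SAMPLE))
```

Matching on the local tag name rather than a fixed namespace is deliberate, since the namespace URL differs between GEXF versions.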
Again, for all others, we'll get some files up real soon!
@bsugar added some really excellent research on finding associated tags that we can incorporate into the API planning. Just copying in here to keep as a reference as this moves forward. I'll also link into the long issue where this has been worked on: https://github.com/publiclab/plots2/issues/1502
Later adding:
@bsugar I had a quick question -- would it be OK to only collect /some/ of the related tags of each tag, i.e. limit the number of edges that we look up for each tag record? As of recently we have an optimized Tag.related(tagname) method that returns the 5 tags most used with the given tag. In this code, I was able to collect the 5 most-related tags for each of the top 250 tags, and it runs in about 8-11 seconds on the production site:
https://gist.github.com/jywarren/07f598cca34bdc2f8042236b83f02b10
I'm wondering if we could just reformat that to be the correct JSON format and then hand it off to tagoverflow?
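For reference, the reformatting step could look something like the sketch below. I don't know what JSON shape tagoverflow actually expects, so the nodes/links layout here is only an assumption (it is a common convention for graph JSON), and `related_to_graph_json` plus the sample tag names are made up for illustration.

```python
import json


def related_to_graph_json(related):
    """Turn {tag: [related tags]} into a nodes/links JSON string.

    related: hypothetical mapping of each tag to its most-related
    tags (at most 5 each in the scenario described above).
    """
    # Every tag that appears anywhere becomes a node.
    names = sorted(set(related) | {t for rel in related.values() for t in rel})
    # Sort each pair so A-B and B-A collapse into one undirected link.
    links = sorted(
        {
            tuple(sorted((tag, other)))
            for tag, rel in related.items()
            for other in rel
            if other != tag
        }
    )
    return json.dumps(
        {
            "nodes": [{"id": name} for name in names],
            "links": [{"source": a, "target": b} for a, b in links],
        }
    )


# Made-up sample input, standing in for the collected related tags.
related = {"water": ["oil", "spectrometer"], "oil": ["water"]}
graph = json.loads(related_to_graph_json(related))
print(len(graph["nodes"]), "nodes,", len(graph["links"]), "links")
```

With 250 tags and 5 related tags each, this caps the output at 1250 links before deduplication.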
Sorry @warren! Looks like this may have been addressed in the GitHub conversation. However, for those that come after: given that the goals are probably satisfied by an approximation, I don't see why it wouldn't suffice.
I think the downside is the one that I mentioned in the comment. The edge weights are created using something called the observed vs. expected odds ratio:
oe_ratio = (all_questions_count * tag_count_AB) / (tag_count_A * tag_count_B)
Pulling from the github comment
So, will five work? I think so. Yes. But I think what will technically happen is that you won't always know to keep "the eggs" (specific tag) in "stock" (on the graph), as it were, since you've presumed that you only want the top five associated "products" (tags).
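As a small worked example of the ratio above (the counts are made up, and the function is just the formula wrapped up, not the code used for the graph):

```python
def oe_ratio(all_questions_count, tag_count_a, tag_count_b, tag_count_ab):
    """Observed vs. expected ratio for a pair of tags A and B.

    If A and B were independent, we would expect them together on
    tag_count_a * tag_count_b / all_questions_count pages, so a result
    of 1.0 means "as often as chance" and larger values mean the tags
    co-occur more often than chance.
    """
    if tag_count_a == 0 or tag_count_b == 0:
        raise ValueError("both tags must occur at least once")
    return (all_questions_count * tag_count_ab) / (tag_count_a * tag_count_b)


# Made-up counts: 1000 pages, A on 50, B on 40, both on 10.
print(oe_ratio(1000, 50, 40, 10))  # 1000 * 10 / (50 * 40) = 5.0
# Same tags, but only 2 pages in common: exactly what chance predicts.
print(oe_ratio(1000, 50, 40, 2))   # 1000 * 2 / (50 * 40) = 1.0
```

Note that the ratio depends on how common each tag is overall, so the five tags most used with a given tag are not necessarily the five with the highest ratio.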
Would it be possible to filter out powertags? Powertags are all the ones with the : in the middle. This would remove all lat:0 and long:0, which would be a big step forward. What do you think?
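A filter like that could be as small as the sketch below, assuming the tags are available as plain strings before the graph is built. The function names and the sample list are made up for illustration.

```python
def is_powertag(tag):
    """Treat any tag containing a colon (e.g. lat:0) as a powertag."""
    return ":" in tag


def drop_powertags(tags):
    """Return the tags with all powertags removed, order preserved."""
    return [tag for tag in tags if not is_powertag(tag)]


# Made-up sample list of tags from one page.
tags = ["water", "lat:0", "long:0", "spectrometer"]
print(drop_powertags(tags))
```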