Log In · Register

 
Wiki Race
mipadi
post Feb 22 2009, 02:48 PM
Post #1


Senior Member
******

Group: Administrator
Posts: 2,648
Joined: Apr 2008
Member No: 639,265



So you've probably heard of wiki race, a game in which you try to find a path from one Wikipedia article to another, using the Wiki links in the article. My roommate and I wrote programs to automatically find such paths and "win" the race. His is written in Ruby, and mine in Python. Here's mine:

CODE
#!/usr/bin/env python

from BeautifulSoup import BeautifulSoup
import os
import re
import sys
import urllib2


# Note: In order to keep a more functional style, as well as
# simplify programming, a node in a tree is simply a 3-tuple
# in the form (node, parent, [children]).


_wikiurl = "http://en.wikipedia.org/wiki/"

# We must pretend to be Google (or another browser, but I
# picked Googlebot); otherwise, Wikipedia gives us a
# 403 Forbidden error.
_opener = urllib2.build_opener()
_opener.addheaders = [("User-agent", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")]

_wiki_href = re.compile(r"^\/wiki\/.*$")
_wiki_article = re.compile(r"^\/wiki\/([A-Z][a-z0-9%_\(\)-]*)$")


def links(article):
    """Returns a list of all the articles linked from a particular article."""
    page = _opener.open(_wikiurl + article)
    soup = BeautifulSoup(page)
    wikis = []
    links = soup.findAll(name="a", attrs={'href': _wiki_href})
    for link in links:
        matched = _wiki_article.match(link['href'])
        if matched:
            wikis.append(matched.group(1))
    return wikis


def race(start, target):
    """Begins the wiki race with the given starting article and target."""
    cur = _race(target, (start, None, links(start)), set())
    path = [cur[0]]
    while cur[1] is not None:
        cur = cur[1]
        path.append(cur[0])
    path.reverse()
    return path


def _race(target, tree, checked):
    print "Checking " + tree[0]
    checked.add(tree[0])
    if tree[0] == target:
        print >>sys.stderr, "  Found target!"
        return tree
    else:
        if len(tree[2]) > 0:
            print >>sys.stderr, "  Descending into child links"
            for child in tree[2]:
                if child not in checked:
                    return _race(target, (child, tree, links(child)), checked)
        else:
            print >>sys.stderr, "  Child links exhausted"
            parent = tree[1]
            parent[2].remove(tree[0])
            return _race(target, parent, checked)


def main():
    usage = "Usage: %s Wiki_Start Wiki_End" % os.path.basename(sys.argv[0])
    
    if len(sys.argv[1:]) < 2:
        print >>sys.stderr, usage
        return 1
    
    start = sys.argv[1]
    goal = sys.argv[2]
    print "Finding a path from '%s' to '%s'" % (start, goal)
    path = race(start, goal)
    print "Here is the path:"
    for p in path:
        print "  " + p
    
    return 0


if __name__ == "__main__":
    sys.exit(main())


You can run it if you want, but it takes a long time.

If anyone has any other solutions or tweaks for the program, please post. _smile.gif
 

Posts in this topic
mipadi   Wiki Race   Feb 22 2009, 02:48 PM
ArjunaCapulong   wat   Feb 22 2009, 04:57 PM
Joanne   Haha we used to play this game at school when we g...   Feb 22 2009, 04:59 PM


Reply to this topicStart new topic
2 User(s) are reading this topic (2 Guests and 0 Anonymous Users)
0 Members: