Thanks, Jimmy and Peter17!
Maybe it is a good idea to look at Wikipedia's data dump. I will report
back once I have some results.
Thank you again!
On Sun, Jun 13, 2010 at 3:21 PM, Jimmy Xu <xu.jimmy.wrk(a)gmail.com> wrote:
On Sun, Jun 13, 2010 at 5:55 AM, 杨杰
<xtyangjie(a)gmail.com> wrote:
1. What is the category (or categories) of a web
page (an article)?
e.g. if I can get the following two pieces of information, that is enough:
a. Web page P1 belongs to category C1;
b. Category C1 is under two parent categories, CC1 and CC2, each of
which has its own separate chain of parent categories.
Then I can build a tree whose leaves are the web pages.
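The two facts above are enough to walk from a page up through its categories. Below is a minimal Python sketch, assuming the relations have already been loaded into two in-memory dicts (`page_categories` and `category_parents` are hypothetical names, and the sample data simply mirrors P1, C1, CC1 and CC2 plus an invented `Root`). Because C1 has two parents, the structure is really a directed graph rather than a strict tree, and the real category graph may even contain cycles, so the walk keeps a visited set. `children_index` inverts the links so that pages end up as leaves.

```python
from collections import deque

# Hypothetical sample data mirroring the question:
# P1 is in C1; C1 has two parents, CC1 and CC2; both lead up to an invented Root.
page_categories = {"P1": ["C1"]}
category_parents = {
    "C1": ["CC1", "CC2"],
    "CC1": ["Root"],
    "CC2": ["Root"],
    "Root": [],
}

def ancestors(page, page_categories, category_parents):
    """Return every category reachable upward from `page`, breadth-first.

    A category may have several parents (and the graph may contain cycles),
    so the visited set prevents expanding the same category twice.
    """
    seen = []
    visited = set()
    queue = deque(page_categories.get(page, []))
    while queue:
        cat = queue.popleft()
        if cat in visited:
            continue
        visited.add(cat)
        seen.append(cat)
        queue.extend(category_parents.get(cat, []))
    return seen

def children_index(page_categories, category_parents):
    """Invert the upward links into a parent -> children map (pages are leaves)."""
    children = {}
    links = list(page_categories.items()) + list(category_parents.items())
    for child, parents in links:
        for parent in parents:
            children.setdefault(parent, []).append(child)
    return children

print(ancestors("P1", page_categories, category_parents))
# ['C1', 'CC1', 'CC2', 'Root']
```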
You can use the API [1] with "prop=categories" to query the categories of
any page. Or you could obtain a database dump [2] and query the
`categorylinks` table.
[1] http://en.wikipedia.org/w/api.php
[2] http://dumps.wikimedia.org/backup-index.html
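As a rough illustration of the API route, the sketch below builds an `action=query&prop=categories` request and pulls the category titles out of the JSON reply. It assumes Python 3's standard library, the default JSON response layout (`query`, then `pages` keyed by page id) and an `https` endpoint; the function names and the User-Agent string are made up for this example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

def build_categories_url(title):
    """Build an action=query&prop=categories request URL for one page title."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",  # ask for as many categories as one request allows
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)

def parse_categories(payload):
    """Extract category titles from a prop=categories JSON response.

    In the default JSON format, "pages" is a dict keyed by page id.
    """
    result = []
    for page in payload.get("query", {}).get("pages", {}).values():
        for cat in page.get("categories", []):
            result.append(cat["title"])
    return result

def fetch_categories(title):
    """Perform the live request (requires network access)."""
    req = urllib.request.Request(
        build_categories_url(title),
        headers={"User-Agent": "category-tree-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_categories(json.load(resp))
```

This sketch ignores the `continue` field the API returns when a page has more categories than one request can hold. To climb to parent categories, the same query can be run on a `Category:...` title, since category pages carry categories of their own.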
2. How do the Wikipedia folks handle categorization across such a huge
number of articles, e.g. the categorization method, category levels, or
inheritance between categories?
They are stored in MySQL tables; see [3] and [4].
[3] http://www.mediawiki.org/wiki/Manual:Category_table
[4] http://www.mediawiki.org/wiki/Manual:Categorylinks_table
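For the dump route, here is a self-contained sketch that uses Python's `sqlite3` in place of MySQL and only a hypothetical, heavily simplified subset of the `page` and `categorylinks` columns (the real tables in [3] and [4] have more columns, and newer MediaWiki releases may have changed them). What it shows is that a category's parents are found the same way as a page's categories: by looking up the category's own page in namespace 14.

```python
import sqlite3

# Simplified, hypothetical subset of the MediaWiki tables described in [3] and [4].
SCHEMA = """
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,  -- 0 = article, 14 = category
    page_title     TEXT NOT NULL      -- title without the namespace prefix
);
CREATE TABLE categorylinks (
    cl_from INTEGER NOT NULL,         -- page_id of the member page
    cl_to   TEXT NOT NULL             -- title of the category it belongs to
);
"""

def categories_of(conn, namespace, title):
    """Return the categories that page (namespace, title) is directly in."""
    rows = conn.execute(
        """SELECT cl.cl_to
             FROM categorylinks AS cl
             JOIN page AS p ON p.page_id = cl.cl_from
            WHERE p.page_namespace = ? AND p.page_title = ?
            ORDER BY cl.cl_to""",
        (namespace, title),
    )
    return [row[0] for row in rows]

def parent_categories(conn, category):
    """A category's parents are the categories of its own page in namespace 14."""
    return categories_of(conn, 14, category)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, 0, "P1"), (2, 14, "C1"), (3, 14, "CC1"), (4, 14, "CC2"),
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
    (1, "C1"), (2, "CC1"), (2, "CC2"),
])
print(categories_of(conn, 0, "P1"))   # ['C1']
print(parent_categories(conn, "C1"))  # ['CC1', 'CC2']
```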
Department of Computer Science and Technology,
Xi’an Jiaotong University
I'm in Xi'an too :P
--
Jimmy Xu
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Yang Jie(杨杰)
hi.baidu.com/thinkdifferent
Group of CLOUD, Xi'an Jiaotong University
Department of Computer Science and Technology, Xi’an Jiaotong University
PHONE: 86 1346888 3723
TEL: 86 29 82665263 EXT. 608
MSN: xtyangjie2004(a)yahoo.com.cn
Once I didn't know that software is not free, but I found out days later;
now I realize that it is indeed free.