Wikipedia Dump Reader

Education Apps

Description:

This simple program displays the text-only compressed Wikipedia dumps, currently available at http://download.wikimedia.org/backup-index.html and generally named something like pages-articles.xml.bz2.

It is fairly usable now, although many rendering issues remain.

Features include a Qt viewer with basic text markup, link following, the ability to read directly from the .bz2 compressed file (although an index creation step is needed on first run), a tab-like list of articles that load in the background by default, a simple but useful keyword search, very light source code, and optional LaTeX rendering.
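
As a rough illustration of why an index creation step helps when reading straight from the .bz2 file, here is a minimal Python 3 sketch. It is not the program's actual indexing scheme: the function names build_title_index and read_article are hypothetical, and it assumes the dump has one <title>...</title> element per line and that each article ends with a </page> line.

```python
import bz2
import re

# Assumption: every article title sits on its own line as <title>...</title>.
TITLE_RE = re.compile(rb"<title>(.*?)</title>")


def build_title_index(dump_path):
    """Map each title to the decompressed byte offset of its <title> line."""
    index = {}
    offset = 0
    with bz2.open(dump_path, "rb") as dump:
        for line in dump:
            match = TITLE_RE.search(line)
            if match:
                index[match.group(1).decode("utf-8")] = offset
            offset += len(line)
    return index


def read_article(dump_path, index, title):
    """Return the raw XML from the title line up to the closing </page>."""
    lines = []
    with bz2.open(dump_path, "rb") as dump:
        # BZ2File emulates seek() by decompressing up to the target offset,
        # so this is simple but slow on large dumps.
        dump.seek(index[title])
        for line in dump:
            lines.append(line)
            if b"</page>" in line:
                break
    return b"".join(lines).decode("utf-8")
```

A real reader would more likely record positions of the compressed bz2 blocks so that it can jump close to an article without decompressing everything before it; the sketch only shows the title-to-offset lookup idea.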

The code requires PyQt4.

Older versions have been tested on Fedora Core 4 and Kubuntu with PyQt4.1 (Python 2.4, Qt 4.2), and on Ubuntu Gutsy.

See the included README.

Note that the development tree is now hosted on launchpad. See https://launchpad.net/wikipediadumpreader/

Any comments are welcome.
Last changelog:

11 years ago

Updated to 0.2.10:
- Use a new indexing scheme for the entry list; articles load faster now
- Upgrade path for the old indexing scheme
- UTF-8 fixes for non-ASCII pathnames
- Experimental RPM package; feedback welcome at the project website: https://launchpad.net/wikipediadumpreader

(Jul 09: updated the Ubuntu package for compatibility with Jaunty's Python 2.6)

Updated to 0.2.9:
- Can now load non-uppercased Wiktionary words
- Ability to load a 64-bit module. Thanks to Michael Heide
- Added a small UI layout. Thanks to GreenReaper
- Better handling of corrupted files

Updated to 0.2.8:
- Sorry: no program changes, but a much friendlier opening dialog
- Built a rough Ubuntu package to ease installation for inexperienced users running Ubuntu Gutsy or Hardy

Updated to 0.2.7:
- minor rendering fixes
- a few more macros

Updated to 0.2.6:
- better wikisyntax parsing
- minor bugfixes

Updated to 0.2.5:
- Bugfixes and improvements in rendering
- Moved the development tree to Launchpad
- optional fontsize

Updated to 0.2.4:
- Optional LaTeX/texvc call to render math. Thanks to Mathieu Beliveau

Updated to 0.2.3:
- Fixed an obvious overflow bug in the index creation code.
Rebuilding the index is necessary, sorry. To force it, delete the two *idx files before running the program, and be patient (index creation for the English dumps takes several dozen minutes).
- basic table and footnotes support
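
For the 0.2.3 index rebuild, deleting the *idx files can also be scripted. The following is a small Python 3 sketch under one loud assumption: that the index files live in a single known directory, which is passed in as dump_dir (the changelog does not say where they are stored). The helper name remove_index_files is hypothetical and not part of the program.

```python
from pathlib import Path


def remove_index_files(dump_dir):
    """Delete every file whose name ends in 'idx' and return the removed names.

    Assumption: dump_dir is the directory that holds the index files.
    """
    removed = []
    for idx_file in sorted(Path(dump_dir).glob("*idx")):
        if idx_file.is_file():
            idx_file.unlink()
            removed.append(idx_file.name)
    return removed
```

Calling remove_index_files on the right directory before starting the program should leave the dump itself untouched and trigger a fresh index build on the next run.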

Updated to 0.2.2: improved wiki rendering for lists and definitions
Updated to 0.2.1: fixed a bug when reading articles on block boundaries

REMF

13 years ago

Do you have a roadmap of where you want to go with this great app?

benji2

13 years ago

Hi, thanks for your support.
My immediate goal is to fix one obvious overflow bug in the version 0.2.2 index creation code. Adding basic table support should follow soon.
After that, I don't have many plans yet. I may work on improving wiki rendering or on faster/smarter indexing. Implementing the above-mentioned suggestions about category management also sounds interesting.
Of course, I gladly accept suggestions and feedback, on both the features and the UI.

REMF

13 years ago

Thanks and good luck, I look forward to trying it soon.

REMF

13 years ago

I have been wanting an offline reader for ages, and finally it comes along.

Thank you.

REMF

13 years ago

I would love to see this released as an easy-to-use offline Wikipedia .xml reader.

Keep up the good work.

Superstoned

13 years ago

Pretty neat. BTW, do you happen to know what happened to the integration library for Wikipedia in KDE? There was supposed to be a library to make it easier to have Wikipedia info in KDE apps, like currently in Amarok and Marble and such... Maybe it's something for you ;-)

Ekardnam

13 years ago

You (as in plural; I think Superstoned already knows about this ;) can read more about the Wikipedia and KDE cooperation here: http://meta.wikimedia.org/wiki/KDE_and_Wikipedia

It is indeed very nice, however, I don't know how it's proceeding either. :/

jayenell

13 years ago

Looks very promising and also very clean. There are a few things I think should be fixed first.

1. Remove links to non-existent articles (the red links).
2. Remove the interwiki links.
3. Remove links to categories or make sure the categories are parsed correctly.
4. Make sure when you click on a link you go directly to that article. Right now it is only added to the list on the right.
5. Make sure it depends on Python 2.5. 2.5 is default on Kubuntu and I had to install 2.4 also.

Keep up the good work,

J

benji2

13 years ago

Hi!
Thanks for the comment.
Those are interesting suggestions; I will have a look at them.

About 4: this is the intended behaviour, similar to "load tab in the background" in web browsers, which is a popular feature. You can get direct-go behaviour by changing "self.loadTabInBackground = True" to False at the beginning of dumpReader.py.
Maybe I'll add an option for that in the GUI if needed.

About 5: the updated program should work on both Python 2.4 and 2.5. If not, please tell me the error you get. A Python module is included; compiling it (as opposed to using the included precompiled .so) may help.

Details
Version: 0.2.10
Updated: Aug 16 2009
Added: Aug 29 2007