Bug #7

closed

Optimize Wikipedia history fetching

Added by Pedro Pires 7 months ago. Updated 5 months ago.

Status:
Closed
Priority:
High
Assignee:
Target version:
Start date:
10/09/2025
Due date:
% Done:

70%

Estimated time:
Spent time:

Description

Currently, Wikipedia pageview history has to be requested every day for each stock, which limits how many stocks there can be. I think it is possible to use the pageview dumps from Wikipedia instead, to avoid constantly battering their servers. However, this might overload our host: once decompressed from bz2, the files can reach 3 GB or more, and our host only has around 3 GB of RAM. Theoretically it is possible, but with PHP being what it is, I might have to do some trickery with file cursors to extract the data I need and discard the rest. One option: extract the file, keep only the data for en.wikipedia to cut the file size, then keep only the lines for the articles I am using, which reduces it further.

documentation of the data format
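
A minimal sketch of the filtering idea above, written in Python for brevity rather than PHP. It reads the bz2 file as a stream, one line at a time, so memory use stays small no matter how large the decompressed data is. The column layout is an assumption (space-separated fields, project code first, article title second, view count at a configurable index) and should be checked against the data format documentation. The names sum_views, sum_views_from_dump and count_col are hypothetical.

```python
import bz2
from typing import Dict, Iterable, Set


def sum_views(lines: Iterable[str], project: str, titles: Set[str],
              count_col: int) -> Dict[str, int]:
    """Sum view counts per article, keeping only rows for `project` and `titles`.

    ASSUMPTION: each line is space-separated, with the project code in
    field 0, the article title in field 1 and an integer view count in
    field `count_col`. Verify this against the real dump format.
    """
    totals: Dict[str, int] = {}
    prefix = project + " "
    for line in lines:
        # Cheap prefix check first: most lines belong to other projects.
        if not line.startswith(prefix):
            continue
        fields = line.split()
        if len(fields) <= count_col or fields[1] not in titles:
            continue
        try:
            views = int(fields[count_col])
        except ValueError:
            continue  # skip malformed rows instead of aborting the run
        totals[fields[1]] = totals.get(fields[1], 0) + views
    return totals


def sum_views_from_dump(path: str, project: str, titles: Set[str],
                        count_col: int) -> Dict[str, int]:
    """Stream a .bz2 dump line by line; the file is never fully loaded."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return sum_views(fh, project, titles, count_col)
```

This never writes the decompressed file to disk and never holds more than one line in memory, which sidesteps the 3 GB limit. In PHP, the same line-by-line approach should be possible with the bz2 extension, for example fopen() on a compress.bzip2:// path followed by fgets(), though that has not been tested here.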

Actions #1

Updated by Pedro Pires 7 months ago

  • Target version set to v0.4.0
Actions #2

Updated by Pedro Pires 7 months ago

  • Status changed from New to In Progress
Actions #3

Updated by Pedro Pires 7 months ago

  • % Done changed from 0 to 70
Actions #4

Updated by Pedro Pires 6 months ago

  • Status changed from In Progress to Feedback
Actions #5

Updated by Pedro Pires 5 months ago

  • Status changed from Feedback to Closed