Lenin Internet Archive: Quality


Question:

Is the V.I. Lenin on-line archive   (HTML = actual)   a character-by-character copy of all forty-five volumes of Lenin Collected Works   (books = expected) printed in Moscow during the 1960s and early 1970s?

Answers:


Work Flow for First-line Check

A database is created by typing the first line of every paragraph from every text from every volume of the Collected Works into a text file (the expected structure). Text file is read into a data dictionary by a Python script (see below).

Next, the HTML files referenced by lenin/works/cw/index.htm are read into a Python data dictionary (the actual structure).

Then the two data structures (expected and actual) are compared.

Finally, stuff that is missing or doesn't match gets reported.


Tools for First-Line Check


Volunteers


Summary


* * *