
Emulation in action, retrieving Malawi census data from 1970s

Emulation is one proposed method of preserving digital information. Below is a case study of using emulation to retrieve 1977 Census data from the African nation of Malawi. FGI obtained the permission of Brian Spoor to post this message here.

Emulation may not be practical in all circumstances, but it’s nice to see a real-world example. This example also shows how time- and resource-intensive digital preservation can be.

>>-----Original Message-----
>>From: Digital-Preservation Announcement and Information List
>>[DIGITAL-PRESERVATION@JISCMAIL.AC.UK]On Behalf Of Kevin Ashley
>>Sent: Thursday, April 21, 2005 2:05 AM
>>To: DIGITAL-PRESERVATION@JISCMAIL.AC.UK
>>Subject: A practical use for old system emulators
>>
>>
>>I copy below in its entirety a post by Brian Spoor to the newsgroup
>>alt.folklore.computers which demonstrates a very practical use for
>>old system emulators in rescuing digital data. Brian used the ICL
>>1900 emulator written by Dave Holdsworth and Delwyn Holroyd to
>>help in reading tapes from the 1977 Malawi census.
>>
>>As Brian says in his message, it would have been possible to rescue this
>>data without the existence of the emulator. But for someone like himself
>>who was familiar with the original systems, the emulator provided
>>the most practical means to recover the information, allowing him to use
>>the same programming languages and tools to recover the data that he
>>would have used at the time the data was created. Nonetheless, I
>>think the decoding job he did on this information without the
>>supporting documentation is impressive.
>>
>>
>>>Date: Fri, 15 Apr 2005 23:06:22 +0100
>>>From: Brian W Spoor
>>>Newsgroups: alt.folklore.computers
>>>Subject: A practical use for old system emulators
>>>Message-ID:
>>>
>>>I responded to a request for help on data formats used on ICL 1900
>>>systems by the Population Studies Centre, University of Pennsylvania for
>>>the African Census Analysis Project (see http://www.acap.upenn.edu).
>>>
>>>The problem was that they had a set of data (14 reels of 1/2″ magnetic
>>>tape), from the 1977 Malawi Census, that did not apparently match the
>>>census data expected and appeared to have binary or some form of
>>>compressed data embedded in it. I took a look at a sample of the data
>>>supplied and wrote some programs to unscramble it.
>>>
>>>The first step was to examine the sample files (using a 1900 tape print
>>>utility on a PC) to evaluate what they consisted of, and to ensure that
>>>they did resemble 1900 data. The files supplied, in ‘bitstream’ format,
>>>contained:-
>>>
>>>a) a standard 1900 series tape header label
>>>b) a standard 1900 series start of subfile followed by variable length
>>>   data records
>>>c) a standard 1900 series end of subfile
>>>
>>>I then re-assembled the files into a ‘tape’ that could be read on the
>>>George 3 system and wrote some PLAN programs under George 3 to look at
>>>the data. The data consisted of variable length records in the range 3 –
>>>19 words in length and I was lucky that there was one 19 word record,
>>>which corresponded with the original Census Form.
>>>
>>>By extracting the records by record length, starting with the longest, I
>>>was able to make some sense of them and the ‘funny compressed data’ at
>>>the end of the records. The data at the end of the record consisted of
>>>codes that described how the record had been compressed, with the last
>>>word of the record containing a count of the number of compressions for
>>>that record.
>>>
>>>The funny data was counter/modifier words that described how many words
>>>were duplicates of the preceding record and where they were in the record.
>>>The compression turned out to be extremely simple, just drop duplicate
>>>words from the record, and not specifically related to the data, but it
>>>did reduce the data volume to about 2/3rds of the uncompressed volume.
>>>Various programs were written, and modified as a better understanding of
>>>the data evolved.
>>>
>>>In addition to just de-compressing the records into card images that
>>>could be output and then transferred to a *nix system for future use, I
>>>also wrote programs to recreate a Census Tape and to process it and
>>>produce a Census Report.
>>>
>>>Having proved the programs on the sample data, I was asked to complete
>>>the processing of the full set of data tapes. A Perl program was needed
>>>to convert the bitstream files that had been created from the original
>>>tapes to the format required by the G3 system.
>>>
>>>As a side note, the Perl programs had to be run under Unix, when run
>>>under Windows they appeared to work, but the file sizes were larger than
>>>the amount of data converted and random rubbish was interspersed within
>>>the valid data, which wasted an afternoon looking for a program error
>>>that wasn’t there.
>>>
>>>After conversion, I found that there were 3 separate multi-reel files,
>>>with all of the reels available. Running the decompression program on
>>>the first set of tapes and then running a census report, resulted in
>>>roughly a 50% population increase.
>>>
>>>Decompressing all the tape sets produced 96 sets of data, when we were
>>>looking for 24 sets of data (1 per district). Manual analysis of the
>>>extracted files showed that we had partial and duplicate sets of data
>>>for the districts. In most cases, a single file was an exact match on
>>>numbers for a district from published data. In a couple of cases, a
>>>group of partial files had to be used.
>>>
>>>Once the required files had been identified, I was able to create a new
>>>Census Tape and run a Census report from it. The numbers matched, apart
>>>from one district where 520 people had been lost, from a total
>>>population of 5.5 million.
>>>
>>>There was no requirement to create a new tape, the relevant files were
>>>transferred to a *nix system for future use, but in my view it rounded
>>>off the work, leaving a new complete ‘tape’ for posterity.
>>>
>>>For more information on the programs, see
>>>http://www.fcs.eu.com/fcssys/mal77.html
>>>
>>>Was it necessary to use George 3? No, all of the work could have been
>>>done in Perl, but it was more fun, and to my way of thinking easier, to
>>>work in a 1900 environment, after all I did this work for enjoyment, not
>>>for financial reward.
>>
>>--
>>Kevin Ashley
>>Head of Digital Archives Department
>>ULCC http://www.ulcc.ac.uk/
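
The compression Brian describes (drop words that duplicate the preceding record, then append counter/modifier words and a final count) can be illustrated with a small sketch. The message does not give the exact layout, so everything specific here is an assumption for illustration only: the function names, the idea that each descriptor packs a run length in the upper 9 bits and a start position in the lower 15 bits of a 24-bit word, and the rule that every record carries a trailing count word. The real tapes may well differ.

```python
def make_descriptor(count, position):
    """Pack a (hypothetical) counter/modifier word: run length in the
    upper 9 bits, start position in the lower 15 bits of a 24-bit word."""
    return (count << 15) | position


def split_descriptor(word):
    """Unpack the assumed counter/modifier layout into (count, position)."""
    return word >> 15, word & 0x7FFF


def decompress(record, previous, record_words):
    """Rebuild a full record from a compressed one and the preceding full record.

    Assumed layout: [kept words..., descriptor words..., n],
    where n is the number of descriptor words.
    """
    n = record[-1]                                   # last word: count of compressions
    descriptors = record[len(record) - 1 - n:-1]     # the n words before the count
    kept = iter(record[:len(record) - 1 - n])        # words that were not dropped

    # Collect every position that was dropped as a duplicate.
    dropped = set()
    for word in descriptors:
        count, position = split_descriptor(word)
        dropped.update(range(position, position + count))

    # Dropped positions come from the previous record, the rest from this one.
    return [previous[i] if i in dropped else next(kept)
            for i in range(record_words)]


# Toy example with 6-word records (the census records were up to 19 words).
previous = [10, 20, 30, 40, 50, 60]
compressed = [31, 61, make_descriptor(2, 0), make_descriptor(2, 3), 2]
restored = decompress(compressed, previous, record_words=6)
print(restored)
```

In the toy example, words 0-1 and 3-4 match the preceding record, so only two data words are stored alongside two descriptors and the count.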

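Brian's side note about the Perl conversion misbehaving under Windows does not state the cause. One common cause that fits the symptoms (output larger than the input, stray bytes inside valid data) is text-mode I/O, where each line-feed byte written is expanded to carriage return plus line feed. The sketch below only simulates that effect on a few made-up bytes; it is an assumption about what happened, not a diagnosis.

```python
def text_mode_write(data: bytes) -> bytes:
    """Simulate a Windows text-mode write: every LF byte becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


# Made-up binary block that happens to contain the byte 0x0A twice.
tape_block = bytes([0x00, 0x0A, 0x41, 0x0A, 0xFF])
damaged = text_mode_write(tape_block)

print(len(tape_block), len(damaged))  # 5 7
```

The usual precaution when handling tape images is to open files in binary mode, for example `open(path, "rb")` in Python or `binmode` on the filehandle in Perl.
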
CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

