A Short Tour Through Some of This Geographical Header
sf2.pdf
SUMLEV
The main guide for the "geographic header" data provided by this package is the US Census Bureau's sf2.pdf file, which is available from this page. In addition, there is some helpful information on the Census Bureau's view of geography, which includes a pointer to more useful articles.
Chapter 6 of sf2.pdf is the "Data Dictionary". It lists the columns and provides information on how to interpret each one. At the top of page 6-2, it lists SUMLEV, with an internal link to endnote 2.
Endnote 2, on page 6-15, contains some text, followed by an internal link to "How to Use This Product", Chapter 2 of sf2.pdf.
Chapter 2 has a section on page 2-3, "SUMMARY LEVEL SEQUENCE CHART", which again includes some text and another internal link, this time labelled "The summary level sequence chart", which leads to Chapter 4, "Summary Level Sequence Chart".
Pages 4-1 through 4-4 of Chapter 4 discuss "State Summary File 2". The data in this package is from the "National Summary File 2"; the description for this starts on page 4-5.
The National Summary File 2 appears to have two columns relevant to "summary level sequence". The first is "Geographical component" (GEOCOMP) (wait! We haven't seen that one yet! Patience…).
The second column is "Summary level", which turns out to be our old friend, SUMLEV. A SUMLEV of 000 (is that a string? an integer?) seems to have associated GEOCOMP values of:
00, 89–95, A0–A2, C0–C2, C7–CT, E0–E2, E7–EJ, G0, H0
with a meaning of "United States". We tentatively conclude that this summary level relates to counts for the entire United States, whereas a value such as 040 relates to (individual?) states.
NB: For reasons of space, this package only includes SUMLEV values 010, 020, 030, 040, and 050.
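On the string-or-integer question raised above: a quick, hedged sketch in Python (not part of this package) suggests why it is safer to treat SUMLEV as a string. The sample rows below are made up for illustration; only the column names SUMLEV and GEOCOMP and the five summary levels come from the text above. Converting to an integer silently drops the leading zero, so 040 becomes 40 and no longer matches the code as written in sf2.pdf.

```python
import csv
import io

# Made-up sample rows, for illustration only. The column names and the
# SUMLEV codes come from the text above; the pairings are hypothetical.
SAMPLE = """SUMLEV,GEOCOMP
010,00
040,00
050,00
"""

# Summary levels this package is said to include, kept as strings.
INCLUDED_SUMLEVS = {"010", "020", "030", "040", "050"}

# csv.DictReader leaves every field as a string, so leading zeros survive.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
as_strings = [row["SUMLEV"] for row in rows]

# Converting to integers drops the leading zeros...
as_integers = [int(code) for code in as_strings]

# ...so the codes have to be re-padded before they match again.
restored = [f"{number:03d}" for number in as_integers]

print(as_strings)
print(as_integers)
print(all(code in INCLUDED_SUMLEVS for code in as_strings))
```

GEOCOMP values such as A0 cannot be read as integers at all, which is one more reason to keep these columns as text.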
GEOCOMP
If we go back to page 6-2 of Chapter 6 ("Data Dictionary"), the next variable ('column') is the aforementioned GEOCOMP, "Geographical Component". This entry points at endnote 3.
Endnote 3, again on page 6-15, lists a number of GEOCOMP values (continuing on to page 6-18), and points again at Chapter 2, "How to Use This Product", for further information.
CHARITER
Once again we go back to page 6-2, where we see "Characteristic Iteration", CHARITER, which points us at endnote 4, as well as at Appendix H "for a full list of possible iterations" (implying maybe not all are used? At least, not all the time?).
Endnote 4, on page 6-18, gives some text, and also points at Appendix H.
Appendix H, "Characteristic Iterations", starts off promisingly:
"This appendix lists the 331 possible iterations for Summary File 2."
and, reading on, we see that it encodes demographic ('racial') slices through the US population (and that a value of 001 indicates the entire population).
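As a small, hedged illustration in Python (again, not part of this package), this is how one might pick out the rows covering the entire population, supposing a table carries one row per iteration. The rows and the helper name total_population_rows are hypothetical; only the column name CHARITER and the meaning of 001 come from the text above. The value 002 stands in for "some other iteration", without any claim about which one.

```python
# Per Appendix H, as described above, 001 indicates the entire population.
TOTAL_POPULATION = "001"

# Hypothetical rows, for illustration only.
rows = [
    {"SUMLEV": "040", "CHARITER": "001"},
    {"SUMLEV": "040", "CHARITER": "002"},
    {"SUMLEV": "050", "CHARITER": "001"},
]


def total_population_rows(records):
    """Keep only the records whose CHARITER code is the total-population one."""
    # Compare as strings so the leading zeros in the code are respected.
    return [record for record in records if record["CHARITER"] == TOTAL_POPULATION]


selected = total_population_rows(rows)
print(len(selected))
```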
Code book
Cornell's CISER helpfully provides an Excel spreadsheet 1 describing the columns in the sf2 file beyond the geographical header.
If we included all of sf2 2 (3GB zipped, 16GB unzipped!) rather than just the geographical header, we would have occasion to refer to the Code Book .xls file. But we haven't included it all.
In closing
Good luck with it!