Appendix A: Explanation of Methods

Defining and measuring urban land

The following summarizes the main steps used to derive the built-up urban area data sets for each of the three regions. The process involves using image processing software to analyze satellite imagery (from the Landsat 5 Thematic Mapper) and to classify each pixel within the respective study area as (a) built (urbanized areas), (b) unbuilt (greenlands, agricultural land, or rural open space), or (c) water. Further work was required to deal with areas classified as urban that occur in largely rural areas, and vice versa. The result of the process is the "urban land base" and urban boundary -- a line separating the built-up area from undeveloped land or greenfields. Figure A.1 outlines the processing steps.

Figure A.1 Processing steps in the image analysis

Image pre-processing

We used satellite images from the United States Geological Survey. The pre-processing step involved importing the data into the software that would be used for classification and then ensuring consistency across the data. Because satellite images vary according to the time of year and time of day they were taken, as well as by cloud cover, leaf cover, and so forth, inconsistencies among the images need to be reconciled, so that the classification step will produce consistent results each time the procedure is applied.

For ease in processing, each study area was divided into areas with similar land cover and terrain characteristics, and each of these areas was analyzed separately. For example, the Toronto region was broken down into three general areas: mostly urbanized; mostly rocks and wetlands; mostly farmland. The classification system distinguishes between hard and non-hard surface, and it is necessary to ensure clear distinctions between a hard surface that represents a rock outcropping and one that represents a building.

The images were further enhanced to prepare for the pixel-by-pixel analysis (a pixel represents roughly 30 m x 30 m on the ground, or 900 m2). Since an area of 900 m2 may contain a mixture of elements, these enhancements were designed to:

  • identify the most important element represented by each pixel (for example, a large building surrounded by lawns would be classified as "built" and a large farm field with a small building in it would be classified as "unbuilt");
  • identify textural patterns within and around each pixel that would facilitate classification (for example, a ploughed field would not appear green in the image, but would have to be classified as unbuilt);
  • categorize the "greenness" of a pixel in the unbuilt area (for example, an area of healthy vegetation in a rural area would be distinguished from the less-green vegetation of a downtown urban park);
  • consolidate all the kinds of land use and textures associated with agriculture (which includes many kinds of ground cover), and distinguish agricultural land from land that has been cleared for development.

Image classification

In this step, the computer analyses each pixel in the data set and applies one of the three categories: (a) built (hard surface), (b) unbuilt (permeable surface), or (c) water. This process requires the researcher essentially to "train" the software, by applying a classification algorithm to about 100 "training sites," and comparing the results to aerial photography to confirm whether the classification is accurate. The training sites are not individual pixels, but groups of pixels that represent a particular land cover or land use. Once the researcher is satisfied that the algorithm is producing reliable results, the algorithm can then be applied to the entire database.
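The idea of training a classifier on known sites can be sketched in a few lines. The source does not name the classification algorithm, so the example below substitutes a simple minimum-distance-to-means rule as a stand-in; the two-band pixel values, the site list, and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical training sites: a class label plus the band values of the
# pixels inside that site (two made-up spectral bands per pixel).
TRAINING_SITES = [
    ("built", [(0.30, 0.25), (0.32, 0.27)]),
    ("unbuilt", [(0.08, 0.45), (0.10, 0.50)]),
    ("water", [(0.03, 0.02), (0.04, 0.03)]),
]


def class_means(sites):
    """Average the band values of all training pixels for each class."""
    pixels_by_class = {}
    for label, pixels in sites:
        pixels_by_class.setdefault(label, []).extend(pixels)
    return {
        label: tuple(mean(band) for band in zip(*pixels))
        for label, pixels in pixels_by_class.items()
    }


def classify(pixel, means):
    """Assign the class whose mean is nearest (squared Euclidean distance)."""
    def dist2(label):
        return sum((p - m) ** 2 for p, m in zip(pixel, means[label]))
    return min(means, key=dist2)


MEANS = class_means(TRAINING_SITES)
print(classify((0.29, 0.24), MEANS))  # a pixel resembling the "built" sites
```

In practice the rule would be checked against aerial photography, as described above, before being applied to every pixel in the study area.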

A further step is required to verify the results for the area as a whole. This step is similar to the training process, in that the researcher compares the classification to a high-resolution aerial photograph, to confirm that the computer has correctly classified the image. Hundreds of sample sites were randomly chosen and analysed in this way, and the results showed that the algorithm was 100% accurate for water and 96-98% accurate for green areas. Identifying urban areas was also quite accurate (93-95%) in urban areas, but less so in rural areas, since many features of a rural area (buildings, roads, etc.) tend to be classified as urban. This problem was dealt with in the post-processing stage.
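A per-class accuracy figure of this kind can be computed by comparing, for each sample site, the class seen in the aerial photograph with the class assigned by the algorithm. The sketch below is illustrative only: the sample counts are made up, and the choice to report accuracy per reference class is an assumption, since the source does not say how the percentages were derived.

```python
from collections import Counter


def per_class_accuracy(samples):
    """samples: iterable of (reference_class, predicted_class) pairs.

    Returns, for each reference class, the share of its sample sites
    that the classifier labelled correctly.
    """
    total = Counter()
    correct = Counter()
    for reference, predicted in samples:
        total[reference] += 1
        if reference == predicted:
            correct[reference] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


# Made-up validation sample (not the study's data).
SAMPLES = (
    [("water", "water")] * 20
    + [("unbuilt", "unbuilt")] * 48 + [("unbuilt", "built")] * 2
    + [("built", "built")] * 47 + [("built", "unbuilt")] * 3
)
ACCURACY = per_class_accuracy(SAMPLES)
for cls, share in ACCURACY.items():
    print(f"{cls}: {share:.0%}")
```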


Image post-processing

At this point, the data yields a map in three categories representing built areas, unbuilt areas, and water, but the results are not yet usable for defining an urban boundary. There are many pixels classified as "built" in the unbuilt (rural) area, and "unbuilt" pixels in the urban area. The post-processing stage is intended to clarify these anomalies and ensure that the urban boundary indicates the consolidated built-up area and excludes scattered built structures in the rural area.

To begin, the data is converted from raster (pixel-by-pixel data) format to vector format (polygons representing continuous urban or rural areas). Once this process is complete, the result is:

  • a rural area dotted with many individual polygons or clusters of polygons that are classified as "built" -- these are known as "noise";
  • an urban area filled with unbuilt "holes."

First, the "noise" was removed. The decision was made, after testing different areas, to remove from the rural area all clusters of 500 or fewer pixels (roughly 45 hectares). This mainly removed barren land, such as fallow fields or excavated areas. It also removed hamlets at rural intersections, small highway commercial areas, and other minor areas of development outside urban areas. In a later step, some of these areas were added back if they occurred within two kilometres of a large urban polygon (45 hectares).
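The 45-hectare figure follows from the pixel size: 500 pixels x 900 m2 = 450,000 m2. The sketch below shows that arithmetic and one possible way to drop small clusters. It works on a raster grid of 0/1 values with 4-connected clusters, which is an assumption made for brevity; the study performed this step on vector polygons, and the function names are hypothetical.

```python
from collections import deque

PIXEL_AREA_M2 = 30 * 30      # one Landsat TM pixel, roughly 900 m2
M2_PER_HECTARE = 10_000


def cluster_area_ha(n_pixels):
    """Ground area, in hectares, covered by n_pixels pixels."""
    return n_pixels * PIXEL_AREA_M2 / M2_PER_HECTARE


def remove_small_clusters(grid, max_pixels):
    """Return a copy of grid (lists of 0/1) in which every 4-connected
    cluster of 1s containing max_pixels or fewer cells is reset to 0."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one whole cluster.
            cluster, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(cluster) <= max_pixels:
                for y, x in cluster:
                    out[y][x] = 0
    return out


# Toy grid: one 4-pixel cluster, one 2-pixel cluster, one single pixel.
GRID = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
]
CLEANED = remove_small_clusters(GRID, max_pixels=2)
print(cluster_area_ha(500))
```

With the toy threshold of two pixels, only the four-pixel cluster survives; the study's threshold corresponds to max_pixels=500.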

Second, roads, which appear as urban areas, or "linear noise," were removed from rural areas. A database of roads was overlaid on the data and all the roads were reclassified from built to unbuilt. Of course, this removed all the roads from urban areas, so the researchers then had to reclassify the roads in the urban areas once this process was complete. A similar process was followed for highway ramps.

Figure A.2 illustrates the results of the road removal process. Figure A.2a is the original Landsat 5 TM image used in the analysis. Note the north-south roads in white. Figure A.2b shows the urban area data set before the roads have been removed. Note that only the major highways and arterials are successfully captured in the analysis. Figure A.2c shows the results after the roads have been removed. Note that only road segments that are surrounded by urban development have been retained in the final data set.

Figure A.2 Illustration of the linear removal process

Third, the researchers did an extra layer of analysis outside large urban polygons (45 hectares). A two-kilometre ring was drawn around these areas and the polygons within that ring were scrutinized. Urban areas of more than 10 hectares in this zone were considered part of the contiguous urban fabric; everything else was classified as green.

Fourth, the researchers removed some (but not all) of the "holes" within the urbanized area. All holes of one hectare or less were reclassified as urban.

The final post-processing step included a set of rules that describes the relationship between the 1990 and 2001 data sets and addresses any conflicts between the two. If it was found that a pixel was classified as unbuilt in 2001 but built in 1990, then the imagery was checked for both years as a form of validation. Both data sets were modified to incorporate the validation results.
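The conflict check between the two years can be expressed as a simple comparison of the two classified grids. The sketch below only flags candidate pixels for manual review against the imagery; the grid representation (1 = built, 0 = unbuilt) and the function name are assumptions, and the validation itself remains a manual step.

```python
def flag_reversals(built_1990, built_2001):
    """Return (row, col) positions classified as built in 1990
    but unbuilt in 2001, which the text treats as conflicts."""
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(built_1990, built_2001))
        for c, (a, b) in enumerate(zip(row_a, row_b))
        if a == 1 and b == 0
    ]


# Toy 2 x 3 grids: 1 = built, 0 = unbuilt.
GRID_1990 = [[1, 0, 0],
             [1, 1, 0]]
GRID_2001 = [[1, 1, 0],
             [0, 1, 1]]
CONFLICTS = flag_reversals(GRID_1990, GRID_2001)
print(CONFLICTS)
```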

At the end of the image analysis procedure, two data sets were created. The first version was an urban data set that contained holes, representing non-urban land uses such as ravines, large municipal parks, golf courses, etc. We refer to this version as the "Swiss cheese" urban area. The second version fills in all holes to form a continuous urban area.

The "Swiss cheese" version was used in calculations where area matters, for example, to measure increases in urban land or density. The continuous version was used whenever the urban land base needed to be integrated with census geography, for example, in an overlay operation or during a selection process.

Understanding the distinction between the two data sets is important. Since the amount of vegetated land cover varies between regions, there is a potential for over-estimating a region's urban land base if the continuous urban area is used to calculate the increase in urban land. For example, Nose Hill Park in Calgary is a very large urban park (1,100 hectares). In 1990 it was located at the urban edge, but by 2001, urban development completely encircled the park (see Figure A.3). Had the continuous urban area data set been used to calculate the increase in urban land, the increase would have been over-estimated by more than one thousand hectares. In Toronto, the extensive ravine system that runs through the urbanized area represents a large area of non-urban land use that could similarly inflate the urban land base.

Figure A.3 The effect of Nose Hill Park in Calgary on calculating the urban land base in 1990 and 2001
Panels: "Swiss-cheese" urban land base; continuous urban land base

Analysing development patterns at the urban fringe

Our analysis focused on describing patterns of new development at the urban fringe. In this analysis, the urban fringe is defined as land within four kilometres of the edge of the 1990 urban land base. For all three regions, almost all greenfield development that occurred between 1991 and 2001 was in this area (see note 1). Three landscape pattern metrics were used to analyze the composition of the urban fringe area and the configuration of new development (the increase in urbanized land as reflected in differences between the 2001 and 1990 land base) within it.

Urban land density

Urban land density is the ratio of newly urbanized land to all land potentially available for development (see note 2). The change in urban land density across the urban fringe area represents variations in the amount of greenfield development occurring across this space. The urban fringe area was segmented into 16 bands that radiated from the edge of the 1990 urban land base; each band measured approximately 250 metres across. For each band, an urban land density value was calculated by dividing the amount of new urban land (the increase in urbanized land between 1990 and 2001) by the total amount of land available for development in each band. The extent of urbanized land was recorded for each band. The large number of bands enabled us to capture subtle changes in the composition of new development, which might not have been detectable with a smaller number of bands (Gar-on Yeh, 2001).
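The band calculation reduces to a ratio per band, with 16 bands of about 250 metres spanning the four-kilometre fringe. The following sketch assumes bands of exactly 250 metres measured from the 1990 urban edge; the function names and the sample figures are hypothetical.

```python
BAND_WIDTH_M = 250
N_BANDS = 16  # 16 x 250 m = 4 km, the width of the urban fringe


def band_index(distance_m):
    """0-based band for a distance from the 1990 urban edge,
    or None if the location lies outside the four-kilometre fringe."""
    if distance_m < 0 or distance_m >= BAND_WIDTH_M * N_BANDS:
        return None
    return int(distance_m // BAND_WIDTH_M)


def urban_land_density(new_urban_ha, developable_ha):
    """New urban land (1990 to 2001) divided by the land available
    for development in the same band; None if no land is available."""
    if developable_ha <= 0:
        return None
    return new_urban_ha / developable_ha


# Made-up figures for one band: 30 ha urbanized out of 120 ha available.
print(band_index(1300), urban_land_density(30, 120))
```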

Percentage of urban patches and average patch size

The other two landscape pattern metrics were calculated based on an urban patch analysis, which describes new development as discrete urban areas, or urban patches, as they are often called in the urban ecology literature. For these metrics, we aggregated the 16 bands into four "superbands," each measuring one kilometre across. A smaller number of bands was chosen to minimize the severing of urban patches.

First, the number of urban patches per band was calculated. This illustrates how new development is distributed across the urban fringe. To compare this metric across the three regions, which differ in size, we expressed the number of urban patches in each band as a percentage of all urban patches in the fringe area.

The last landscape metric measured average urban patch size (hectares) and compared the size of urban patches across the urban fringe.
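Both patch metrics can be derived from a list of patches once each patch has been assigned to a superband. The sketch below assumes that assignment has already been made (the source does not describe how patches straddling two superbands were handled) and uses made-up patch areas.

```python
def patch_metrics(patches):
    """patches: list of (superband, area_ha) pairs, one per urban patch.

    Returns {superband: (percent_of_all_patches, average_patch_size_ha)}
    for every superband that contains at least one patch.
    """
    total = len(patches)
    areas_by_band = {}
    for band, area_ha in patches:
        areas_by_band.setdefault(band, []).append(area_ha)
    return {
        band: (100 * len(areas) / total, sum(areas) / len(areas))
        for band, areas in areas_by_band.items()
    }


# Made-up patches: (superband number, area in hectares).
PATCHES = [(1, 2.0), (1, 4.0), (2, 10.0), (4, 6.0)]
METRICS = patch_metrics(PATCHES)
print(METRICS)
```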

Estimating the rate of intensification

Estimating the rate of intensification involved integrating our urban land base with fine-grained census data from Statistics Canada on the total number of dwellings and the number of new dwellings constructed between 1991 and 2001. Urban land cover data aligns particularly well with fine-grained census units such as blocks, because the boundaries of these census units are often defined by urban features such as roads.

Estimating rates of intensification requires the integration of three spatial data sets: (1) an urban land base, (2) census block units, and (3) Dissemination Areas (DA). The census variables (numbers) used to calculate the rate of intensification include total private occupied dwellings (aggregated to the block) and dwellings classified according to their period of construction (aggregated to the dissemination area). From the period-of-construction variable, we selected only those dwellings classified as having been built between 1991 and 2001.

Figure A.4 provides an overview of the process used to isolate the census units of interest and calculate an intensification rate. The details of the methodology can be found in Burchfield et al. (2007).

Figure A.4 Overview of the process used to estimate intensification rate

The next step was to map the location of intensification dwelling units. Although total dwelling counts are available for census blocks, the numbers of new dwellings (those built between 1991 and 2001) are available only for census dissemination areas (DAs), a larger spatial unit. The maps therefore indicate the net increase in the number of dwellings between 1991 and 2001 within each DA that lies mostly within the 1990 built-up urban area. The number of dwelling units is plotted on the maps at the centroid (or geographic centre) of each dissemination area.

Why are we calling our measure of intensification an estimate?

Our method produces an estimate of historical intensification that is not without possible sources of error. In any data analysis, there are inexactitudes and sources of error, but we are confident that these are within an acceptable margin. The following likely have a minimal effect on the accuracy of the results:

  • Random rounding and suppression of census data
  • Use of the period-of-construction variable, which is based on a 20% sample and therefore does not represent exact numbers
  • Delineation of the extent of the 1990 urbanized area
  • Classification of Census Blocks as "inside" or "outside" the 1990 urbanized area
  • Assumption that all dwellings in Census Blocks "outside" the 1990 urbanized area were built after 1990

Of the possible sources of error, the one that may have the most impact on the results is the choice of threshold -- selecting blocks in which 50% of the area is inside the 1990 urban boundary (see Step 4a in Figure A.4). The choice of 50% as the threshold for inclusion introduces some uncertainty into the method. There is a risk of improperly classifying units, since some Census Blocks classed as "inside" will include some dwellings "outside," and vice versa. The assumption is that these misclassifications will largely cancel each other out.
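The threshold rule can be illustrated as follows. This is a sketch only: it assumes that a block counts as "inside" when at least 50% of its area falls within the 1990 urban boundary (the text does not say whether the comparison is strict), and the block identifiers and areas are made up.

```python
def is_inside(block_area_ha, area_in_1990_urban_ha, threshold=0.5):
    """True if the share of the block inside the 1990 urban boundary
    meets the threshold (assumed here to be 'at least 50%')."""
    return area_in_1990_urban_ha / block_area_ha >= threshold


# Hypothetical blocks: (block id, total area, area inside 1990 boundary).
BLOCKS = [("b1", 4.0, 4.0), ("b2", 10.0, 5.0), ("b3", 8.0, 3.0)]
INSIDE = [bid for bid, area, inside_area in BLOCKS if is_inside(area, inside_area)]
OUTSIDE = [bid for bid, area, inside_area in BLOCKS if not is_inside(area, inside_area)]
print(INSIDE, OUTSIDE)
```

Block b3 shows the source of uncertainty: it is classed as "outside" even though 3 of its 8 hectares, and possibly some of its dwellings, lie within the 1990 boundary.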

The risk has also been mitigated by the use of Census Blocks, the smallest geographic units available. Because of the way Census Block boundaries are defined, the risk is small and again, within acceptable limits. See Figure A.5 for a comparison of the alignment between census geography and urban areas. The classification threshold represents a conservative selection of Census Blocks.

Figure A.5 Comparison of the alignment between census boundaries and the 1990 urban area data set in the lower-tier municipality of Markham, Ontario

a. 2001 Census Dissemination Areas (DA)
b. 2001 Census Blocks
c. DAs compared with the 1990 built-up urban area
d. Blocks compared with 1990 built-up urban area

The location of intensification

In Chapter 3, intensification is mapped according to the number of intensification dwelling units in each Dissemination Area. A graduated circle method is used to map the relative number of dwelling units in each DA. The circle is located at the centroid (approximate centre) of the DA. If a DA is large and irregularly shaped, the centroid may actually fall outside the DA and the 1990 urbanized area. In these cases, we moved the centroid so that it was inside both the DA and the urbanized area, to avoid confusing readers of the maps.
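A centroid can fall outside its own polygon whenever the shape is strongly concave. The sketch below demonstrates this with a hypothetical U-shaped polygon, using the standard area-weighted centroid formula and a ray-casting point-in-polygon test. It shows only how such cases can be detected; how the centroids were then repositioned is not described in the source.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # area2 is twice the signed area, so the divisor 3 * area2 equals 6 * area.
    return cx / (3 * area2), cy / (3 * area2)


def point_in_polygon(point, vertices):
    """Ray-casting test: count edge crossings to the right of the point."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical U-shaped unit: a 6 x 6 square with a notch cut from the top.
U_SHAPE = [(0, 0), (6, 0), (6, 6), (4, 6), (4, 2), (2, 2), (2, 6), (0, 6)]
CENTROID = polygon_centroid(U_SHAPE)
print(CENTROID, point_in_polygon(CENTROID, U_SHAPE))
```

Here the centroid lands in the notch, outside the polygon, which is the situation that required manual adjustment on the maps.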

Analysing the significance of spatial patterns

The spatial data exploratory software, GeoDa, was employed to calculate two spatial statistics from the intensification data: Moran's I (a global indicator of spatial autocorrelation) and LISA (a local indicator of spatial autocorrelation). Usually, these spatial statistics are applied to variables that are calculated as a rate rather than absolute numbers, because absolute numbers may be influenced by differences in the population of the underlying geographical units. But since Dissemination Areas are already standardized based on population, there is validity in calculating both statistics on the absolute number of intensification units within a DA.

A requirement of the LISA is the construction of a weights table, which describes the local neighbourhood in which the statistic is calculated. The weights table, sometimes referred to as a connections matrix, describes the interrelationship among geographic units -- in our case, DAs. For each Dissemination Area, the closeness of its neighbouring DAs can be based on a distance threshold or an adjacency threshold. After testing a number of scenarios, we chose to define a DA's neighbourhood based on an adjacency relationship -- that is, whether or not boundaries touched. A matrix was calculated based on first-order adjacency, so that if a DA immediately touched another DA in the vertical, horizontal, or diagonal direction, it was considered a part of the neighbourhood for the DA of interest (first-order refers to immediate neighbours).
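First-order adjacency that includes diagonal contact is often called queen contiguity. The sketch below builds such a neighbourhood for a regular grid of cells and computes global Moran's I from it. It is an illustration under stated assumptions: real DAs are irregular polygons rather than grid cells, the weights are left binary (1 for neighbours, 0 otherwise), and the values are made up. GeoDa's own computation may differ, for example in how it standardizes weights.

```python
def queen_neighbours(rows, cols):
    """Map each grid cell (r, c) to the cells sharing an edge or corner."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            nbrs[(r, c)] = [
                (r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
            ]
    return nbrs


def morans_i(values, nbrs):
    """Global Moran's I with binary weights.

    I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean and S0 is the sum of all weights.
    """
    n = len(values)
    m = sum(values.values()) / n
    dev = {k: v - m for k, v in values.items()}
    s0 = sum(len(js) for js in nbrs.values())
    num = sum(dev[i] * dev[j] for i in nbrs for j in nbrs[i])
    den = sum(d * d for d in dev.values())
    return (n / s0) * num / den


# Made-up 4 x 4 grid: high counts clustered in the two left-hand columns.
NBRS = queen_neighbours(4, 4)
VALUES = {(r, c): (10 if c < 2 else 0) for r in range(4) for c in range(4)}
MORAN = morans_i(VALUES, NBRS)
print(round(MORAN, 3))
```

A positive value indicates that similar counts tend to sit next to each other, which is the clustering the maps are meant to reveal.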

For our analysis, only DAs with a large number of intensification units that showed a high degree of spatial autocorrelation at the 95% confidence level were mapped, i.e. only statistically significant observations were mapped.

Examining submetropolitan patterns of urban growth

Two different census geographic units were used to divide the regions into three submetropolitan zones: Core Areas, Older Suburbs, and Newer Suburbs. In Toronto and Vancouver, Census Subdivisions were employed, but in Calgary, which is one large Census Subdivision, Census Dissemination Areas were used. The Core Areas category represents urban areas that were established before 1951, but in all three regions, it is possible for greenfield development to occur in these areas. For example, greenfield development can occur along the waterfront, either river or lakeside. In the Toronto region's Core Area, greenfield development opportunities still exist in the old City of Hamilton.

As discussed in Chapters 1 and 2, a rural component of the population exists in every metropolitan region. When statistics are aggregated to the level of the region, this component typically represents a very small percentage of the metropolitan population as a whole. However, we felt that the rural population needed to be removed from the submetropolitan zone analysis, since it could skew results in the Older Suburbs and Newer Suburbs.

Therefore, after dividing a region into three zones, a second analysis was performed. We used fine-grained census geography, Enumeration Areas and Dissemination Areas, to represent the urban population within each zone. After testing a number of scenarios, the following GIS operations were applied to the 1990/1991 and 2001 urban and census geography data sets:

  • Intersect census geography with continuous version of urban land base.
  • Dissolve on census geography ID, whereby urban polygons are grouped by a unique census unit ID.
  • Join tables of urban land base geography with census geography through unique census ID, whereby each census unit has all attributes from census geography plus the amount of urban area in each unit.
  • Create a new field in the census geography table that calculates the proportion of urbanized area in each census unit.
  • Select census geographical units that have a minimum of 40% urban area or 50 hectares of urban land. This selection includes census units in the large, core urban area.
  • Select census geographical units that have a minimum of 30 hectares and a minimum of 10% urban (both criteria must be met in this selection). This selection includes larger census units at the edge of the urban area, but excludes large, rural census geographical units.
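The two selection rules at the end of this list can be combined into a single test per census unit. The sketch below rests on two assumptions that the text does not state outright: that the "30 hectares" in the second rule refers to urban land, as it does in the first rule, and that a unit is kept if it satisfies either rule. The function name and sample values are hypothetical.

```python
def is_urban_unit(urban_ha, unit_area_ha):
    """Apply the two selection rules to one census unit.

    urban_ha: urban land inside the unit (hectares).
    unit_area_ha: total area of the unit (hectares).
    """
    share = urban_ha / unit_area_ha
    # Rule 1: at least 40% urban, or at least 50 ha of urban land.
    core_rule = share >= 0.40 or urban_ha >= 50
    # Rule 2 (assumed reading): at least 30 ha of urban land AND at least 10% urban.
    edge_rule = urban_ha >= 30 and share >= 0.10
    return core_rule or edge_rule


# Made-up units: (urban hectares, total hectares).
for urban_ha, total_ha in [(10, 20), (60, 1000), (35, 200), (35, 1000), (5, 100)]:
    print(urban_ha, total_ha, is_urban_unit(urban_ha, total_ha))
```

Under this reading, a unit with 35 hectares of urban land is kept when it covers 200 hectares but dropped when it covers 1,000 hectares, which matches the stated aim of excluding large rural units.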

Enumeration areas were geographical units used to collect statistics from the 1991 census, and dissemination areas were used to report statistics from the 2001 census. However, the boundaries of the two sets of fine-grained census geography are not consistent. Considering that the statistics were aggregated to the three zones, each of which covers a large area, the slight mismatch between these two fine-grained census geographies is expected to have a minimal impact on the overall trends and findings reported in Chapter 3.

In comparing the totals in population and dwellings aggregated by this method from the submetropolitan results to those derived from the region-wide results, we find that the rural population represents 5% or less of the total population in each region and 4% or less of all region-wide dwellings (see Table A.1).

Table A.1 Comparison of region-wide and submetropolitan values



Columns: A = Region-wide sum; B = Sub-metropolitan sum; Difference = A -- B.

                    Difference (1991)    Difference
                    + 5%                 + 3%
Dwelling units      + 5%                 + 3%

                    + 5%                 + 4%
Dwelling units      + 4%                 + 4%

                    + 3%                 + 1%
Dwelling units      + 3%                 + 1%

To better understand this "missing" population, Figure A.6 illustrates the rural hinterland in Hamilton, an area of the Toronto region, using different types of imagery and census maps. Figure A.6a shows this area as depicted in the satellite image used in the analysis. Rural roads are seen in white and along these roads are small white dots. Figure A.6b shows the same area with census blocks overlaid onto the satellite image. Small census blocks are an indication of areas with higher population, and large census blocks are areas with a lower population. In both figures, a black box outlines the area shown in Figure A.6c. This figure is an air photo provided by Google Maps. The air photo image includes more detail than our satellite image, indicating that those white dots are houses along the rural road and concentrations of a few houses, perhaps representing smaller lots that were subdivided from a larger rural lot. This area is a good example of large-lot low-density settlements in the rural area of a large metropolitan region.

As shown in Figure A.6b, the census geography covers a very large area as it tries to capture population at a minimum threshold to ensure anonymity and privacy, so that individuals or individual households cannot be identified. In this example two principles are illustrated: (1) the imagery we are using in our analysis is not at a fine enough resolution, or detail, to capture isolated and sparse development in rural areas and (2) census geography is not adequate to capture small areas of population without aggregating population to large census units.

After the selection of census geography was performed, the "Swiss-cheese" version of the urban land base corresponding to this geography was extracted, and its area summed for each zone. This was used to calculate urban densities for the submetropolitan numbers in Chapter 3.

Figure A.6 An example of rural settlements in Hamilton, Ontario, as illustrated by imagery and census geography

1. In Toronto, more than 97% of greenfield development occurred within four kilometres of the 1990 urban land base. In Vancouver and Calgary, the figures were approximately 99%.
2. To calculate the land potentially available for development, we removed constraints to development (such as protected areas, First Nations reserves, water bodies, and urbanized land developed before 1990) from the developable land area. In Vancouver, 2001 urban areas that overlapped with the Green Zone were counted as "developable." The Green Zone has many small holes that coincide with roads, which are not legally part of the Green Zone. Most of the overlap occurred along roads within the Green Zone.