Section | |
---|---|
|
Prepare the Input Data
The Bulk Loader can load data into the EAS from either a shapefile or comma-separated value (CSV) file, and is run on the ETL server as a job.py
job. Currently, the ETL server is running the Windows operating system.
Shapefile
Ensure that your shapefile has a defined projection of NAD 1983 StatePlane California III FIPS 0403 Feet (EPSG 102643) before attempting to load it into EAS
- Robert Gaston Is there a preferred defined projection? Is it necessary to have a defined projection in the first place?
.
Shapefiles must be named bulkload.shp
and placed in the following directory on the ETL server:
...
field name | data type | length |
---|---|---|
OBJECTID | integer | 9 |
block | string | 80 |
lot | string | 80 |
unit | string | 80 |
unit_sfx | string | 80 |
st_num | integer | 4 |
st_num_sfx | string | 20 |
st_name | string | 32 |
st_suffix* | string | 10 |
st_type | string | 4 |
source | string | 32 |
st_prefix | string | 10 |
Examples
address | block | lot | unit | unit_sfx | st_num | st_num_sfx | st_name | st_suffix* | st_type | source | st_prefix |
---|---|---|---|---|---|---|---|---|---|---|---|
1 SOUTH VAN NESS AVE | 3506 | 001 | 1 | VAN NESS | AVE | SOUTH | |||||
20 25TH AVE NORTH | 1301 | 018 | 20 | 25TH | NORTH | AVE | |||||
127 A CENTRAL AVE | 1233 | 007 | 127 | A | CENTRAL | AVE | |||||
5900 03RD ST 2217 | 5431B | 093 | 2217 | 5900 | 03RD | ST | |||||
100 BROADWAY | 0141 | 003B | 100 | BROADWAY | |||||||
200 MAIN ST (APARTMENT) 2D | 3739 | 008 | 2 | D | 200 | MAIN | ST |
Comma-separated Value (CSV) file
CSV files must be named bulkload_data.csv
and placed in the following directory on the ETL server:
...
field name | data type | length |
---|---|---|
block | string | 256 |
lot | string | 256 |
unit | string | 256 |
unit_sfx | string | 256 |
street_number | string | 256 |
street_number_sfx | string | 256 |
street_name | string | 256 |
street_sfx* | string | 256 |
street_type | string | 256 |
load_source | string | 32 |
Examples
address | block | lot | unit | unit_sfx | street_number | street_number_sfx | street_name | street_sfx* | street_type | load_source |
---|---|---|---|---|---|---|---|---|---|---|
1 SOUTH VAN NESS AVE | 3506 | 001 | 1 | SOUTH VAN NESS | AVE | |||||
20 25TH AVE NORTH | 1301 | 018 | 20 | 25TH | NORTH | AVE | ||||
127 A CENTRAL AVE | 1233 | 007 | 127 | A | CENTRAL | AVE | ||||
5900 03RD ST 2217 | 5431B | 093 | 2217 | 5900 | 03RD | ST | ||||
100 BROADWAY | 0141 | 003B | 100 | BROADWAY | ||||||
200 MAIN ST (APARTMENT) 2D | 3739 | 008 | 2 | D | 200 | MAIN | ST |
...
* Use this field for streets that have a Street Name Post Directional, for example the "North" in "25th Ave North".
Prepare the Output Table
The Bulk Loader will produce an output table (bulkloader.address_extract
) that contains, among other information, the disposition of each input address record that was submitted to the Bulk Loader. The output table may already have records in it for any of the following reasons:
...
- Establish your SSH tunnel to the target EAS database server.
- Invoke pgAdmin and navigate to the target database, and then the output table (
bulkloader.address_extract
). - Right-click the output table and invoke the Query Tool.
Verify that the output table is not emptyRemove leftover records from the previous Bulk Loader iteration.
Code Block language sql SELECT COUNT(*) AS record_count FROM TRUNCATE bulkloader.address_extract, bulkloader.blocks_nearest;
Run garbage collection to free disk space held by truncated tuples.
If the output table is not empty (Code Block language sql VACUUM FULL ANALYZE bulkloader.address_extract;
record_count > 0
), then truncate the output table and vacuum the databaseVACUUM FULL ANALYZE bulkloader.blocks_nearest;
Optional: count all records in the database for use in a before/after record count comparison.
Code Block language sql TRUNCATE bulkloader.address_extract; VACUUM FULL ANALYZE;
Verify that the output table is now empty (
record_count = 0
).Code Block language sql SELECT COUNT(*) AS record_count FROM bulkloader.address_extract;
- After your next bulk-load attempt, the output table will only contain results from that bulk-load attempt.
- Repeat these steps as needed.
Run the Bulk Loader
...
SELECT schemaname,relname,n_live_tup FROM pg_stat_user_tables ORDER BY schemaname,relname,n_live_tup;
Run the Bulk Loader
The Bulk Loader is run on the ETL server, and depending on the number of address records that are going to be processed, the Bulk Loader will complete its work at any time from a few minutes, to a few hours. The following table contains some information for some typical runs.
...
- Remote desktop (RDP) into the ETL server.
Move to the directory that contains the file
job.py
.Code Block language text cd C:\apps\eas_automation\automation\src
Stage the input data.
Code Block language text title Shapefile Input Format python job.py --job stage_bulkload_shapefile --env [Your Environment] -<environment> --action EXECUTE
Code Block language text title CSV Input Format python job.py --job stage_bulkload_csv --env [Your<environment> Environment] --action EXECUTE
Run the
bulkload
job.Code Block language text python job.py --job bulkload --env [Your<environment> Environment] --action EXECUTE
Evaluate the Results
...
Anchor | ||||
---|---|---|---|---|
|
Evaluate the results of the bulk-load by opening the database and inspecting the output table bulkloader.address_extract
for values in the field exception_text
Addresses that were successfully bulk-loaded will have a date in the field
load_tms
and a NULL value in the fieldexception_text
.Code Block language sql -- Count records successfully bulk-loaded: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT(load_tms IS NULL) AND (exception_text IS NULL); -- View records successfully bulk-loaded: SELECT * FROM bulkloader.address_extract WHERE NOT(load_tms IS NULL) AND (exception_text IS NULL);
The presence of exception text indicates a problem with the source data.
To rectify bulk-load exceptions, do not correct any exception records in the output tableCode Block language sql -- Count records with exception: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT(exception_text IS NULL); -- View records with exception: SELECT * FROM bulkloader.address_extract WHERE NOT(exception_text IS NULL);
bulkloader.address_extract
. Instead, correct the-- View totals for each distinct 'exception_text' value SELECT exception_text, Count(*) FROM bulkloader.address_extract GROUP BY exception_text ORDER BY exception_text;
- To rectify bulk-load exceptions, do not correct any exception records in the output table
bulkloader.address_extract
. Instead, correct the corresponding records in the source data and then re-run the Bulk Loader. If block and lot values were included in the input data, a value should be found in the field
address_x_parcel_id
. Run the query below to find any records that had block and lot input values but did not get assigned a value foraddress_x_parcel_id.
Code Block language sql -- Count records where block and lot were provided but no address_x_parcel was assigned: SELECT COUNT(*) FROM bulkloader.address_extract WHERE (NOT (block IS NULL) OR NOT (block IS NULL)) AND address_x_parcel_id IS NULL;
Base addresses that were submitted by the Bulk Loader but were already in the EAS are indicated by values in the fields
address_base_id
andload_tms
, and NULL values in the fieldsstreet_segment
,zone_id
orexception_text
not get assigned a value foraddress_x_parcel_id.
Code Block language sql -- Count records where block and lot were provided but no address_x_parcel was assigned: SELECT COUNT(*) FROM bulkloader.address_extract WHERE (NOT (block IS NULL) OR NOT (block IS NULL)) AND address_x_parcel_id IS NULL;
Base addresses that were submitted by the Bulk Loader but were already in the EAS are indicated by values in the fields
address_base_id
andload_tms
, and NULL values in the fieldsstd_unit, street_segment,
zone_id, address_id
andexception_text
.Code Block language sql -- Count base addresses that were already in the EAS: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT(address_base_id IS NULL) AND NOT(load_tms IS NULL) AND (std_unit IS NULL) AND (street_segment_id IS NULL) AND (zone_id IS NULL) AND (address_id IS NULL) AND (exception_text IS NULL); -- View base addresses that were already in the EAS: SELECT * FROM bulkloader.address_extract WHERE NOT(address_base_id IS NULL) AND NOT(load_tms IS NULL) AND (std_unit IS NULL) AND (street_segment_id IS NULL) AND (zone_id IS NULL) AND (address_id IS NULL) AND (exception_text IS NULL);
Count and view new base addresses by looking for non-NULL values in the
street_segment_id
field. Cross-check with the 'public' schema by counting new 'public.address_base' records as shown below in Evaluate 'public' schema.Code Block language sql -- Count base addresses that were not already in the EAS: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT(address_base_id IS NULL) AND NOT(load_tms IS NULL) AND (street_segment_id IS NULL); -- View base addresses that were AND (zone_id IS NULL) AND (exception_textnot already in the EAS: SELECT * FROM bulkloader.address_extract WHERE NOT (street_segment_id IS NULL);
Evaluate records with non-NULL
address_id
values to count/view unit addresses that were added to the EAS.
Base addresses that were submitted by the Bulk Loader and were not already in the EAS are indicated by values in theCode Block language sql -- ViewCount baseunit addresses thatadded wereto already in the EAS: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT(address_base_id IS NULL) AND NOT(load_tms IS NULL) AND (street_segment (address_id IS NULL); -- View unit addresses added AND (zone_id IS NULL) AND (exception_text IS NULL);
street_segment_id
fieldto the EAS: SELECT * FROM bulkloader.address_extract WHERE NOT (address_id IS NULL);
Anchor | ||||
---|---|---|---|---|
|
Evaluate tables in the public
schema to cross-check record counts from the bulkloader
schema evaluation.
Find the
change_request_id
of the most recent bulk load.
Evaluate records with non-NULLCode Block language sql -- CountGet base addresses that were not already in the EAS: SELECT COUNT(*) FROM bulkloader.address_extract WHERE NOT (street_segment_id IS NULL)the change_request_id of the most recent bulk load: SELECT change_request_id FROM public.change_requests WHERE requestor_comment LIKE 'bulk load change request' ORDER BY change_request_id DESC LIMIT 1; -- ViewMake basenote addressesof thatthis were not already in the EAS: SELECT * FROM bulkloader.address_extract WHERE NOT (street_segment_id IS NULL)
address_id
values to count/view addresses that were added to the EASid for the queries in the subsequent steps.
Count and view new base addresses associated with the change request.
Code Block language sql -- Count addressesnew addedbase to the EAS: addresses SELECT COUNT (*) FROM bulkloaderpublic.address_extractaddresses WHERE NOT (address_id activate_change_request_id = change_request_id_goes_here AND unit_num IS NULL); -- View new base addresses addedSELECT to* the EAS: SELECT * FROM bulkloader.address_extract WHERE NOT (address_idFROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here AND unit_num IS NULL)
Schema 'public'
Evaluate tables in the public
schema to cross-check record counts from the bulkloader
schema evaluation.
- Find the
change_request_id
of the most recent bulk load;
Count and view new unit addresses associated with the change request.
Code Block language sql --- GetCount the change_request_id of the most recent bulk load: SELECT change_request_idnew unit addresses SELECT COUNT (*) FROM public.change_requests WHERE requestor_comment LIKE 'bulk load change request' ORDER BY addresses WHERE activate_change_request_id = change_request_id DESC LIMIT 1_goes_here AND NOT unit_num IS NULL; -- Make note of this id for the queries in the subsequent steps. View new unit addresses SELECT * FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here AND NOT unit_num IS NULL;
Count and view all new base 'public.addresses' records associated with the change request.
Count and view new unit addresses associated with the change requestCode Block language sql -- Count all new base addresses SELECT COUNT (*) FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here AND unit_num IS NULL ; -- View all new base addresses SELECT * FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here AND unit_num IS NULL.
_goes_here;
Count and view new 'public.address_base' records added to the EAS. Cross-check with the 'bulkloader' schema as shown above in Evaluate 'bulkloader' schema.
Code Block language sql --- Count new unit addresses'public.address_base' records SELECT COUNT (*) FROM public.addresses WHERE address_base, public.addresses WHERE public.address_base.address_base_id = public.addresses.address_base_id AND public.addresses.address_base_flg = TRUE AND public.addresses.activate_change_request_id = change_request_id_goes_here AND; NOT unit_num IS NULL -- View new unit addresses'public.address_base' records SELECT activate_change_request_id, address_id, public.address_base.* FROM public.address_base, public.addresses WHERE activate_change_requestpublic.address_base.address_base_id = change_requestpublic.addresses.address_base_id_goes_here AND NOT unit_num IS NULL
Count and view all new addresses associated with the change request.
Code Block language sql -- Count all new addresses SELECT COUNT (*) FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_here -- View all new addresses SELECT COUNT (*) FROM public.addresses WHERE activate_change_request_id = change_request_id_goes_herepublic.addresses.address_base_flg = TRUE AND public.addresses.activate_change_request_id = change_request_id_goes_here;
Optional: count all records in the database for use in a before/after record count comparison.
Code Block language sql SELECT schemaname,relname,n_live_tup FROM pg_stat_user_tables ORDER BY schemaname,relname,n_live_tup;
Free space after large Bulk Loader batches
After successful loading and evaluating large batches of addresses, clean up the database by truncating and vacuuming the the Bulk Loader output table (bulkloader.address_extract
) and vacuuming the database.tables.
Code Block | ||
---|---|---|
| ||
TRUNCATE bulkloader.address_extract, bulkloader.blocks_nearest; VACUUM FULL ANALYZE bulkloader.address_extract; VACUUM FULL ANALYZE bulkloader.blocks_nearest; |