The AVS addresses are imported into EAS via a one-time ETL.
Once EAS goes live, this ETL will be turned off.
Here I describe in general the processing that occurs during the ETL. You can see most of the code at these URLs.
The ETL process is controlled by a Python job which is here (Jira access required).
If you do not have access, the important code is here
Code Block |
---|
job.commands = [
# load the data
FtpCommand(job=job, connectionGroup=self.connectionOrganizer.ftpConnections['DBI2SFGIS'], sourcePath='/dbi-external/tblAVS.txt', targetPath=csvFileIn, direction='GET'),
FmeCommand(job=job, fmw="avs_load/avs_to_eas.fmw", connectionGroups=(self.connectionGroups['MAD_ETL'],), useSdePorts=(False,), files=(csvFileIn,)),
FmeCommand(job=job, fmw="avs_load/parcels_sfgisStg_no_geom_to_eas.fmw", connectionGroups=(self.connectionGroups['SFGIS_STG_SDE'], self.connectionGroups['MAD_ETL'],), useSdePorts=(True,False,)),
SqlCommand(job=job, sqlStatement="select avs.delete_addresses();", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="vacuum analyze;", connectionGroup=self.connectionGroups['MAD_ETL'], useTransaction=False),
SqlCommand(job=job, sqlStatement="select avs.init_parcel_points();", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="select avs.init_streets_nearest();", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="select avs.init_avs_addresses();", connectionGroup=self.connectionGroups['MAD_ETL']),
AvsLoadCommand(job=job, connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="update avs.avs_addresses set exception_text = 'programming error - row not processed' where address_base_id is null and exception_text is null;", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="select public._eas_validate_addresses_after_etl('ALL');", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="delete from public.d_address_disposition where disposition_description = 'provisional';", connectionGroup=self.connectionGroups['MAD_ETL']),
SqlCommand(job=job, sqlStatement="vacuum analyze;", connectionGroup=self.connectionGroups['MAD_ETL'], useTransaction=False),
# report on results
RemoveDirectoryTreeCommand(job=job, targetDir=os.path.join(getDataFilePath(), 'avs_load_report')),
SqlCommand(job=job, sqlStatement="select avs.qa();", connectionGroup=self.connectionGroups['MAD_ETL']),
DumpTablesCommand(job=job, connectionGroup=self.connectionGroups['MAD_ETL'], tableNames=['avs.vw_load_summary', 'avs.vw_load_results', 'avs.qa'], targetDir=reportDir),
ZipCommand(job=job, sourceDir=os.path.join(getDataFilePath(), 'avs_load_report'), targetFile=os.path.join(getDataFilePath(), 'avs_load_report.zip')),
EmailCommand(job=job, subject='AVS Load Report', text='See attached.', attachFiles=[os.path.join(getDataFilePath(), 'avs_load_report.zip')]),
FtpCommand(job=job, connectionGroup=self.connectionOrganizer.ftpConnections['DBI2SFGIS'], sourcePath=os.path.join(reportDir, 'avs.vw_load_results.csv'), targetPath='/dbi-external/avsEasLoadResults.csv', direction='PUT')
]
|
This Python code calls into a DB proc (avs.load), which then does the heavy lifting.
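As a rough illustration of the pattern in the job definition above, a job holds an ordered list of command objects and runs them one after another. This is only a sketch and not the EAS code: `SketchJob`, `SketchCommand`, and the behavior of aborting on the first exception are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SketchCommand:
    """Hypothetical stand-in for FtpCommand, SqlCommand, and the like."""
    name: str
    action: Callable[[], None]

    def execute(self) -> None:
        self.action()


@dataclass
class SketchJob:
    """Hypothetical job that runs its commands strictly in list order."""
    commands: List[SketchCommand] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)

    def run(self) -> None:
        for command in self.commands:
            # Assumption: an exception raised here aborts the remaining commands.
            command.execute()
            self.completed.append(command.name)


def run_demo() -> List[str]:
    """Run three placeholder steps and return the names that completed."""
    log: List[str] = []
    job = SketchJob(commands=[
        SketchCommand("load", lambda: log.append("fetch file and stage rows")),
        SketchCommand("process", lambda: log.append("call the database procedure")),
        SketchCommand("report", lambda: log.append("dump, zip and send results")),
    ])
    job.run()
    return job.completed
```

Keeping each step as its own command object means the order of the list is the order of execution, which is why the load, processing, and reporting steps can be read top to bottom.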
The majority of the other code is here
with these being the key pieces
- http://code.google.com/p/eas/source/browse/trunk/etl/sql/avs_load/f_init_avs_addresses_ddl.sql
- http://code.google.com/p/eas/source/browse/trunk/etl/sql/avs_load/f_load_ddl.sql
- http://code.google.com/p/eas/source/browse/trunk/etl/sql/avs_load/f_process_address_base_ddl.sql
- http://code.google.com/p/eas/source/browse/trunk/etl/sql/avs_load/f_process_address_unit_ddl.sql
- http://code.google.com/p/eas/source/browse/#svn%2Ftrunk%2Fetl%2Fsql%2Favs_load
...
- trunk/etl/sql/avs_load/f_process_address_parcel_link_ddl.sql
- http://code.google.com/p/eas/source/browse/trunk/etl/src/commands.py#1799
The execution path is not trivial, so I'll walk through the process mostly in English (below).
By necessity I do a lot of generalizing, but I will try to call out the places where I generalize, and I also link to the code.
In case you are curious, the key parts of the EAS data model are here
OK, off we go...
We start by doing blanket validations and standardizing some values.
This is done in this db proc:
...
If the ETL makes it past finding or creating a base address, it will then insert the unit address if one is specified.
Again, we do not allow multiple active duplicates.
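To make the duplicate rule concrete, here is a minimal sketch of a check-before-insert for unit addresses. It is not the logic of the actual db proc. The `UnitAddress` class, the idea that "active" means "not retired", and the choice of base address id plus upper-cased unit number as the duplicate key are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UnitAddress:
    """Hypothetical unit address record."""
    base_address_id: int
    unit_num: str
    retired: bool = False  # assumption: "active" means not retired


def insert_unit_if_no_active_duplicate(
    units: List[UnitAddress], base_address_id: int, unit_num: Optional[str]
) -> Optional[UnitAddress]:
    """Insert a unit address unless an active one with the same key exists."""
    if not unit_num:
        # No unit specified on the source row, so there is nothing to insert.
        return None
    key = unit_num.strip().upper()  # assumption: units compare case-insensitively
    for unit in units:
        if (unit.base_address_id == base_address_id
                and unit.unit_num == key
                and not unit.retired):
            raise ValueError("active duplicate unit address")
    new_unit = UnitAddress(base_address_id, key)
    units.append(new_unit)
    return new_unit
```

In this sketch a retired unit does not block a new insert with the same key; only an active one does.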
...
At the end of all this processing we report results in three ways:
- summary results
- detail on all AVS rows
- quality assurance on selected data
An example of this is attached here: avs_load_report.zip
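The reporting commands in the job (dump the three tables, then zip the directory) can be sketched roughly as follows. This is an illustration using only the Python standard library, not the actual `DumpTablesCommand` or `ZipCommand` code. The function name `dump_and_zip` is hypothetical, and the rows and column names are placeholders, not real load results.

```python
import csv
import os
import tempfile
import zipfile
from typing import Dict, List, Sequence


def dump_and_zip(tables: Dict[str, List[Sequence]], report_dir: str, zip_path: str) -> List[str]:
    """Write each table to <report_dir>/<name>.csv, then zip the directory."""
    os.makedirs(report_dir, exist_ok=True)
    for name, rows in tables.items():
        with open(os.path.join(report_dir, name + ".csv"), "w", newline="") as handle:
            csv.writer(handle).writerows(rows)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_name in sorted(os.listdir(report_dir)):
            archive.write(os.path.join(report_dir, file_name), arcname=file_name)
        return archive.namelist()


def demo() -> List[str]:
    """Build a throwaway report with placeholder rows and return the zip contents."""
    tables = {
        "avs.vw_load_summary": [("col_a", "col_b"), ("placeholder", 0)],
        "avs.vw_load_results": [("col_a", "col_b"), ("placeholder", 0)],
        "avs.qa": [("col_a", "col_b"), ("placeholder", 0)],
    }
    with tempfile.TemporaryDirectory() as work_dir:
        return dump_and_zip(
            tables,
            os.path.join(work_dir, "avs_load_report"),
            os.path.join(work_dir, "avs_load_report.zip"),
        )
```

The zip file is written next to the report directory rather than inside it, so the archive never tries to include itself.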
The summary results are shown in the table below.
These results are from 2011-12-07.
The first row represents rows that were successfully inserted from AVS.