EXAMPLE
--------------
We have climate change data for different countries. The file raw_Annual_Surface_Temperature_Change.csv has the data for surface temperature.
We also have data for countries. The file raw_countries.csv has the data for the countries.

Now, the issue is that the measurements file
(a) uses ISO2 and ISO3 as identifiers for the countries, and also contains the country name as a string.
     Instead of all these, we want iso_code as the identifier for countries (we always prefer an integer for a PK).
(b) has several columns we do not want to retain for further processing, e.g., Indicator, Unit, Source, CTS_Code, CTS_Name, CTS_Full_Descriptor.


So, we want to join the incoming temperature changes with the countries, drop the unwanted columns, and retain iso_code as the identifier for countries.
How do we achieve this? We join the two inputs. Since PDI provides a sort-merge join, we first sort both inputs on the join field, namely ISO3, and then join them.
/* The sorted intermediate files are the srt_* files */
The output of the join is stored in a new file, result_join.txt.
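
For readers who want to see the logic outside PDI, here is a minimal Python sketch of the sort-then-merge join described above. It is not the PDI transformation itself. The sample rows, the year column names (F1961, F1962), the UNWANTED set and the function name sort_merge_join are assumptions made for illustration, and the sketch assumes that ISO3 is unique in the countries input.

```python
# Hypothetical sample rows; the real files have many more columns and rows.
measurements = [
    {"ISO2": "GR", "ISO3": "GRC", "Country": "Greece",
     "Indicator": "Temperature change", "F1961": "0.1", "F1962": "0.2"},
    {"ISO2": "AL", "ISO3": "ALB", "Country": "Albania",
     "Indicator": "Temperature change", "F1961": "0.6", "F1962": "0.3"},
]
countries = [
    {"ISO3": "GRC", "iso_code": 300},
    {"ISO3": "FRA", "iso_code": 250},
    {"ISO3": "ALB", "iso_code": 8},
]

# Columns to drop after the join (an illustrative subset).
UNWANTED = {"ISO2", "Indicator"}


def sort_merge_join(left, right, key):
    """Inner join of two lists of dicts; assumes `key` is unique in `right`."""
    # Step 1: sort both inputs on the join field.
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    # Step 2: walk both sorted inputs in parallel.
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # no matching country for this measurement row
        elif lk > rk:
            j += 1          # country without measurements
        else:
            out.append({**left[i], **right[j]})
            i += 1          # right key is unique, so advance only the left side
    return out


joined = sort_merge_join(measurements, countries, "ISO3")
# Drop the unwanted columns; Country and ISO3 are kept for debugging.
result = [{k: v for k, v in row.items() if k not in UNWANTED} for row in joined]
```

Each row of result now carries iso_code next to the yearly measurements, with Country and ISO3 retained as in result_join.txt.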

Then, the result is normalized and stored in a result_pivot.txt file: instead of having on the order of 50 columns with the measurements, one per year, we want to produce a file where each row has
 - iso_code for the country
 - year for the year
 - measurement value for the measurement of this country during this year
 The PK is the combination <iso_code, year>
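
The normalization step can be sketched in Python as follows. This is only an illustration of the wide-to-long reshaping, not the PDI step itself. The year column naming (a prefix "F" followed by the year), the sample values and the function name unpivot are assumptions.

```python
def unpivot(rows, id_field, year_prefix="F"):
    """Turn one row per country into one row per (country, year)."""
    out = []
    for row in rows:
        for col, value in row.items():
            suffix = col[len(year_prefix):]
            # Treat columns such as "F1961" as year columns (assumed naming).
            if col.startswith(year_prefix) and suffix.isdigit():
                out.append({
                    "iso_code": row[id_field],
                    "year": int(suffix),
                    "value": value,
                })
    return out


# Hypothetical wide rows, as they might come out of the join.
wide = [
    {"iso_code": 8, "F1961": "0.6", "F1962": "0.3"},
    {"iso_code": 300, "F1961": "0.1", "F1962": "0.2"},
]
long_rows = unpivot(wide, "iso_code")

# The pair (iso_code, year) should identify each output row uniquely.
keys = [(r["iso_code"], r["year"]) for r in long_rows]
assert len(keys) == len(set(keys))
```

Two wide rows with two year columns each yield four long rows, one per country and year.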
 
Notes:
We also use tab as a separator, to avoid confusion with commas or semicolons in the CSV fields. We also set UTF-8 encoding. See the properties of the transformations.
We intentionally keep country and iso3 in the join output file for debugging purposes; under normal circumstances, we would have eliminated these two attributes too.
Also, because some strings contain commas, e.g., "Afganistan, islamic republic of", it is useful to retain the " enclosures (otherwise, the data become a mess).
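
To illustrate why the tab separator and the " enclosures help, here is a small Python sketch using the standard csv module. It is not part of the PDI transformations; the sample row and its iso_code value are assumptions for illustration.

```python
import csv
import io

# Hypothetical rows; the country name contains a comma.
rows = [
    ["Country", "iso_code"],
    ["Afganistan, islamic republic of", 4],
]

# Write with tab as separator and " around every non-numeric field.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", quotechar='"',
                    quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
writer.writerows(rows)
text = buf.getvalue()

# Read it back: the comma inside the name does not split the field.
parsed = list(csv.reader(io.StringIO(text), delimiter="\t", quotechar='"'))
```

When writing to a real file instead of an in-memory buffer, open(path, "w", encoding="utf-8", newline="") would give the UTF-8 encoding mentioned above.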

