
Importing Directly From a Text File

Instead of defining a consecutive file and then reading it, it is possible to read and parse text files directly in APPX. First, here is the actual code.

Note 1
This code simply opens the text file for input. It is exactly the same code that is generated when you use the 'Generate Delimited Update' option in the Data Dictionary Toolbox, except that instead of passing a 'w' (for write) in CDF TYPE, we pass an 'r' (for read).

Note the test for a RETURN CODE of +1. This tells us we were actually able to open the file. Anything else means a failure of some kind. It could be as simple as a misspelled file name or a permissions problem. Remember, you are executing as user APPX, and that user must have permission to access the text file.
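For readers who do not have the APPX code in front of them, the same "open, then test the result" pattern can be sketched in Python. This is only an analogue, not the APPX routine: the function name open_for_read and the 1/0 return values are our own assumptions, chosen to resemble the RETURN CODE test described above.

```python
def open_for_read(path: str):
    """Hypothetical helper: return (return_code, stream).

    return_code is 1 on success and 0 on any failure, loosely
    mirroring the RETURN CODE test described in the article.
    """
    try:
        return 1, open(path, "rb")
    except OSError:
        # OSError covers a missing or misspelled file name as well as
        # a permissions problem for the user running the process.
        return 0, None
```

The caller checks the return code before attempting any read, and treats everything other than 1 as a failure.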

Note 2
This next section of code actually reads the text file. We pass it 3 parameters: an alpha field to contain the data, a numeric field where we indicate the maximum length we can accept, and the file pointer that was returned to us when we opened the file. The parameters must be passed in that order.

Note that we initialize the alpha and numeric fields before every call to the RT_READ_STREAM routine. This is necessary for 2 reasons. First, if a subsequent read returns less data, the old data may still be in the alpha field. Second, RT_READ_STREAM uses the numeric field for its own nefarious purposes, so we have to set it every time. Also note our test for a RETURN CODE equal to +1. Just as with RT_OPEN_STREAM, this indicates a successful read. Anything else means we have read all the data in the file and we can quit.
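The stale-data problem is easy to reproduce outside APPX. The Python sketch below simulates a fixed-length field with a bytearray. The names store, blank and second_read, and the 16-character field length, are hypothetical and exist only for this illustration. It does not call or imitate RT_READ_STREAM itself.

```python
FIELD_LENGTH = 16  # hypothetical size; the article's field is 256 characters


def store(field: bytearray, data: bytes) -> None:
    """Copy data over the start of a fixed-length field; the tail is untouched.

    Assumes len(data) <= len(field).
    """
    field[: len(data)] = data


def blank(field: bytearray) -> None:
    """Reset every position of the field to a space."""
    field[:] = b" " * len(field)


def second_read(reset_first: bool) -> bytes:
    """Simulate a long read followed by a shorter one into the same field."""
    field = bytearray(b" " * FIELD_LENGTH)
    store(field, b"LONG RECORD")  # first read returns 11 characters
    if reset_first:
        blank(field)              # initialize the field before the next read
    store(field, b"SHORT")        # second read returns only 5 characters
    return bytes(field).rstrip()
```

Without the reset, second_read(False) returns b"SHORTRECORD", because the tail of the first record is still in the field. With the reset, second_read(True) returns b"SHORT".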

Note 3
In this section, we parse the TEMP 256 alpha field to separate all the fields. Our example is written for a tab-delimited input file. The WORK ARRAY is an alpha work field, 128 characters in length with 30 occurrences. This routine will parse the fields out of TEMP 256 and return the individual fields in the WORK ARRAY field for processing by the MOVE TO OUTPUT routine.

Tab delimited fields are the easiest to work with and the fastest to process, as we can use the IN condition of the IF statement to look for separators. We don't have to know how long each field is; the Tab will tell us (so long as the field is within the maximum length for WORK ARRAY, we are OK). If you have to process a comma delimited file, that will be a lot slower, as you have to examine each character to see if it is a comma and, if so, whether it is in the middle of some quotes, in which case you ignore it because it's part of the field data.
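To show why the comma delimited case needs a character-by-character scan, here is a minimal Python sketch of that logic. It is an illustration only, not the APPX code. The function name split_quoted_csv is hypothetical, and the sketch assumes double quotes are used only to wrap a whole field (it does not handle doubled quotes inside a field).

```python
def split_quoted_csv(record: str) -> list[str]:
    """Split a comma delimited record, ignoring commas inside double quotes."""
    fields = []
    current = []          # characters of the field being built
    in_quotes = False
    for ch in record:
        if ch == '"':
            in_quotes = not in_quotes       # entering or leaving a quoted field
        elif ch == "," and not in_quotes:
            fields.append("".join(current))  # a real separator ends the field
            current = []
        else:
            current.append(ch)               # data, including quoted commas
    fields.append("".join(current))          # whatever is left is the last field
    return fields
```

For example, split_quoted_csv('1,"Smith, John",NY') returns ['1', 'Smith, John', 'NY']. Every character has to be inspected, which is the extra cost compared with searching for a Tab.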

The first thing we do here is initialize our WORK ARRAY field, in case some lines contain fewer fields than others. This is just good programming practice. Next, we strip the trailing binary data from TEMP 256. When APPX returns the data from the text file, it includes the line terminator (CR/LF, or just LF) followed by a null (Hex '00') at the end of each line. Normally, a file under Unix would end each line with LF/Hex 00, and under Windows with CR/LF/Hex 00. However, since we don't know what platform we are on, we check for all three characters and remove whatever we find. Even if we knew the platform, we still wouldn't know where the file came from: we could be processing a CR/LF terminated file under Unix, or an LF terminated file under DOS/Windows.
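The three checks can be sketched in Python as follows. This is an analogue of the idea rather than the APPX code, and the function name strip_line_terminators is our own. It tests for the null first, then LF, then CR, because that is the order in which they appear when reading backwards from the end of the line.

```python
def strip_line_terminators(record: str) -> str:
    """Remove a trailing null, LF and CR, in whatever combination is present."""
    for terminator in ("\x00", "\n", "\r"):
        if record.endswith(terminator):
            record = record[: -len(terminator)]
    return record
```

The same routine handles a Windows style line ("data\r\n\x00"), a Unix style line ("data\n\x00"), and a line with no terminator at all, so the platform and the origin of the file no longer matter.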

We wrote these tests as 3 complete sections of code. We realize that this could have been turned into another subroutine, but we did it this way for clarity.

The next part of the program just looks for the Tab character (Hex 09) and moves each field to a separate occurrence of WORK ARRAY. We use TEMP 512 as a scratch pad to temporarily hold the contents of TEMP 256 as we manipulate it. Note that we are 'consuming' TEMP 256 as we go (i.e. once we find the first field, we remove it from TEMP 256). When there are no more Tabs in TEMP 256, we have parsed all the fields, and whatever is left in TEMP 256 is the last field. This is simpler than trying to keep track of which fields have already been processed, and it allows us to use the fastest technique for finding field separators (the IN condition). Also notice how we always set TEMP 512 to blank before using it in a SET TEMP statement. This ensures that data from the previous field will not carry over to the next field.
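The 'consuming' technique translates readily to other languages. The Python sketch below follows the same steps under a few assumptions of our own: the function name parse_tab_delimited is hypothetical, fields longer than 128 characters are truncated, and any fields beyond the 30th are left unsplit in the last occurrence. The APPX code may handle those edge cases differently.

```python
TAB = "\t"
MAX_FIELDS = 30     # occurrences of WORK ARRAY in the article
FIELD_WIDTH = 128   # length of each WORK ARRAY occurrence


def parse_tab_delimited(record: str) -> list[str]:
    """Split a record on Tabs by repeatedly removing the first field."""
    # Initialize every occurrence so short lines leave blanks, not old data.
    work_array = [""] * MAX_FIELDS
    index = 0
    while TAB in record and index < MAX_FIELDS - 1:
        position = record.index(TAB)
        work_array[index] = record[:position][:FIELD_WIDTH]
        record = record[position + 1:]   # consume the field and its Tab
        index += 1
    # No Tabs left (or no room left): the remainder is the last field.
    work_array[index] = record[:FIELD_WIDTH]
    return work_array
```

For example, parse_tab_delimited("a\tb\t\tc") puts "a", "b", "" and "c" in the first four occurrences and leaves the remaining 26 blank. Because the record shrinks on every pass, there is no need to track which fields have already been processed.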

You can download the above code by clicking here for any Intel-based platform, or here for HP/AIX platforms.

To install the example code, define a new application SAS/00 in the APPX System Administration files, and then create the design files. Move these files to the $APPXPATH/00/SAS/Data directory and uncompress them (PKZIP for Intel, uncompress/tar for HP/AIX).


© Copyright 2009 - C.A.N.S.Y.S. West Limited All Rights Reserved