SUGI 26 Summary

Jack Hamilton
First Health
West Sacramento, California 95605
JackHamilton@FirstHealth.com

Table of Contents

Long Beach
Opening Session
Management Changes
Version 9
What's New in ODS
SAS/Intrnet
XML
The Futures Forum
Documentation
The SAS-L BOF
PROC REPORT
JDMS
OS/390
Things I Want
Hands-On Workshops
DataFlux
Upcoming SUGI's

Long Beach

The Convention Center worked well. Despite the large number of people, I never felt crowded.

The main hotel (the Hyatt) wasn't old enough to be historic, or new enough to be modern, but the elevators were fast (the restaurant, on the other hand, was very slow). There will be a new Marriott next time we're there.

There were lots of good restaurants near the Convention Center.

SAS Institute sponsored some good evening events at the Aquarium of the Pacific <http://www.aquariumofpacific.org/> and the Queen Mary <http://www.queenmary.com/>. The Aquarium was fascinating - I think I could stare at jellyfish for hours. The QM was also very interesting, a relic of bygone years.

Overall, SUGI was well run and in a good location.

Opening Session

This was the largest SUGI ever, with over 3,600 attendees.

There was a good laser show and a closing dance number with singing chorus boys and girls. The entertainment value was high, even though the words to the closing song, "The Power To Know", are not likely to win any prizes.

Unfortunately, in terms of information content, the opening session was not good. One veteran attendee told me that it was the worst opening in 15 years. Other comments included "disappointing", "insulting", and "worthless". I'm not sure what happened. One suggestion was that SAS Institute was trying to impress the handful of financial analysts in the audience, at the expense of the 2,000+ actual paying users who expected useful information. We'll know better next year.

Two SAS acquisitions, a data cleanser and a campaign manager, were mentioned, but there wasn't much hard information about them. I'll describe the data cleanser, DataFlux, in more detail later. The campaign manager (which is used to create target mailings for businesses, not to manage political campaigns) featured a truly unfortunate example showing how a company could use SAS Software to send spam to cell phones.

This was the umpity-umpth year of double-digit revenue increases for SAS Institute. I would not be surprised if this was also the umpity-umpth year of rate increases.

Management Changes

Barrett Joyner's gone. Dr. Goodnight stepped down as COO to be replaced by Andre Boisvert, who had previously worked at Wang, Oracle, and IBM. Then Mr. Boisvert resigned and Dr. Goodnight took over again. The new president of SAS for the Americas (wasn't this Barrett's old job?) resigned somewhere along the way. In other words, management confusion.

Nobody's talking (much), but I think they're trying to find a way to reward employees, to establish a value for the company, and to ease the transition from Dr. Goodnight. Think of the consequences of a sudden transition, such as a fire sale in which SAS Institute is acquired by Computer Associates. However much customers and employees might dislike the idea of changing an establishment which seems to be very well run, change will happen someday, and it's better for it to be planned in advance.

SAS doesn't give premium salaries, but is one of the best places in the United States to work. Employees are concerned that benefits might be cut as a cost-saving measure when the company goes public. Any action that results in increased employee turnover will probably result in decreased quality in SAS products and services.

Another possible consequence of going public (and the resulting pressure to cut expenses) is a smaller R&D budget. SAS Institute has always stayed near the leading edge of data management and statistical analysis by spending a high percentage of its budget on research and development.

Drs. Goodnight and Sall will keep a majority interest in the company when it goes public, so it's unlikely that anything drastic will happen. In any case, the IPO is still probably a year to a year and a half away.

The management changes don't seem to have affected services so far - tech support is still excellent, new products are being announced, and old products are (for the most part) being enhanced.

The marketing message seems a bit muddled. Does SAS Institute want to retain its traditional base as a proprietary system that's powerful and easy to use, or does it want to embrace Java as its new code base? Does it want to be a software company, or does it want to be a consulting company? It's hard to tell.

Version 9

They're already talking about version 9, also known as Project Mercury. The feature set isn't firm yet, but

The driving forces behind version 9 are the increasing volumes of data we're seeing and the availability of multiprocessing hardware.

Version 9 will be out in beta by the end of the year, and production by the next SUGI.

What's New in ODS

ODS MARKUP

ODS MARKUP provides a way to define "tagsets", basically any kind of markup you want to create. This will let you create your own HTML or XML variants. You can even create data in non-markup formats. One use I was thinking of is the creation of input files for Palm applications. Such files are typically highly structured, but are not "markup" in the way HTML and XML are markup.

The important thing to remember about ODS MARKUP is that it creates markup, not layout. It knows about content, but it doesn't know about things like character size.
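To make the tagset idea concrete, here is a rough Python sketch of the concept. It is not PROC TEMPLATE syntax, and the names `HTML`, `DELIMITED`, and `render` are made up for illustration. Each hypothetical tagset maps output events (start of table, each cell, end of row, and so on) to text, so the same content can come out as HTML or as a delimited file of the sort a Palm application might read. Neither tagset says anything about fonts or character sizes: it's markup, not layout.

```python
from typing import Dict, List

# A "tagset" here is just a dict from output events to text templates.
Tagset = Dict[str, str]

HTML: Tagset = {
    "table_start": "<table>\n",
    "row_start": "  <tr>",
    "cell": "<td>{value}</td>",  # values are not HTML-escaped in this sketch
    "cell_sep": "",
    "row_end": "</tr>\n",
    "table_end": "</table>\n",
}

# Not markup at all: one record per line, fields separated by "|".
DELIMITED: Tagset = {
    "table_start": "",
    "row_start": "",
    "cell": "{value}",
    "cell_sep": "|",
    "row_end": "\n",
    "table_end": "",
}


def render(rows: List[List[str]], tagset: Tagset) -> str:
    """Walk the content once, emitting the tagset's text for each event."""
    out = [tagset["table_start"]]
    for row in rows:
        cells = [tagset["cell"].format(value=v) for v in row]
        out.append(tagset["row_start"] + tagset["cell_sep"].join(cells) + tagset["row_end"])
    out.append(tagset["table_end"])
    return "".join(out)


rows = [["Kentucky", "12"], ["Ohio", "9"]]
print(render(rows, HTML))
print(render(rows, DELIMITED))
```

The content (`rows`) never changes; only the tagset does, which is the point of separating content from markup.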

MARKUP is there but experimental in 8.2 (it's part of PROC TEMPLATE). Documentation will be on the web "soon". I've seen the syntax - someone was reading Perl manuals instead of SAS manuals when they designed it, but it's usable (the introduction of new programming structures by SAS developers who apparently don't use the data step language is not, unfortunately, new to 8.2).

There's an even more experimental markup engine for graphics markup, so it will be possible to create your own VRML, VML, or whatever. This is further in the future. Browser implementation of vector graphics markup languages is neither widespread nor consistent at the moment, so I don't think we'll see these drivers formally offered by SAS in the immediate future. I didn't ask about this at SUGI, but creating Flash files is another future possibility now that the format is public.

ODS DOCUMENT

ODS DOCUMENT lets you store device-independent ODS output for later use - no need to recreate the output, just play it back. This is somewhat similar to creating graphics catalog entries with SAS/GRAPH and playing them back with PROC GREPLAY.

What makes this really useful is that you can do it under program control. For example, you could combine all the "Kentucky" reports into a single continuous document, even though the parts were created by various PROC's with BY statements.
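Here's a minimal Python model of the idea, assuming stored output is tagged with the procedure that produced it and the BY value it belongs to. The class and field names are hypothetical, and this is not how ODS DOCUMENT stores anything internally; it only illustrates replaying stored output selectively without rerunning the procedures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutputObject:
    """One stored piece of device-independent output (hypothetical model)."""
    proc: str      # procedure that produced it, e.g. "MEANS"
    by_value: str  # BY-group value it belongs to, e.g. "Kentucky"
    body: str      # the stored content itself


class DocumentStore:
    """Keeps output objects so they can be replayed later without rerunning anything."""

    def __init__(self) -> None:
        self._objects: List[OutputObject] = []

    def store(self, obj: OutputObject) -> None:
        self._objects.append(obj)

    def replay(self, by_value: str) -> List[OutputObject]:
        """Collect every stored object for one BY group, in the order it was created."""
        return [o for o in self._objects if o.by_value == by_value]


doc = DocumentStore()
# Two "procedures", each run with BY STATE, produce interleaved output.
for proc in ("MEANS", "FREQ"):
    for state in ("Kentucky", "Ohio"):
        doc.store(OutputObject(proc, state, f"{proc} output for {state}"))

# Later: one continuous Kentucky document built from both procedures.
for obj in doc.replay("Kentucky"):
    print(obj.body)
```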

DOCUMENT also removes the need to keep a copy of all your data in case you want to reproduce a report later, whether for an audit or just to send it to a different output device.

DOCUMENT is experimental in 8.2; documentation will be on the web "soon".

By the way, R&D documentation is pushed to the web on Friday evenings, so you don't need to check every day to see if something new has appeared. Just check on Monday morning.

New HTML/XML Formats

Various new HTML/XML variants are built in, including PHTML to eliminate all the junk attributes SAS thinks you want despite your best efforts to get rid of them.

Better PDF Handling

8.2 is able to handle graphics in PDF, at least in some cases. In the past, you could put a company logo on every page if you were using the HTML destination, but not if you were using the PDF destination. Now you can.

You can create PDF files without a table of contents column using the NoToC option.

PDF links still don't work right in 8.2, but if you use the PDFMARK option, create PostScript, and Distill it, you'll get links.

ODS LAYOUT

In version 9, there will be something called ODS LAYOUT. It will let you define page layout, a feature which many other report generators have had for years.

LAYOUT will give you the ability to do this:

    Some text here that goes all the way across the page in a single column.

    A table
        a   c
        b   d

    Some text in a smaller column

    Centered graph

We can do that now with HTML (and the appropriate browser), but not with RTF or PDF. LAYOUT will give us a way to specify much more complex report layouts.

SAS/Intrnet

htmSQL won't get much more than bug fixes and minor enhancements. There will probably be a maintenance release with some enhancements later this year. In the long run, SAS wants us to switch to Java Server Pages (JSP).

Although JSP sounds like we'll have to learn Java to write simple web pages, it's not quite that bad. JSP is more like ColdFusion. You can put simple Java code in your HTML source, and the server executes it. You can do loops, text substitution, all the standard programming language stuff. Some servers let you use JavaScript rather than Java, so you can use the same language on the server that you use in client-side scripts. You can find more information on SAS and JSP at <http://www.sas.com/usergroups/sugi/sugi26/presentations/p178.zip>. SAS Institute is already using JSP on its web site - not everywhere, but I've noticed a few .jsp pages flash by.

The main power in JSP is in the tag libraries provided by vendors. SAS plans to provide us with custom tags that we can use in JSP pages to grab SAS data. How well this will work depends, of course, on the tags they provide us. I don't think much movement will happen on this before next year.

If you're thinking about SAS/Intrnet, look at David Ward's Onyx product. It uses SAS as its scripting language, so it's for SAS users, not Java programmers. See <http://www.libname.com> for more information.

Many people would like a better way to manage share and broker sessions, especially hung sessions. A management console app is under consideration, but I didn't get the impression I should hold my breath waiting for it. 8.2 has a broker administration page reachable through the broker URL with no parameters.

XML

(Some of the technical information here comes from a talk by Paul Kent; you can see it at <http://www.sas.com/usergroups/sugi/sugi26/presentations/xmlatsas.zip>).

SAS is making a big push for XML. I don't like all of their buzzword initiatives (Java, CRM, SRM), but I do like this one.

XML is to programs what HTML is to humans. Just as an HTML file with content markup is much easier for a human to read (after it's been displayed by a browser), so is XML easier for a program to read (after it's gone through a parser).

XML also gets you out of the file layout business by specifying content with tags and attributes rather than by columns. Everything is plain, printable text.

XML looks a lot like HTML, but it follows tighter rules. Here's a possible XML snippet:


<customer>
   <name>John Smith</name>
   <order>
      <product shipped="y">Sprocket</product>
      <quantity>12</quantity>
   </order>
   <order>
      <product shipped="n">Geegaw</product>
      <color>Green</color>
      <quantity>9</quantity>
   </order>
</customer>

One of the big benefits of XML is a consequence of its pure text format and strict adherence to predefined rules: it can be processed by many tools in many languages on many platforms. It really does have the potential to become the standard mechanism for data exchange.

Two important things to notice are that all content is associated with a descriptive tag - the data are to some extent self-describing - and that the data can be arranged hierarchically rather than relationally. Traditional SAS datasets - and databases from vendors such as Oracle - have been strictly relational.
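As an illustration of moving from the hierarchy to relational rows, here's a short sketch using Python's standard xml.etree.ElementTree on a copy of the customer snippet (with each quantity element properly closed). The function name `flatten_orders` is my own, and this says nothing about how SAS will do it. Note that the customer name has to be repeated on every row once the data are flattened, and that the optional color comes back as None when its tag is absent.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """
<customer>
   <name>John Smith</name>
   <order>
      <product shipped="y">Sprocket</product>
      <quantity>12</quantity>
   </order>
   <order>
      <product shipped="n">Geegaw</product>
      <color>Green</color>
      <quantity>9</quantity>
   </order>
</customer>
"""


def flatten_orders(xml_text):
    """Turn the customer/order hierarchy into one flat row per order."""
    customer = ET.fromstring(xml_text.strip())
    name = customer.findtext("name")
    rows = []
    for order in customer.findall("order"):
        product = order.find("product")
        rows.append({
            "name": name,                       # repeated from the parent level
            "product": product.text,
            "shipped": product.get("shipped"),  # an attribute, not an element
            "color": order.findtext("color"),   # None when the tag is absent
            "quantity": int(order.findtext("quantity")),
        })
    return rows


for row in flatten_orders(XML_TEXT):
    print(row)
```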

How will SAS Use XML?

In the long run (possibly not 8.2, probably not everything in 9), SAS Institute hopes to make it possible to:

Benefits to SAS

So what, you ask, does SAS Institute get out of this?

XML Export

I've already mentioned ODS MARKUP, which lets you use PROC TEMPLATE to define your own markup.

Predefined destinations include the presentation-centric DocBook.

Someday, there will be a SASdoc XML format to describe SAS data. This is still being refined - it's not as straightforward as you might think.

XML Import

Currently, SAS can use the XML engine only to import documents that are formatted exactly as SAS wants them (you might, of course, be able to use XSLT outside of SAS to generate that exact format). There's a discussion of this, "Why Can't SAS Import My XML Document?", at <http://www.sas.com/rnd/base/topics/sxle82/>.


Coming soon, experimentally, is XMLMAP, which describes the content of the XML file to be imported. An XMLMAP file is itself written in XML.
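I haven't seen the XMLMAP syntax yet, so this Python sketch uses a map format I made up. It only shows the general idea: a map, itself written in XML, that tells an importer which element marks a row and where each column's value comes from. Every element and attribute name in the map (`table`, `rowpath`, `column`, `path`, `attr`) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical map format (NOT SAS's actual XMLMAP syntax).
MAP_XML = """<map>
  <table name="orders" rowpath="order">
    <column name="product" path="product"/>
    <column name="shipped" path="product" attr="shipped"/>
    <column name="quantity" path="quantity"/>
  </table>
</map>"""

DATA_XML = """<customer>
  <order><product shipped="y">Sprocket</product><quantity>12</quantity></order>
  <order><product shipped="n">Geegaw</product><quantity>9</quantity></order>
</customer>"""


def import_with_map(map_xml, data_xml):
    """Return (table_name, rows) by applying the map to the data document."""
    table = ET.fromstring(map_xml).find("table")
    data = ET.fromstring(data_xml)
    rows = []
    for row_elem in data.iter(table.get("rowpath")):  # each <order> is one row
        row = {}
        for col in table.findall("column"):
            target = row_elem.find(col.get("path"))
            if target is None:
                row[col.get("name")] = None            # missing element -> missing value
            elif col.get("attr"):
                row[col.get("name")] = target.get(col.get("attr"))
            else:
                row[col.get("name")] = target.text
        rows.append(row)
    return table.get("name"), rows


table_name, table_rows = import_with_map(MAP_XML, DATA_XML)
print(table_name, table_rows)
```

The data file itself doesn't have to change; only the map describes how to read it.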

Currently, there's no event-driven model. It's under consideration.
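I take "event-driven" here to mean SAX-style parsing: the parser calls your code as it reads each start tag, chunk of text, and end tag, instead of building the whole document in memory first, which matters for very large files. Here's a small sketch with Python's standard xml.sax module. It says nothing about what SAS might eventually offer, and the handler class is my own.

```python
import xml.sax


class QuantityTotaler(xml.sax.ContentHandler):
    """Reacts to parser events as they happen; no document tree is built."""

    def __init__(self):
        super().__init__()
        self.in_quantity = False
        self.buffer = []
        self.total = 0
        self.orders = 0

    def startElement(self, name, attrs):
        if name == "order":
            self.orders += 1
        elif name == "quantity":
            self.in_quantity = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self.in_quantity:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "quantity":
            self.total += int("".join(self.buffer))
            self.in_quantity = False


DATA = """<customer>
  <order><product>Sprocket</product><quantity>12</quantity></order>
  <order><product>Geegaw</product><quantity>9</quantity></order>
</customer>"""

handler = QuantityTotaler()
xml.sax.parseString(DATA.encode("utf-8"), handler)
print(handler.orders, handler.total)
```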

The Futures Forum

The Futures Forum is our formal opportunity to talk to the heads of the SAS technical development teams and find out their plans.

In past years, the forum was led by Barrett Joyner and lively discussions ensued, but with Barrett gone a different format was used this year - instead of presentations by the developers explaining what they'd done and what they plan to do, followed by a question and answer session, we had only the Q&A.

I don't think this new format worked as well. Although we in the audience had enough time to ask all the questions we wanted, we didn't get an overview of where SAS software is headed, and I missed that. The "state of the union" format might return next year.

I've incorporated some of what I heard at the Futures Forum into other parts of this document, and some information from other sessions is included here. Not everything that was mentioned at the forum is mentioned here, of course.

Documentation

There was a lot of discussion of documentation at the Futures Forum. Many people didn't seem to be happy about some aspect or other of the new documentation - information is out of date, misleading, hard to find, or simply not there.

The conversion of existing documentation from V6 format to V8 format was such a massive effort that they haven't had time to review everything for current relevance. They're working on it.

If you have suggestions for improving the documentation, please send them in.

One problem with the online documentation for 8.x is that it hasn't been reliably updated with information from minor versions - so stuff that's new in 8.2 is documented only in the Changes and Enhancements manual or in system help. This was one of the problems that moving to the new documentation system was supposed to fix. It will be made right in version 9.

At the Futures Forum, someone asked why the search mechanism in documentation defaults to an OR condition rather than an AND condition. The reply was that they thought people would want OR. A show of hands revealed that the audience was about 99% in favor of AND and 1% in favor of OR. So that might get changed.

Version 9 will have links between Help and OnlineDoc, and it will be possible to load only parts of the documentation.

On Unix and OpenVMS systems, when you use Help, it's opened in a browser window, but before the document finally appears there's a long delay while the browser opens, loads a Java app, opens a new browser window, closes the first browser window, and adjusts the window to be harder to read (that last part isn't the intent, but if you don't have a huge monitor it's the effect). It can take over a minute for the help screen to appear. They'll consider streamlining this process in a future release.

The SAS-L BOF

I was one of the instigators of the SAS-L BOF, which was once more held as a RAL. For the acronym-impaired, SAS doesn't stand for anything, and SAS-L is an electronic mail list for SAS discussions. A BOF is a birds-of-a-feather session for people with similar interests, and an RAL is a random access lunch. BOF's are usually held in the evening, but we decided last year to go for a lunch meeting instead. It seemed to me that attendance was down from last year.

Our guest speaker from SAS Institute, Dave Brumett, was unable to attend due to illness, and it took four other SAS employees to take his place. We found out that

PROC REPORT

_col_ is misnamed. It actually applies to a cell, not a column. This makes it useful for dynamically changing the contents or formatting of a cell.

Ability to sort and group on computed variables? Maybe someday, but it requires two passes through the data, which they don't want to do (better them than me, I say).

The ability to create multiple panels when the ODS destination is HTML probably won't come soon.

JDMS

Another new feature, JDMS, is part of the base product! It gives you a display manager in your browser. It's called JDMS because it's a Display Manager System written in Java.

It was created in response to users who don't want to run X over dial-up lines (which is like waiting for paint to dry), or who just don't want to use X displays for whatever reason.

It shows only the program, log, and list windows - no AF or graphics. It also lets you view tables (with an optional WHERE clause).

JDMS is supposedly included in the installation disks (I haven't seen them yet, so I can't give details). It requires a web browser on the host machine, or a web browser and SAS/Intrnet on a different server machine.

OS/390

CEDA (Cross Environment Data Access) allows a SAS program running on OS/390 to read and write SAS data sets on other platforms, such as Windows. This may require other OS products such as NFS.

Universal Print allows an OS/390 program to create high quality output using ODS to write to a LAN printer. This may require other OS products to provide LAN connectivity.

There's a new facility called PASSTICKET which lets you establish a TCP/IP connection without hard-coding a password in your SAS job.

There's support for a piping product which allows you to pass data directly from one batch job to another, allowing some overlap in processing time.

There's a new interface to the DB2 batch load facility which is much faster than the previous one.

In the long run, OS/390 is going away. The future seems to be in z/OS.

For more details, see <http://www.sas.com/usergroups/sugi/sugi26/presentations/os390.zip>.

Things I Want

SUGI is my opportunity to talk to developers and tell them what I want. I didn't have much to add this year - they've given me some of what I want (JDMS, various ODS improvements), and it doesn't seem likely that I'll ever get the other things I want (stored functions and formats written in the data step language, a good set of regular expression functions). But I managed to want a few things. If you want them as well, tell your account rep, or mention it to Tech Support next time you're on the phone, or send a note to suggest@sas.com. Those items not discussed elsewhere are:

A ZIP filename engine

A very common question on SAS-L is "Can I read a zipped file from SAS?" There's no built-in way, but it's been discussed. Reading wouldn't be too difficult, but there are details to be worked out, such as how to get a contents listing. One possibility would be through a special keyword that causes the fileref to return a list of member names rather than the contents of a member; another possibility would be to extend the data set information functions to cover ZIP directories.
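The two operations in question, listing the members of an archive and reading one member's contents, look like this in Python's standard zipfile module. This only illustrates the distinction; it isn't a proposal for SAS syntax, and the helper names and sample archive are made up.

```python
import io
import zipfile
from typing import List


def list_members(archive: bytes) -> List[str]:
    """The 'directory' view: names of the members, not their contents."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.namelist()


def read_member_lines(archive: bytes, member: str) -> List[str]:
    """The 'file' view: the text lines of one member."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        with zf.open(member) as f:
            return [line.decode("utf-8").rstrip("\r\n") for line in f]


# Build a small archive in memory so the example needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("claims.txt", "A001 12\nA002 9\n")
    zf.writestr("notes.txt", "hello\n")
archive = buf.getvalue()

print(list_members(archive))
print(read_member_lines(archive, "claims.txt"))
```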

Dynamically generated WHERE

This would give you the ability to change the active where clause based on your data. The problem that brought this up is a database join that's too large to be created for all keys.

If you think this would be useful, let SAS know - suggest@sas.com.

There's a workaround using data step functions, but it's messy. If there's interest, I might do a paper on it next year.
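This isn't the data step workaround mentioned above. As a rough illustration of the general idea in another setting, this Python sketch uses the standard sqlite3 module to generate a new WHERE clause from each batch of key values, so the full join never has to be built at once. The table, column, and function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def fetch_in_batches(conn: sqlite3.Connection, keys: List[str],
                     batch_size: int) -> List[Tuple[str, int]]:
    """Fetch rows for the given keys a batch at a time, generating a fresh
    WHERE clause from each batch of key values."""
    results: List[Tuple[str, int]] = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        placeholders = ", ".join("?" for _ in batch)  # one "?" per key value
        sql = ("SELECT claim_id, amount FROM claims "
               f"WHERE claim_id IN ({placeholders}) ORDER BY claim_id")
        results.extend(conn.execute(sql, batch).fetchall())
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id TEXT, amount INTEGER)")
conn.executemany("INSERT INTO claims VALUES (?, ?)",
                 [("A1", 10), ("A2", 20), ("B1", 30), ("C7", 40)])

# Only the keys we actually need, two per generated query.
print(fetch_in_batches(conn, ["A1", "B1", "C7"], batch_size=2))
```

The values go in as bound parameters rather than being pasted into the SQL text, so odd characters in the keys can't break the generated clause.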

Easier SQL Syntax for joins on multiple keys

I'd like to be able to code:


select  a.*
from    outer a
where   (a.x, a.y) in (select x, y from inner);

The same effect can be achieved in other ways, but I think this syntax is much more straightforward.
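One of those other ways is a correlated EXISTS subquery. The sketch below uses Python's sqlite3 module, not PROC SQL, to show that it returns the same rows as the row-value form; SQLite accepts the row-value IN syntax from version 3.15 on, so the script checks the version before trying it. I renamed the tables outer_tbl and inner_tbl because OUTER and INNER are SQL keywords, and the data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_tbl (x INTEGER, y INTEGER, label TEXT);
    CREATE TABLE inner_tbl (x INTEGER, y INTEGER);
    INSERT INTO outer_tbl VALUES (1, 1, 'keep'), (1, 2, 'drop'), (2, 1, 'keep');
    INSERT INTO inner_tbl VALUES (1, 1), (2, 1), (3, 3);
""")

# Portable rewrite: a correlated EXISTS that compares both keys.
EXISTS_SQL = """
    SELECT a.* FROM outer_tbl a
    WHERE EXISTS (SELECT 1 FROM inner_tbl b WHERE b.x = a.x AND b.y = a.y)
    ORDER BY a.x, a.y
"""

# The wished-for row-value form.
ROW_VALUE_SQL = """
    SELECT a.* FROM outer_tbl a
    WHERE (a.x, a.y) IN (SELECT x, y FROM inner_tbl)
    ORDER BY a.x, a.y
"""

exists_rows = conn.execute(EXISTS_SQL).fetchall()
print(exists_rows)
if sqlite3.sqlite_version_info >= (3, 15, 0):
    # Both formulations should select exactly the same rows.
    assert conn.execute(ROW_VALUE_SQL).fetchall() == exists_rows
```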

A Demo Version of SAS

SAS Institute is doing a poor job of helping newcomers learn SAS, or of encouraging current users to explore new products.

Compared to its statistics competition (SPSS, S-Plus), its database competition (Oracle, DB2), its web competition (ColdFusion, PHP), or even its programming language competition (Java, Visual Basic), SAS Institute is stuck in the 60's. All of its competitors offer a free or low-cost way to learn the language and try out its features.

That's just not possible with SAS. SAS-L sees a question every few weeks from someone who wants to learn SAS. What can we tell them? "Enroll at a college where you can take a class, or find an employer who will let you use their license, or spend thousands of dollars for your own copy!" If someone wanted to learn Oracle or Java, we could say "Go download a mildly restricted version, free of charge", or in the case of Visual Basic, "Go to the store and buy a copy of the learning edition for around $100". Which answer is more likely to encourage new users of a language, that of SAS Institute or that of its competition?

Another problem with SAS's current distribution method is the difficulty of trying new products. Suppose I think some new product is the bee's knees and would save lots of work for me and other people in my company. Can I get a copy and try it out? Not easily. Even assuming that the company I work for allows the installation of demo software (not all companies do), and even assuming the full and speedy cooperation of the IS and purchasing departments (not bloody likely in my experience), by the time a new product is installed the evaluation period will be almost over, or some other project will have come up that doesn't let me get back to the new SAS product immediately.

What people in this situation need is a way to install a full version of SAS on their own machines or laptops to play with at their leisure.

Ian Whitlock suggested a demo version of SAS - full featured, all products, so that people who want to learn SAS can do so. It would be limited in some significant way - maximum of 1,000 obs, perhaps - but could otherwise take advantage of every feature in SAS. This would allow new users to learn SAS, and experienced users to try new features.

Tell your sales rep if you want this. I doubt if they'd lose any sales in the short run, and there's a potential for increasing the demand for SAS over the long run.

Hands-On Workshops

I didn't make it to as many sessions as usual this year. By the time I heard that there would be no printed copies of the proceedings, I had already volunteered to work as a session coordinator at two Hands-On Workshops sessions. Between those, doing three presentations, and hanging out in the Demo Room, I didn't have much time left for actual sessions.

The sessions I attended were:

Regional meetings such as WUSS also have Hands-On Workshops, and I highly recommend that you take one, especially if you want to learn about a product that you've never had the chance to use.

Handouts are available until September 1, 2001 at <http://www.sas.com/usergroups/sugi/sugi26/hands-on/>.

DataFlux

DataFlux is one of SAS Institute's recent purchases. It's a data cleaning tool, and its official name is SAS® Data Quality - Cleanse (so you can see why I call it DataFlux). It's actually available in two forms, as a standalone Windows application (dfPower) and as a set of SAS procs and functions. The SAS product is sold both separately and as part of a SAS/Warehouse Administrator package. It's available for Windows, OS/390, and Unix platforms, but not for OpenVMS.

I don't have all the details on what it can do, but two of the major features are field standardization and record matching.

Field standardization looks at individual fields and converts them to a standard form. For example, if "001.0" is a valid ICD-9 code, and an input record contains "001" or "001.00", the field would be changed to the correct value.
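A toy version of that standardization step in Python, assuming a reference list of valid codes (the three-code list here is made up and far from complete) and treating two codes as the same when they differ only in zero padding. It also allows a leading V or E, as in ICD-9 V and E codes. I don't know how DataFlux actually implements this; it's only an illustration.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple

# Hypothetical reference list; a real one would have thousands of codes.
VALID_CODES = ["001.0", "001.1", "V01.0"]


def _key(code: str) -> Optional[Tuple[str, Decimal]]:
    """Comparison key that ignores zero padding: '001', '001.00', and '1.0'
    all give ('', Decimal('1')) up to numeric equality."""
    code = code.strip().upper()
    prefix = code[:1] if code[:1] in ("V", "E") else ""
    try:
        value = Decimal(code[len(prefix):])
    except InvalidOperation:
        return None  # not a number at all
    return (prefix, value) if value.is_finite() else None


# Equal Decimals hash equally, so padded variants find the same entry.
_CANONICAL = {_key(c): c for c in VALID_CODES}


def standardize(code: str) -> Optional[str]:
    """Return the canonical form of `code`, or None if no valid code matches."""
    return _CANONICAL.get(_key(code))


for raw in ["001", "001.00", "1.10", "v01", "999.9"]:
    print(raw, "->", standardize(raw))
```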

Record matching looks at input records and decides which ones belong together even though the match fields aren't exactly the same. For example, it might decide that "Ron Richards, 123 Main Street", "Ronald Richards, 123 Main St.", and "W. Ronald Richards, 123 Main" all refer to the same person, and it would assign a match code which could be used in later processing.
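Here's a deliberately simple, rule-based Python sketch of how a match code could be built for the example above: upper-case everything, drop punctuation and single-letter initials, expand a few nicknames, and drop street-type words. The nickname and suffix tables are tiny made-up samples, and real record matching (DataFlux's or anyone else's) is far more elaborate and usually fuzzy rather than exact.

```python
import re

# Hypothetical, tiny lookup tables for illustration only.
NICKNAMES = {"RON": "RONALD", "BOB": "ROBERT", "BILL": "WILLIAM"}
STREET_SUFFIXES = {"STREET", "ST", "AVENUE", "AVE", "ROAD", "RD"}


def _tokens(text):
    """Upper-case, turn punctuation into spaces, split on whitespace."""
    return re.sub(r"[^\w\s]", " ", text.upper()).split()


def match_code(name, address):
    """Build a crude match code: surname|first name|house number and street."""
    name_tokens = [t for t in _tokens(name) if len(t) > 1]  # drop initials like "W."
    first = NICKNAMES.get(name_tokens[0], name_tokens[0])
    last = name_tokens[-1]
    addr_tokens = [t for t in _tokens(address) if t not in STREET_SUFFIXES]
    return "|".join([last, first, " ".join(addr_tokens)])


records = [
    ("Ron Richards", "123 Main Street"),
    ("Ronald Richards", "123 Main St."),
    ("W. Ronald Richards", "123 Main"),
]
codes = {match_code(n, a) for n, a in records}
print(codes)
```

All three records collapse to one match code, which later processing could use as a grouping key.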

In my company, two possible uses for record matching would be provider matching and interim billing. Provider matching is needed when a potential client wants to know which doctors currently used by its employees are in our network (employees whose doctors aren't in the network may have to change physicians, which people are often, naturally, reluctant to do). Interim billing is needed when we get a number of hospital claims records and need to decide which ones constitute a single hospital admission. DataFlux can't do the actual interim billing, but it could greatly simplify the process.
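The grouping half of the interim billing problem, deciding which claims belong to one admission, might look something like this Python sketch. The rule used here (same patient, and each claim starting no more than one day after the latest through-date so far in the group) is an assumption for illustration only; real interim billing rules are more involved, and this isn't something DataFlux does.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Claim = Tuple[str, date, date]  # (patient_id, from_date, thru_date)


def group_admissions(claims: List[Claim], gap_days: int = 1) -> List[List[Claim]]:
    """Group claims into admissions: same patient, and each claim starts no more
    than `gap_days` after the latest thru-date seen so far in the group."""
    admissions: List[List[Claim]] = []
    current: List[Claim] = []
    current_end: Optional[date] = None
    for claim in sorted(claims):  # by patient, then from_date
        patient, start, end = claim
        if current and patient == current[0][0] and start <= current_end + timedelta(days=gap_days):
            current.append(claim)
            current_end = max(current_end, end)
        else:
            if current:
                admissions.append(current)
            current = [claim]
            current_end = end
    if current:
        admissions.append(current)
    return admissions


claims = [
    ("P1", date(2001, 3, 1), date(2001, 3, 5)),
    ("P1", date(2001, 3, 6), date(2001, 3, 9)),    # continues the March stay
    ("P1", date(2001, 4, 20), date(2001, 4, 22)),  # a separate admission
    ("P2", date(2001, 3, 2), date(2001, 3, 3)),
]
admissions = group_admissions(claims)
for adm in admissions:
    print(adm[0][0], adm[0][1], "to", max(c[2] for c in adm), f"({len(adm)} claims)")
```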

Unfortunately, SAS® Data Quality - Cleanse is unattractively priced. It would cost us less to hire a full-time programmer for a year to work on nothing but match code than to license DataFlux - and at the end of the year we wouldn't have to relicense the code. (Another possibility would be to obtain Charles Patridge's fuzzy matching code, which is substantially less expensive.)

Upcoming SUGI's

The next SUGI will be April 14-17, 2002 in Orlando at Disney World.

SUGI 28 will be in Seattle.

SUGI 29 will be the first SUGI held outside the US, in Montreal.