Research interests



I am providing this information for potential students who wish to work with me and/or for the merely curious. This is written with a historical bent to it, and I realize engineers hate history (you were warned!).

History

I have been an active computer architecture researcher since ca. 1986.  I began with strong interests in supercomputing and worked some on the problem of modeling the latencies through the multistage interconnection network of the then-new Cedar clustered supercomputer built at the University of Illinois at Urbana-Champaign.

Engineering is most interesting to me when you must satisfy very tight design constraints. David Kuck defined supercomputers as "the most expensive computers in any given generation." So clearly, they are not cost constrained. The constraints on supercomputers tend towards the limits of technology, especially the speed of the underlying devices and the methods used to keep those devices within their operating temperature range. I'm peripherally interested in device technology, but do not seek to research that area.

In 1988 I concluded that more cost-constrained systems were more interesting to research. At that time, the "middle market" for computing, midway between the cheapest computing systems (PCs) and the most expensive (supercomputers), was engineering workstations. The research I did at Illinois as a graduate student dealt with the problem of architecting these systems under tight constraints.

Here is the introductory statement from my Ph.D. thesis (1992), which both illustrates my worldview back then and also holds up pretty well today:

It is difficult to design successful workstation architectures since these systems are general-purpose and used for a large number of very diverse tasks. Performance of a workstation can be decomposed into that of its components, such as the network,  graphics hardware, I/O, processor and memory subsystems. Successful design of these components requires careful consideration of the workload of workstation users. This workload is large and diverse.  Current methods for designing these components are iterative processes that are not well-suited to large, diverse workloads. This thesis addresses this problem by developing a systematic method to synthesize prototype architectures of workstation components from large workloads.  The thesis focuses on the processor and the memory components, although the overall approach can be applied to the evaluation of other workstation components as well.

[...] It is clear that there are considerable problems with the current architectural design process.  What is needed is a systematic approach that can take workload elements such as industry-standard benchmarks or end-user supplied applications and derive architecture prototypes from them.  The advantages of such a computer architecture prototyping technique would be:

  1. The architectural design process would begin with a prototype that is already influenced by diverse workloads,
  2. Since the architectural design process is iterative, improving the quality of the initial guess also improves the rate of convergence of the overall design process, reducing the number of iterations and enlarging the usable workload size.
In other words, design is iterative. You pick a starting point and refine it, and you only have a certain amount of time to come up with a design. So my goal was to find a better way to design workstations, and that "better way" hinged on the idea of getting to a better starting point, which I called a "prototype architecture."

I still believe this is the right way to design complex systems. Much of my early career focused on ways to come up with prototypes efficiently. I continued this work with my student Kishore Menezes, and he and I published what is regarded as the definitive method of sampling processor simulations to speed them up:

T. M. Conte, M. A. Hirsch, and K. N. Menezes, "Reducing state loss for effective trace sampling of superscalar processors," Proceedings of the 1996 International Conference on Computer Design, (Austin, TX), Oct. 1996.

But in the interim, it became clear to me that treating the workload as immutable was a flawed view: the compiler had an equal role to play. My students and I started to define the compiler's role and quickly discovered the VLIW view of the world. Compiler-driven microarchitecture seemed to us to be much more open to design tradeoffs. I was a researcher, so "dusty deck" code compatibility issues did not interest me. Superscalar approaches seemed stalled (this was the early '90s) at dual-issue or perhaps four-issue machines.

We began architecting a framework to reason about compiler-driven microarchitecture. In 1994, I was getting married. My best man was also my former office mate and co-advisee from Illinois (Sadun Anik). He brought with him to the wedding a technical report from his research group at HP Labs on a framework for investigating compiler-driven microarchitecture called "PlayDoh". We scrapped our fledgling work on our own framework and adopted the PlayDoh semantics.

PlayDoh was interesting, but it was far from implementable. My students and I decided that there was interesting work to do on how to make a practical VLIW that would be a commercial success. We devised an encoding (Sergei Larin did this for his MS thesis). We began looking at the problem of cross-generation code compatibility in VLIW, then an oft-cited Achilles' heel of the technology: binaries compiled for one generation of a VLIW ISA were not guaranteed to run correctly on a later generation of that ISA. My Ph.D. student Sumedh Sathaye and I devised a way to combine the encoding of the ISA, the compiler, the page fault handler in the OS, and (ultimately with input from Kishore Menezes and Sanjeev Banerjia) the instruction cache design to provide cross-generation compatibility:

T. M. Conte and S. W. Sathaye, "Dynamic rescheduling: A technique for object code compatibility in VLIW architectures," Proceedings of the 28th Annual International Symposium on Microarchitecture, (Ann Arbor, MI), Nov. 1995.

It was affirming to see that companies such as Transmeta essentially adopted this same approach. Sumedh Sathaye realized the potential of the technology was greater than just VLIW compatibility, and so he set out to design what we called an "evolutionary compiler," and what most people today refer to as a dynamic optimizer. After he received his Ph.D., Sumedh went on to work on dynamic optimization as part of a very influential research group at the IBM T. J. Watson Research Center that designed the now-classic dynamic optimization systems DAISY and TULIP.

Kishore Menezes, as a Ph.D. student, and Burzin Patel, as an M.S. student, worked on a related problem: profile-driven optimization, so very important for VLIW compilation, was impractical in a commercial setting. Kishore published several papers on what we then referred to as "hardware-based profiling." He had an influence on what ultimately went into the Itanium-I performance counters, which are now well suited to profiling code with essentially zero slowdown.

Since that time in the mid-to-late '90s, my students and I have stayed on the path of solving the supposedly unsolvable barriers to compiler-driven technology by employing every layer of a computer system, from the hardware up through the OS.

At the same time, workstations were no longer the middle market. One could easily argue that PCs were no longer the middle market, either. That unique feature set of a heavily constrained engineering problem migrated down to the smallest devices, the so-called embedded systems. These systems had (indeed, still have) crushingly tight power, cost, form-factor, and performance constraints, among others. Accordingly, we researched architectural problems in embedded system design, but the flavor of the research was the same as when we began a decade earlier.

Fast forward to today

A funny thing happened on the way to the Forum: Moore's Law hit a hiccup. Note that there are really two Moore's laws. The first is what Gordon Moore was trying to express, namely that the number of transistors that can feasibly be fabricated on a monolithic device tends to double approximately every 18 months (this doubling period itself has varied over time, from every 24 months to every 12 months). The "other" Moore's law is that the performance of single-threaded applications (i.e., uniprocessor performance) doubles at the same rate, every 18 months. The latter is more of a self-fulfilling prophecy: if you think your competitors will double performance over the next 18 months, you work as hard as you can to keep up!
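To make these doubling periods concrete, here is a small back-of-the-envelope calculation (an illustration added here, not taken from any of the work described above) of the cumulative growth over a decade for the 12-, 18-, and 24-month doubling periods. The function name growth_factor is hypothetical; it simply evaluates 2 raised to (elapsed months / doubling period).

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Cumulative growth after `years` if a quantity doubles every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)


if __name__ == "__main__":
    # Growth over one decade for each of the doubling periods mentioned in the text.
    for months in (12, 18, 24):
        print(f"doubling every {months} months -> about {growth_factor(10, months):.1f}x in 10 years")
```

The spread is large: over ten years, an 18-month doubling period compounds to roughly a hundredfold increase, while 12 and 24 months give about 1024x and 32x respectively.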

So why did this second law end? It ended because we hit a power density limit: we cannot build ever-larger microarchitectures without exceeding the power envelopes of modern forced-air-cooled packages. This has placed a new and, in my view, wonderful constraint on architecture design! The industry's answer has been a sidestep. Instead of relying on single-thread performance scaling, it has gambled on multi-threaded performance and built monolithic multiprocessors on a chip. Unfortunately, this is only a partial answer. Parallel programs are harder to write, and harder to deal with architecturally, than orthodox single-threaded programs.

Two flavors of multiprocessors on a chip have emerged: multicores and manycores. It is something like the battle royal between superscalar and VLIW in the previous generation of computer architecture. Manycore is clearly the underdog, and so that is where we are focusing our energies. My students and I are working on all aspects of making thousands of processors on a chip (so-called "kilocore-scale manycores") programmable, coherent, and genuinely useful.

What's interesting to me is that the history I have with supercomputing in the 1980s is now paying dividends. "Everything old is new again," I suppose. A lesson for any student here is to never dismiss any area of computer architecture as "boring" or "old hat." You'll be surprised by what turns out to be "hot" and interesting during the long arc of your career!

It is also important to note that the original goal of systematic computer architecture prototyping using fast simulation techniques hasn't been lost. It is all the more important in manycore design, where the new challenges are modeling and simulating vast numbers of parallel threads.