Matlab as a Scripting Language

Dr. Dobb's Journal January 1999

A simple way to do powerful things

By Peter Webb and Gregory V. Wilson

Peter is Language Technologies Manager at The MathWorks and can be contacted at pwebb@mathworks.com. Greg is the author of Practical Parallel Programming (MIT Press, 1995) and coeditor with Paul Lu of Parallel Programming Using C++ (MIT Press, 1996). Greg can be reached at gvwilson@interlog.com.

The numerical language Matlab has evolved to serve the same role in science and engineering that scripting languages such as Visual Basic and Perl serve in nonnumerical applications. While their concerns may differ (for example, Perl's are primarily streams, processes, and regular expressions, whereas Matlab's are matrices, scientific graphics, and numerical linear algebra), their evolution has been similar. Like Perl and its kin, Matlab has grown both from the bottom up (as its developers have incorporated cliches that they have seen programmers using repeatedly) and from the top down (as those developers have adopted techniques from other programming languages). At the same time, Matlab has, like Perl and Visual Basic, been constrained by the need for backward compatibility.

Resolving the tension between what we know now about what we should have done then, and what we can do based on what we did, is crucial to the development of programming languages. Matlab is of interest to nonnumerical programmers for two reasons.

The members of the group that designed Matlab's object-oriented features were chosen from the most experienced Matlab developers. There were no computer scientists on the language design team; all were engineers and physical scientists. This resulted in a language that, while it might look strange to computer scientists, fits the needs of physical scientists and engineers very well. Few other domains make their needs, or their desired solutions, so clear.

Matlab History

Perl has Larry Wall; Matlab has Cleve Moler. From 1976 to 1979, Moler was involved in the development of LINPACK, a collection of optimized Fortran routines for solving numerical linear algebra problems. While powerful, LINPACK on its own was no more interactive than the standard C library's string-handling functions.

Moler introduced the first version of Matlab (short for "MATrix LABoratory") to a group of Stanford University students in 1979. In its original form, it allowed scientists to chain together calls to optimized Fortran routines using an interactive text-based interpreter. This avoided the overhead of compiling or linking, just as shell scripts allow programmers to launch various UNIX tools without writing a C wrapper, and Tcl allows them to combine X Windows widgets without writing event-handling code. Moler later cofounded The MathWorks to further develop and support Matlab. Today, Moler is chief scientist and chairman at The MathWorks.

The Matlab language is currently delivered as part of an integrated computing environment that combines numeric computation, graphics and visualization, and the programming language. The current release of the software is Matlab 5.2. In addition to its standard version, The MathWorks also provides a student edition. Older, unsupported versions of Matlab are freely available in source form on the Internet; see ftp://www.KachinaTech.COM/pub/UniStation/Linux/packages/tgz/, for instance. These older versions provide an interactive command-line interface and some linear algebra routines. Additionally, a number of Matlab-like languages have appeared over the years.

Like the first versions of Awk and Perl, Matlab was principally a collection of needful things -- the interpreter had a single global namespace, and paid little attention to the principle of orthogonality. Simple graph-plotting commands coexisted with sophisticated matrix inversion routines, for the simple reason that both were useful.

The first version of Matlab had only one data type -- a two-dimensional matrix of double-precision complex numbers, and a family of functions and operators to create and calculate with matrices. The language was vectorized, so that single statements such as a=b+c implicitly looped over the elements of the matrices b and c and formed their element-wise sum in a. A scalar (such as "5") was treated as a 1×1 array; vectors were either N×1 or 1×N arrays, depending on how they were to be used. The distinction between N×1 and 1×N is one sign of how differently Matlab's user community looked at computing. In programming languages such as Scheme, C++, and Perl, arrays are just contiguous sequences of values, and it is up to you to impose further structure on this representation. For example, while it is easy to select a single row from a 2D array in C, selecting a single column (whose elements are not contiguous in memory) is much more difficult. This may only be an occasional annoyance for computer scientists, but it is a major deficiency in these languages for engineers and mathematicians.
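As a small illustration of the points above, the following sketch shows vectorized addition, the scalar-as-1×1 convention, and row and column selection. The variable names are arbitrary.

```matlab
% Element-wise addition without an explicit loop.
b = [1 2 3; 4 5 6];
c = [10 20 30; 40 50 60];
a = b + c;              % a is [11 22 33; 44 55 66]

% A scalar is treated as a 1-by-1 array.
sz = size(5);           % sz is [1 1]

% Row vectors and column vectors have different shapes.
r = [1 2 3];            % 1-by-3
col = [1; 2; 3];        % 3-by-1

% Selecting a column is as easy as selecting a row.
row2 = b(2, :);         % [4 5 6]
col2 = b(:, 2);         % [2; 5]
```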

Domain-specific toolboxes were added to Matlab during the 1980s, but the basic structure of the language did not change. In the early 1990s, trends in computer science finally began to influence Matlab language designers. Users had been clamoring for true multidimensional arrays for some time, and, having decided to bite the language-extension bullet, Matlab's designers decided to make other improvements as well, including cell arrays, structures, and objects. Each of these features affected the basic structure of the Matlab language.

The largest change, though not the most significant, was the multidimensional array. Each function in Matlab had to be examined to determine how, if at all, it could handle multidimensional inputs. For example, how is matrix multiplication defined on multidimensional arrays? Simply put, it isn't -- matrix multiplication is inherently two-dimensional. But subscripting (a(1,2), for example) is logically extended to allow more than two subscripts, and functions like sin(), sqrt(), and sort(), all of which operated elementwise on 2D matrices, were extended to operate elementwise on N-dimensional arrays as well.
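A brief sketch of what this looks like in practice, using a three-dimensional array (the sizes and values here are arbitrary):

```matlab
A = zeros(2, 3, 4);     % a 2-by-3-by-4 array of zeros
A(1, 2, 3) = pi/2;      % subscripting with three subscripts
B = sin(A);             % sin is applied to every element
n = ndims(B);           % 3
v = B(1, 2, 3);         % sin(pi/2), that is, 1
```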

Cell arrays are for constructing heterogeneous collections of data. Like numerical arrays, they are multidimensional, but cell arrays are not restricted to containing only numerical data. Each cell can contain a reference to any other type of Matlab data, including another cell array. The primary operations on cell arrays are inserting a new element and extracting an existing element. A cell array is created by inserting something into it. For example, the statement a{1}=rand(4) creates the variable a, marks it as a cell array, and places a 4×4 matrix of random numbers in the first cell. Notice the use of {} instead of () for indexing: Because the first reference to any array creates that array, Matlab needs a syntactic difference in order to distinguish cell array references from numeric array references. (This is similar to Perl's use of [] and {} to distinguish array subscripts from hash-table lookups.) Users can also create cell arrays with the function cell(), but the language designers felt that since no other built-in Matlab data types require a creation function, adding the extra syntax to the language made cell arrays more intuitive to use.
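The following sketch shows both ways of creating a cell array and the use of braces to insert and extract contents. The values stored are arbitrary.

```matlab
clear a
a{1} = rand(4);         % creates a as a cell array holding a 4-by-4 matrix
a{2} = 'some text';     % the second cell holds a string
a{3} = {1, 'nested'};   % the third cell holds another cell array
m = a{1};               % braces extract the contents of a cell
t = class(a);           % 'cell'
c = cell(2, 3);         % alternatively, a 2-by-3 cell array of empty cells
```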

Matlab structures form the basis for Matlab objects. Matlab structures are very similar to structures in C, with one major difference: They are dynamic (that is, fields can be added and removed at any point during program execution). A structure is therefore similar to the name-to-value dictionaries found in Awk and Perl.
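A minimal sketch of this dynamic behavior follows; the field names are invented for illustration.

```matlab
clear s
s.name = 'probe';               % the first assignment creates the structure
s.gain = 2.5;                   % a second field is added
s.samples = [1 2 3];            % fields can be added at any time
s = rmfield(s, 'gain');         % and removed again
hasGain = isfield(s, 'gain');   % 0
names = fieldnames(s);          % {'name'; 'samples'}
```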

Structures and cell arrays can both be used to organize heterogeneous collections of data. However, in practice, they are used differently. Structures have named fields, while cell arrays have numbered entries. Structures are therefore used in situations where name-value pairs are appropriate, and cell arrays where it is necessary to iterate over all the elements of a collection. Cell arrays are also used extensively in string handling; many of Matlab's built-in string handling functions, like strcmp, have been overloaded to handle cell arrays of strings.
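For instance, a cell array of strings can be compared against a single string in one call, and its numbered entries can be visited in a loop. This is only a sketch with made-up data.

```matlab
names = {'alpha', 'beta', 'gamma'};
match = strcmp(names, 'beta');      % [0 1 0], one result per cell

% Iterate over the numbered entries of the collection.
total = 0;
for k = 1:length(names)
    total = total + length(names{k});
end
% total is now 14, the combined length of the three strings
```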

Objects

Objects were added to Matlab primarily so that it could be extended to support new algebras. Operator overloading allows programmers to produce objects that (for example) implement different kinds of linear operators, while hiding the details of their implementation.

A second reason for adding objects to Matlab is to allow programmers to reuse names via function overloading to help manage the global namespace. In the flat namespace model of "classic" Matlab, all functions were globally visible. Object-based typing allows Matlab programs to use functions with the same name to operate on different data types.

To a Matlab user, the first reason is far more important than the second. Matlab is fundamentally about algebraic operations; namespace management and the creation of new data types to enable the construction of large programs are less important to Matlab's users than expressing their mathematical formulas concisely and elegantly. By emphasizing function and operator overloading rather than namespace management, Matlab reflects the needs of numerical scientists, many of whom are not familiar, or concerned, with the sorts of software engineering issues that abstract data types are designed to address. (As an aside, a quick glance at any computer-science text will show that stacks are everyone's favorite abstract data type. The four fragments in Example 1 implement a stack in Matlab using a cell array.) To try Example 1, create the M-files in a directory called "@stack." Be sure that the directory containing the @stack directory is on the Matlab path. When prompted by Matlab (>>), you can then type:

s=stack

s=push(s,rand(4))

s=push(s,magic(3))

[s,x]=pop(s)

Then x will be set to a 3×3 magic square.
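Example 1 itself is not reproduced in this text. As a rough sketch only, four M-files along the following lines would support the session above. The field name data, the error message, and the choice of display as the fourth method are assumptions, not the authors' code. Each fragment goes in its own file inside the @stack directory.

```matlab
% ---- File: @stack/stack.m ----
function s = stack()
% STACK  Constructor: create an empty stack.
d.data = {};                    % cell array holding the stacked items
s = class(d, 'stack');          % tag the structure as a stack object

% ---- File: @stack/push.m ----
function s = push(s, x)
% PUSH  Return the stack with x added on top.
s.data{length(s.data) + 1} = x;

% ---- File: @stack/pop.m ----
function [s, x] = pop(s)
% POP  Return the shortened stack and the item that was on top.
n = length(s.data);
if n == 0
    error('Cannot pop from an empty stack.');
end
x = s.data{n};
s.data(n) = [];                 % parentheses delete the cell itself

% ---- File: @stack/display.m ----
function display(s)
% DISPLAY  Called when a stack is shown at the prompt.
disp(sprintf('stack with %d item(s)', length(s.data)));
```

Because Matlab passes arguments by value, push and pop return the modified stack, which is why the session reassigns s on every call.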

A new class is defined in Matlab in stages. The first is to specify a Matlab structure for storing the data defining instances of that class. Unlike compiled languages, but like Perl and Python, the structure is not defined statically; instead, the constructor for the class is responsible for creating a structure of an appropriate kind. Class objects are different from structures in that all objects of the same class must contain exactly the same fields; fields cannot be added to or removed from objects once they have been constructed.

The second step is to define one or more methods to operate on the class. Each method must be put in a separate file; all of these files must then be put in a directory with the same name as the class. Programmers accustomed to C or C++ might find this mandatory mapping of logical structure (classes and methods) to physical structure (directories and files) strange, but it is similar to Java's management of packages.

Every class must define one or more constructors; one of these must be a default constructor taking no arguments, so that arrays of class instances can be created. Programmers can also provide conversion operators (casts) and overloads for standard operators, including array subscripting, "()", and structure reference, ".". Unlike C++, but like Python, Matlab does not allow you to directly overload operators (there is no equivalent of C++'s operator+(), for example). Instead, Matlab internally maps particular operators (such as +) to particular functions (such as plus) before trying to execute them. Overloading an operator is therefore simply a matter of writing a function with the right name and number of arguments, and placing it in the class's method directory.
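To make this concrete, here is a sketch of a hypothetical interval class whose + operator is overloaded by supplying a function named plus. The class name, its fields lo and hi, and the conversion of plain numbers to intervals are all assumptions made for illustration.

```matlab
% ---- File: @interval/interval.m ----
function iv = interval(lo, hi)
% INTERVAL  Constructor for a closed interval [lo, hi].
if nargin == 0                  % default constructor
    lo = 0;
    hi = 0;
elseif nargin == 1
    if isa(lo, 'interval')      % already an interval: return it unchanged
        iv = lo;
        return
    end
    hi = lo;                    % a plain number x becomes [x, x]
end
d.lo = lo;
d.hi = hi;
iv = class(d, 'interval');

% ---- File: @interval/plus.m ----
function r = plus(a, b)
% PLUS  Called for a + b when a or b is an interval.
a = interval(a);                % convert plain numbers to intervals
b = interval(b);
r = interval(a.lo + b.lo, a.hi + b.hi);
```

With these two files in place, interval(1,2) + 3 and 3 + interval(1,2) should both reach @interval/plus.m, since in each case the interval is the only object in the argument list.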

The precedence rules used to disambiguate operator overloading are key to Matlab's object-oriented system. Precedence is a partial ordering among types. By default, all types have the same precedence, and dispatch is to the leftmost object in the parameter list. However, a class can assert that it is superiorto or inferiorto another class. Then, if more than one class object appears in a parameter list, dispatch is to the leftmost object of highest precedence.
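The sketch below shows one way such an assertion might look. The classes diagop and linop and the method whoami are hypothetical. The call to superiorto is placed in the linop constructor, which is where such declarations are made.

```matlab
% ---- File: @diagop/diagop.m ----
function op = diagop(d)
% DIAGOP  Constructor for a hypothetical diagonal-operator class.
if nargin == 0, d = []; end
s.d = d;
op = class(s, 'diagop');

% ---- File: @diagop/whoami.m ----
function name = whoami(a, b)
name = 'diagop';

% ---- File: @linop/linop.m ----
function op = linop(m)
% LINOP  Constructor for a hypothetical general linear-operator class.
if nargin == 0, m = []; end
s.m = m;
op = class(s, 'linop');
superiorto('diagop');           % linop methods win over diagop methods

% ---- File: @linop/whoami.m ----
function name = whoami(a, b)
name = 'linop';
```

Under these assumptions, whoami(diagop(1), linop(2)) is expected to return 'linop' even though the diagop object is leftmost, because linop has the higher precedence.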

One of the biggest differences between Matlab and most other object-oriented languages is that dispatching in Matlab is dynamic rather than lexical. In a C++ expression a.foo(b,c), the method foo must be found in the class inheritance tree of the object a. You know that you don't have to look at the class trees of b or c to determine what function is going to be called. In Matlab, this is not the case. Given that a, b, and c are Matlab objects, the function call foo(a,b,c) might dispatch to a foo function in any of the classes of a, b, or c. You can't tell until run time, because of the inferiorto/superiorto hierarchy. This is very different from a virtual function, because with virtual functions, you at least know which class tree you'll have to search (really, just one lookup in the virtual function table).

As with most things in computing, this is easier to understand by example. Assume a program contains a function foo() and variables a, b, and c. Furthermore, assume that a and c are objects and that b is anything else (such as an N-dimensional array). When the program calls foo(a,b,c), Matlab has to determine which function to invoke. Since at least one argument is an object, Matlab checks the argument list to find the leftmost object of highest precedence (the leftmost position is used to resolve ties). Call this the target object.

  1. If the name of the called function (foo()) is the same as a Matlab built-in function, check the target class (and all its superclasses) for a function of the same name. If one is found, call it; otherwise, issue an error.
  2. If the function name is the same as a class directory, Matlab checks for a user-defined conversion function in the target class. If there is none, it assumes this is a constructor call. For example, given classes bar and baz and an object x of class bar, then y=baz(x) could be either a call to @bar/baz.m (user-defined conversion from bar to baz, defined in bar class) or @baz/baz.m (constructor for class baz, taking a bar argument).
  3. Check the target class and all its superclasses for the method foo(). If found, call it.
  4. Check the MATLABPATH for a function foo(). If found, call it. Note that the MATLABPATH can change depending on the current directory, and that functions in private directories appear first in the path.
  5. Generate an error.
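The bar and baz case from step 2 can be sketched as three small files. The field name value is an assumption; the class and file names follow the example in the text.

```matlab
% ---- File: @bar/bar.m ----
function x = bar(v)
% BAR  Constructor for the class bar.
if nargin == 0, v = 0; end
s.value = v;
x = class(s, 'bar');

% ---- File: @bar/baz.m ----
function y = baz(x)
% BAZ  User-defined conversion from bar to baz, defined in the bar class.
% x.value is a plain number, not an object, so this inner call reaches
% the baz constructor rather than coming back to this converter.
y = baz(x.value);

% ---- File: @baz/baz.m ----
function y = baz(v)
% BAZ  Constructor for the class baz.
if nargin == 0, v = 0; end
s.value = v;
y = class(s, 'baz');
```

With all three files present, y=baz(bar(7)) should go through @bar/baz.m. If that file were removed, the same call would go straight to the constructor @baz/baz.m, which would then receive a bar object as its argument.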

Inheritance is handled by having a class's constructor invoke its parent class's constructor explicitly. If a class has multiple parents, it must invoke all of their constructors. While classes inherit methods from their parents, one important difference between Matlab and languages such as Java and C++ is that the methods defined by a child class cannot directly access the fields of that class's parents. This means that collision of data values (the "diamond DAG" problem with multiple inheritance) doesn't arise, but it also means that there is no way to implement the equivalent of C++'s virtual inheritance, in which only a single instance of the grandparent data is created. It also makes it harder (but not impossible) for a derived class to break any invariants obeyed by its parent(s).
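A sketch of single inheritance under these rules follows. The classes shape and circle and their methods are hypothetical. The parent object is passed as a third argument to class(), and the child reaches the parent's data only through a parent method.

```matlab
% ---- File: @shape/shape.m ----
function s = shape(label)
% SHAPE  Constructor for the parent class.
if nargin == 0, label = ''; end
d.label = label;
s = class(d, 'shape');

% ---- File: @shape/getlabel.m ----
function t = getlabel(s)
% GETLABEL  Accessor; only shape methods can read the label field.
t = s.label;

% ---- File: @circle/circle.m ----
function c = circle(r, label)
% CIRCLE  Constructor for a child class of shape.
if nargin == 0, r = 0; end
if nargin < 2, label = ''; end
parent = shape(label);          % invoke the parent constructor explicitly
d.r = r;
c = class(d, 'circle', parent); % circle inherits from shape

% ---- File: @circle/describe.m ----
function t = describe(c)
% DESCRIBE  c.label is not accessible here, so the parent part of the
% object (stored in a field named after the parent class) is handed to
% a shape method instead.
t = sprintf('%s with radius %g', getlabel(c.shape), c.r);
```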

What Was Left Out

Programming languages are as prone to "feature bloat" as everything else in computing. While adding many new features to Matlab, its developers considered, and rejected, several others. For example, Matlab doesn't allow users to define copy constructors for objects, because of concern over the performance implication of creating and destroying a large number of temporary variables in the course of running a typical Matlab program. (For a look at how to avoid taking this performance hit in C++ by using templates to create customized computational kernels on demand, see Todd Veldhuizen's Blitz++ library at http://monet.uwaterloo.ca/blitz/.)

Matlab 5 also still lacks pointers or any other form of dynamic reference. Flat dynamic structures, such as queues and stacks, must be implemented using Matlab's extensible arrays; implementing more convoluted structures, such as trees and directed graphs, is as much of a challenge as it has always been in Fortran. It is worth noting, however, that most Matlab users don't care about trees and directed graphs and therefore never notice their absence.
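As an illustration of a flat dynamic structure built on an extensible array, here is a minimal sketch of a first-in, first-out queue held in a cell array. The items queued are arbitrary.

```matlab
q = {};                         % an empty queue
q{length(q) + 1} = 'first';     % enqueue at the back; the array grows
q{length(q) + 1} = [1 2 3];
q{length(q) + 1} = 'third';

front = q{1};                   % read the item at the front
q(1) = [];                      % dequeue: delete the first cell
remaining = length(q);          % 2
```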

Conclusion

Top-down and bottom-up language designs can both easily go wrong, but for different reasons. Languages that grow from the bottom up usually serve their users well initially, but can easily bloat into a confusing tangle of unrelated capabilities and syntactic conveniences that get in each other's way. Perl is probably the best-known example of this: What was a simple tool for manipulating regular expressions and formatting reports has come to rival English spelling in its complexity. Matlab's flat namespace causes similar problems: Since all toolboxes share the same namespace, function names must be unique across all toolboxes. As they become more widely used, Matlab 5's object-oriented features will reduce, but not entirely eliminate, these restrictions: All classes will share the same namespace, but each class will create its own namespace for function definition.

On the other hand, a language designed from the top down, centered around one or a few key ideas, is more likely to be elegant, but less likely to be usable "off the shelf" for real-world problems. Scheme, for example, is one of the cleanest languages around, but programmers who try to build large applications using Scheme often find that what they're really doing is building an interpreter for a higher-level, domain-specific language. Matlab's emphasis on arrays is as single-minded as Scheme's emphasis on lists, but seems to require less "wrapping" to be useful in its target domain.

Like every other scripting language, Matlab began as a simple way to do powerful things, and it has become a not-so-simple way to do very powerful things. Though it has evolved in parallel with languages like Perl and Visual Basic, Matlab has, because of its mathematical domain, become very different from them. Matlab's variables are strongly, but dynamically, typed because that is how variables work in a mathematical equation. On the other hand, Matlab supports structured namespaces with computer science's object-oriented techniques because that makes writing large programs easier. Perl has object-oriented features too, but they do not, for example, permit operator priorities to be adjusted at run time. That's not a shortcoming of Perl, but rather a reflection of the needs of its users. And that's the point: Scripting languages such as Perl, Visual Basic, and Matlab are useful mostly because they are domain specific. They will never converge to a single language because their domains are not converging, but their successors will continue to borrow ideas from one another and evolve in similar ways.

For More Information

The MathWorks Inc.
24 Prime Park Way
Natick, MA 01760
508-647-7000
http://www.mathworks.com/

DDJ