Function (computer programming)

In computer programming, a function is a callable unit of software logic that has a well-formed interface and behavior and can be invoked multiple times.
Callable units provide a powerful programming tool. The primary purpose is to allow for the decomposition of a large and/or complicated problem into chunks that have relatively low cognitive load and to assign the chunks meaningful names. Judicious application can reduce the cost of developing and maintaining software, while increasing its quality and reliability.
Callable units are present at multiple levels of abstraction in the programming environment. For example, a programmer may write a function in source code that is compiled to machine code that implements similar semantics. There is a callable unit in the source code and an associated one in the machine code, but they are different kinds of callable units with different implications and features.

Terminology

Some programming languages, such as COBOL and BASIC, make a distinction between functions that return a value and those that do not ; some, such as C, C++, and Rust, only use the term "function" irrespective of whether they return a value or not; others, such as ALGOL 60 and PL/I, only use the word procedure. Some object-oriented languages, such as Java and C#, refer to functions inside classes as "methods".

History

The idea of a callable unit was initially conceived by John Mauchly and Kathleen Antonelli during their work on ENIAC and recorded in a January 1947 Harvard symposium on "Preparation of Problems for EDVAC-type Machines." Maurice Wilkes, David Wheeler, and Stanley Gill are generally credited with the formal invention of this concept, which they termed a closed sub-routine, contrasted with an open subroutine or macro. However, Alan Turing had discussed subroutines in a paper of 1945 on design proposals for the NPL ACE, going so far as to invent the concept of a return address stack.
The idea of a subroutine was worked out after computing machines had already existed for some time. The arithmetic and conditional jump instructions were planned ahead of time and have changed relatively little, but the special instructions used for procedure calls have changed greatly over the years. The earliest computers, such as the Manchester Baby, and some early microprocessors, such as the RCA 1802, did not have a single subroutine call instruction. Subroutines could be implemented, but they required programmers to use the call sequence—a series of instructions—at each call site.
A single subroutine was implemented in 1945 in Konrad Zuse's Z4 in the form of a tape.
In 1945, Alan Turing used the terms "bury" and "unbury" as a means of calling and returning from subroutines.
In January 1947 John Mauchly presented general notes at A Symposium of Large Scale Digital Calculating Machinery
under the joint sponsorship of Harvard University and the Bureau of Ordnance, United States Navy. Here he discusses serial and parallel operation suggesting:
Kay McNulty had worked closely with John Mauchly on the ENIAC team and developed an idea for subroutines for the ENIAC computer she was programming during World War II. She and the other ENIAC programmers used the subroutines to help calculate missile trajectories.
Goldstine and von Neumann wrote a paper dated 16 August 1948 discussing the use of subroutines.
Some very early computers and microprocessors, such as the IBM 1620, the Intel 4004 and Intel 8008, and PIC microcontrollers, have a single-instruction subroutine call that uses a dedicated hardware stack to store return addresses; such hardware supports only a few levels of subroutine nesting, but can support recursive subroutines. Machines before the mid-1960s, such as the UNIVAC I, the PDP-1, and the IBM 1130, typically use a calling convention which saved the instruction counter in the first memory location of the called subroutine. This allows arbitrarily deep levels of subroutine nesting but does not support recursive subroutines. The IBM System/360 had a subroutine call instruction that placed the saved instruction counter value into a general-purpose register; with additional code, this can be used to support arbitrarily deep subroutine nesting and recursive subroutines. The Burroughs B5000 is one of the first computers to store subroutine return data on a stack.
The DEC PDP-6 is one of the first accumulator-based machines to have a subroutine call instruction that saved the return address in a stack addressed by an accumulator or index register. The later PDP-10, PDP-11 and VAX-11 lines followed suit; this feature also supports both arbitrarily deep subroutine nesting and recursive subroutines.

Language support

In the very early assemblers, subroutine support was limited. Subroutines were not explicitly separated from each other or from the main program, and indeed the source code of a subroutine could be interspersed with that of other subprograms. Some assemblers would offer predefined macros to generate the call and return sequences. By the 1960s, assemblers usually had much more sophisticated support for both inline and separately assembled subroutines that could be linked together.
One of the first programming languages to support user-written subroutines and functions was FORTRAN II. The IBM FORTRAN II compiler was released in 1958. ALGOL 58 and other early programming languages also supported procedural programming.

Libraries

Even with this cumbersome approach, subroutines proved very useful. They allowed the use of the same code in many different programs. Memory was a very scarce resource on early computers, and subroutines allowed significant savings in the size of programs.
Many early computers loaded the program instructions into memory from a punched paper tape. Each subroutine could then be provided by a separate piece of tape, loaded or spliced before or after the main program ; and the same subroutine tape could then be used by many different programs. A similar approach was used in computers that loaded program instructions from punched cards. The name subroutine library originally meant a library, in the literal sense, which kept indexed collections of tapes or decks of cards for collective use.

Return by indirect jump

To remove the need for self-modifying code, computer designers eventually provided an indirect jump instruction, whose operand, instead of being the return address itself, was the location of a variable or processor register containing the return address.
On those computers, instead of modifying the function's return jump, the calling program would store the return address in a variable so that when the function completed, it would execute an indirect jump that would direct execution to the location given by the predefined variable.

Jump to subroutine

Another advance was the jump to subroutine instruction, which combined the saving of the return address with the calling jump, thereby minimizing overhead significantly. At least five forms of this existed in the 1950s.

Return jump: store the return address in the first word of the subroutine. Return Jump on the UNIVAC 1103A is an early example.
Index jump: store the return address in an index register. Transfer and Set indeX on the IBM 704 is an early example.
Automatic storage into a dedicated storage location: the Store P location on the RCA 501 is an early example.
Automatic storage into a designated register: The sequence and cosequence history registers on the Honeywell 800 is an early example.
Patch return address: The "return address" instruction adds one to the contents of the program counter register and patches the address portion of a branch instruction at the last word of the subroutine with the caller's return address. The "return address" instruction is followed immediately with a branch instruction to the beginning of the subroutine. The LGP-30 is an example.

The IBM System/360 architecture is an example of saving the return address into a designated register. The branch instructions BAL or BALR, designed for procedure calling, would save the return address in a processor register specified in the instruction, by convention register 14. To return, the subroutine had only to execute an indirect branch instruction through that register. If the subroutine needed that register for some other purpose, it would save the register's contents to a private memory location or a register stack.
The HP 2100 architecture is an example of saving the return address in the first word of the subroutine. The JSB instruction would save the return address in the memory location that was the target of the branch. Execution of the procedure would begin at the next memory location. In the HP 2100 assembly language, one would write, for example

...
JSB MYSUB
BB ...

to call a subroutine called MYSUB from the main program. The subroutine would be coded as

MYSUB NOP
AA ...
...
JMP MYSUB,I

The JSB instruction placed the address of the NEXT instruction into the location specified as its operand, and then branched to the NEXT location after that. The subroutine could then return to the main program by executing the indirect jump JMP MYSUB, I which branched to the location stored at location MYSUB. Compilers for Fortran and other languages could easily make use of these instructions when available. This approach supported multiple levels of calls; however, since the return address, parameters, and return values of a subroutine were assigned fixed memory locations, it did not allow for recursive calls.
Incidentally, a similar method was used by Lotus 1-2-3, in the early 1980s, to discover the recalculation dependencies in a spreadsheet. Namely, a location was reserved in each cell to store the return address. Since circular references are not allowed for natural recalculation order, this allows a tree walk without reserving space for a stack in memory, which was very limited on small computers such as the IBM PC.