From build-time to run-time
We turn now to the life-cycle of an EM program – covering its build-time transformation from a single "main" source module into a binary program image comprising multiple modules, as well as its run-time execution phases from hardware reset to system shutdown.
Along the way, you'll understand the role played by EM language intrinsics throughout the program life-cycle – not only to demarcate execution phases at program run-time (em$run
), but also to enable active participation by content suppliers at various stages during program build-time (em$configure
and em$generateUnit
).
High-level build flow
The following figure depicts the four principal phases of the EM program life-cycle and maps out the high-level flow of build-time artifacts – starting with a ModP.em
source file and ending with a main.out
binary image. The first three of these phases unfold on your host computer, and collectively constitute program build-time; the final phase, needless to say, represents run-time execution of the generated program on target hardware.
Unit translation
Each .em
source file – whether a module
, interface
, composite
, or template
– represents an independent unit of translation within EM. Starting from a designated top-level unit – in our case, a module named ModP
which would implement the em$run
intrinsic – EM will (recursively) process an N-element hierarchy of other translation units that ModP
directly or indirectly imports; since the relation defined by import
directives cannot have cycles, translating ModP
effectively yields a top-to-bottom (partial) ordering of its dependent units.
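As a rough illustration of how an acyclic import relation yields a top-to-bottom ordering, the following JavaScript sketch orders a small hypothetical hierarchy. The import table and the unit names ModA and ModB are invented for illustration, and the translator's actual algorithm may well differ; the sketch only shows that a depth-first walk of a cycle-free import graph places every unit ahead of the units it imports.

```javascript
// Hypothetical import table: each unit lists the units it imports directly.
const imports = {
  ModP: ["CompC", "ModA"],
  CompC: ["ModA", "ModB"],
  ModA: ["ModB"],
  ModB: [],
};

// Depth-first post-order visit, then reverse: every unit ends up
// ahead of all the units it imports (a topological order).
function topToBottom(root, table) {
  const done = new Set();
  const post = [];
  const visit = (unit) => {
    if (done.has(unit)) return;
    done.add(unit);
    for (const dep of table[unit] || []) visit(dep);
    post.push(unit);
  };
  visit(root);
  return post.reverse();
}

const order = topToBottom("ModP", imports);
console.log(order.join(" > ")); // ModP > CompC > ModA > ModB
```

Because the relation has no cycles, the walk needs no cycle detection; the ordering is partial in the sense that other valid orderings may exist for units that do not depend on one another.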
Translating concrete modules such as ModP
will generally produce three corresponding output files, consumed in subsequent phases of the program build process:
ModP.hpp | the public / private features of ModP translated into a C++ header file
ModP.cpp | internal function definitions within ModP translated into equivalent C++ code
ModP.js | a JavaScript rendition of ModP which will contribute during program configuration
Translating abstract interfaces such as our earlier ModI
example will only yield a ModI.js
and ModI.hpp
output file. Translating composites or templates such as our earlier CompC
or GenT
examples – which contribute at build-time but not run-time – will only yield a CompC.js
or GenT.js
output file.
Finally, all template instantiations encountered en route through import
directives (such as the references to GenT
in CompC
) will trigger immediate execution of the designated template's em$generateUnit
intrinsic – already translated to JavaScript within the GenT.js
output file. Unit translation of the new .em
file produced at this step then proceeds recursively.
Translator efficiency
A top-level module such as ModP
could easily have static dependencies on more than 100 other translation units – especially when imported composites aggressively instantiate templates managing discrete MCU resources like GPIO pins. To accelerate program build-time, the EM translator maintains an internal cache of all generated files and will only (re-)translate a particular .em
file when deemed necessary.
Program configuration
The configuration phase of the EM program life-cycle – still upstream from the final compilation of all generated C++ code into a binary image – actually entails executing a special hosted version of the program rendered in JavaScript. Labeled main.js
in the earlier figure, this fabricated program basically amalgamates the .js
files output for each module or composite found within the N-element import
hierarchy rooted at ModP
itself.
But why JavaScript?
Seemingly, any hosted language (Java, Python, Ruby) could provide a suitable execution environment for this phase of the EM program life-cycle. Some might argue the case for Python, as this language already plays a similar role with respect to C/C++ code – especially in emerging platforms such as TinyML, which deploy machine-learning algorithms (developed in a hosted Python environment) onto embedded target hardware.
As it turns out, JavaScript had already claimed the host language role among EM's predecessors – notably the Eclipse / RTSC project which in turn drew upon earlier DSP/BIOS configuration technology. Given the Java-centricity of the Eclipse IDE, Mozilla's Rhino – a JavaScript engine written in Java and seamlessly integrated with the JVM runtime – served as an ideal environment at that point in time.
Indeed, an Eclipse plug-in (written in Java) provided almost a decade of IDE support for the EM language; and Rhino therefore remained our JavaScript platform of choice. But now that language support for EM has migrated to the VS Code IDE – written in TypeScript and running on the Chromium / V8 engine – Node.js provides an even richer JavaScript platform for hosting the configuration phase of the EM program life-cycle.
During its execution, the main.js
program makes three top-to-bottom passes over the N-element import
hierarchy rooted in ModP
– invoking JavaScript translations of certain EM intrinsics on a per-unit basis (if defined).
The 1st pass invokes em$preconfigure
, which only composites may elect to define; public proxies and config parameters bound at this time using the single-assignment operator [ ?=
] become immune to further modification in the next pass.
The 2nd pass invokes em$configure
, which modules as well as composites may elect to define; proxies and configs bound here using the [ ?=
] operator become immune to further modification by lower-level units yet to execute in this pass.
The 3rd pass invokes two intrinsics on modules whose special em$used
config parameter tests true: em$construct
, for initializing private module state; and em$generateCode
, for synthesizing internal C/C++ code using the EM template mechanism illustrated in GenT
.
The [ ?=
] operator, as hinted earlier, implements single-assignment semantics – sealing the first value assigned to a configurable proxy or parameter, while silently ignoring all subsequent assignments to the same feature. With a top-to-bottom ordering imposed on the ModP
import
hierarchy, [ ?=
] operations executed by higher-level modules and composites essentially "override" (default) binding decisions made by lower-level units. By implementing em$configure
, ModP
itself can now preempt proxy / parameter assignments otherwise made by any modules or composites it may import.
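The sealing behavior can be modeled in a few lines of JavaScript. This is only a sketch of the semantics described above, not the code the translator emits: the Config class, its method names, the baudRate parameter, and the unit ModA are all hypothetical.

```javascript
// A configurable feature supporting plain assignment plus a sealing
// single assignment, loosely modeling the [ ?= ] operator.
class Config {
  constructor(name, dflt) {
    this.name = name;
    this.value = dflt;
    this.sealed = false;
  }
  // Ordinary assignment: silently ignored once the feature is sealed.
  set(v) {
    if (!this.sealed) this.value = v;
  }
  // Single assignment: the first call wins and seals the value.
  setOnce(v) {
    if (this.sealed) return;
    this.value = v;
    this.sealed = true;
  }
}

const baudRate = new Config("baudRate", 9600);

// Units run their configure step in top-to-bottom order, so ModP goes first.
const configureSteps = [
  ["ModP", () => baudRate.setOnce(115200)],
  ["CompC", () => baudRate.setOnce(57600)], // silently ignored
  ["ModA", () => baudRate.set(19200)],      // silently ignored
];
for (const [, configure] of configureSteps) configure();

console.log(baudRate.value); // 115200
```

Since higher-level units execute first, their sealed values survive; a lower-level unit's assignment then acts as a default that applies only when nothing above it has spoken.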
More on pre-configuration
Higher-level modules such as ModP
cannot, however, affect the values of configurable features already bound in the first configuration pass via em$preconfigure
; the latter intrinsic enables suppliers of EM composites to selectively freeze proxy bindings and parameter values, tempering flexibility in the interest of robustly assembling elements for a fixed application setting. As an example, our McuC composite binds physical pin numbers read from a board-specific YAML file – reflecting the "hard reality" of the underlying hardware.
Referring to the earlier figure, one practical consequence of configuration becomes pruning the original (and often large) N–element import
hierarchy into a more tractable M–element subset comprising those modules actually used within the program. In support, each module has an intrinsic em$used
parameter – automatically bound in most cases, but explicitly configurable if necessary – that ultimately determines membership in the M–element subset.
The top-level module ModP
has its em$used
parameter automatically set, and is always used within the program.
If module Mod1
is used and Mod1
imports module Mod2
(directly or via a composite), then Mod2
is used as well.
If module Mod1
is used and proxy Mod1.ModX
ultimately delegates to module Mod2
, then Mod2
is used as well.
Otherwise Mod1
is not used in the program, unless some higher-level module or composite explicitly sets Mod1.em$used
.
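Taken together, the rules above amount to a reachability computation. The JavaScript sketch below makes two simplifying assumptions that are not taken from the EM implementation: each module's import list already has any composite hop resolved, and a module reachable only through a composite (here Mod4) is not pulled in unless some used module imports it. All table contents are hypothetical.

```javascript
// Hypothetical program: for each module, the modules it imports and the
// delegate modules bound to its proxies.
const modules = {
  ModP: { imports: ["Mod1"], delegates: [] },
  Mod1: { imports: ["Mod2"], delegates: ["Mod3"] },
  Mod2: { imports: [], delegates: [] },
  Mod3: { imports: [], delegates: [] },
  Mod4: { imports: ["Mod5"], delegates: [] }, // translated, never pulled in
  Mod5: { imports: [], delegates: [] },
  Mod6: { imports: [], delegates: [] },
};

// Modules whose em$used parameter some higher-level unit set explicitly.
const forced = ["Mod6"];

// Worklist traversal: the top-level module and any forced modules are used,
// and use propagates along import edges and proxy-to-delegate bindings.
function usedModules(top, table, explicit = []) {
  const used = new Set();
  const work = [top, ...explicit];
  while (work.length > 0) {
    const name = work.pop();
    if (used.has(name)) continue;
    used.add(name);
    const { imports, delegates } = table[name];
    work.push(...imports, ...delegates);
  }
  return used;
}

const used = [...usedModules("ModP", modules, forced)].sort();
console.log(used.join(", ")); // Mod1, Mod2, Mod3, Mod6, ModP
```

In this toy program N is 7 and M is 5: Mod4 and Mod5 were translated but contribute nothing to the final image.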
The final configuration pass gives each used module within the M–element subset an opportunity to focus internally; configuration of all public features of these modules would have already occurred. By defining the em$construct
intrinsic, modules may programmatically initialize their private var
, config
, or even proxy
features at this point within the flow.
But since em$construct
actually executes on your host computer, module suppliers can now implement complex initialization algorithms at build-time that would otherwise prove far too costly to execute at run-time on resource-constrained MCUs.
With language constructs normally used to implement target-side functions like em$run
also available in hosted functions like em$construct
, module suppliers can now migrate (expensive) computations from run-time to build-time with little effort. Said another way, EM can serve as its own meta-language – synthesizing the final form of a concrete module by statically reflecting upon values assigned to its configurable parameters.
Examples of EM meta-programming – data initialization
The em$construct
function of ti.mcu.cc23xx/ConsoleUart0 computes values for private configs ibrd
and fbrd
, eventually used to initialize hardware registers defining the UART's baud-rate; a less efficient implementation would perform this computation at run-time. Taking this approach to the next level, the em$construct
function of em.utils/FftC32 initializes a custom sine-wave table at build-time.
Besides executing (costly) math functions, EM meta-programming can also initialize complex, linked data-structures at build-time – such as the em$construct
function and createH
functions of em.utils/FiberMgr, which in turn call build-time functions of em.utils/ListMgr. As a general rule, any static initialization of data at build-time results in more compact programs at run-time.
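To make the idea of build-time data initialization concrete, here is a minimal JavaScript sketch of a hosted computation that produces a fixed-point sine table and renders it as a C initializer. The names N_WAVE, N_WAVE_LOG2, and SINE_WAVE mirror config names of FftC32 mentioned in this document, but the table size, the Q15 scaling, and the emitted declaration are assumptions rather than details of the actual module.

```javascript
// Hosted (build-time) computation of a fixed-point sine table, emitted as a
// C initializer so the target never has to call sin() at run-time.
const N_WAVE_LOG2 = 3;           // hypothetical size; a real table would be larger
const N_WAVE = 1 << N_WAVE_LOG2;

function sineTable(n) {
  const table = [];
  for (let i = 0; i < n; i++) {
    // Q15 scaling (an assumption): map [-1, 1] onto [-32767, 32767].
    table.push(Math.round(32767 * Math.sin((2 * Math.PI * i) / n)));
  }
  return table;
}

const SINE_WAVE = sineTable(N_WAVE);

// Render the table as constant data for the generated program.
const initializer =
  `static const int16_t SINE_WAVE[${N_WAVE}] = { ${SINE_WAVE.join(", ")} };`;
console.log(initializer);
```

The floating-point math runs once on the host; the target only ever sees a block of constant data that the linker can place in flash.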
Complementing em$construct
– oriented towards initializing private state – some modules will also implement the em$generateCode
intrinsic. Using the same form of templatized output statements illustrated earlier in GenT
, module suppliers can inject customized C/C++ code fragments into the final program image – with public config parameters typically shaping the synthesized output.
Examples of EM meta-programming – code generation
On the low end of the scale, MCU-specific modules like ti.mcu.cc23xx/Regs use the em$generateCode
intrinsic to #include
vendor-supplied header files; modules such as Rtc will then reference symbols and macros defined in these headers using a special ^^
escape token.
Moving up a notch, the ti.mcu.cc23xx/IntrVec module programmatically synthesizes the run-time vector table for this MCU using build-time bindings of interrupt handlers – complete with compiler-specific directives to control placement in memory. In the limit, em$generateCode
can leverage the full capabilities of JavaScript executing on your host computer.
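The flavor of this kind of code synthesis can be sketched in JavaScript as follows. The handler names, the defaultIsr fallback, the section name, and the emitted C text are all hypothetical; the sketch does not reproduce the output of IntrVec, and only shows how build-time bindings can drive generated source text.

```javascript
// Hypothetical build-time bindings of interrupt names to handler symbols;
// null means no handler was bound during configuration.
const handlers = [
  ["RTC", "Rtc_isr"],
  ["GPIO", null],
  ["UART0", "ConsoleUart0_isr"],
];

// Emit C source text for a vector table, one entry per interrupt.
function generateVectorTable(bindings, fallback = "defaultIsr") {
  const lines = [];
  lines.push("typedef void (*IsrFxn)(void);");
  // The section attribute stands in for a compiler-specific placement directive.
  lines.push(
    `__attribute__((section(".intvec"))) const IsrFxn vectors[${bindings.length}] = {`
  );
  for (const [name, fxn] of bindings) {
    lines.push(`    ${fxn || fallback}, /* ${name} */`);
  }
  lines.push("};");
  return lines.join("\n");
}

console.log(generateVectorTable(handlers));
```

Because the table is synthesized from bindings known at build-time, unbound interrupts cost nothing beyond a shared fallback entry, and no run-time registration code is needed.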
Program compilation
Referring back to the earlier figure, the ultimate outcome of executing the main.js
(meta-) program within the overall EM build-flow becomes yet another program – this time, a single C++ program labeled main.cpp
. As suggested earlier, this program only incorporates generated code from the M used modules selected from the original set of N imported units traced back to ModP.em
.
Each module Mod
participating in this consolidated C++ program respectively contributes (in order) the following portions of code, which collectively represent the bulk of the generated main.cpp
file's content:
constant, type, variable, and function declarations from Mod.hpp
, generated during the initial translation of Mod.em
;
static data initializers reflecting the values assigned to public / private features of Mod
during the prior configuration phase;
any C/C++ code synthesized by the Mod.em$generateCode
intrinsic, executed during the prior configuration phase; and
definitions of declared and intrinsic functions from Mod.cpp
, generated during the initial translation of Mod.em
.
By merging all generated C/C++ code into a single input file, the underlying compiler for the target MCU can aggressively optimize the program as a whole – folding away constants, inlining small functions, and eliminating unused code or data. As a case in point, client function calls via abstract proxies to configured delegate modules – seemingly a double-indirection at run-time – will usually "melt away" and leave the delegate function body inlined at the client call-site.
Example of whole-program optimization
Returning to FftC32, its exec
function uses three config
parameters at run-time which em$construct
previously initialized at build-time – N_WAVE
, N_WAVE_LOG2
, SINE_WAVE
. Knowing the values of these parameters when digesting main.cpp
, the compiler has greater latitude in making time / space tradeoffs when generating object code for FftC32.exec
.
As another example, em.utils/FiberMgr makes many function calls via Common.GlobalInterrupts
– a proxy which conforms to the GlobalInterruptsI interface, and ultimately delegates to a hardware-specific implementation such as ti.mcu.cc23xx/GlobalInterrupts. Knowing this particular proxy - delegate binding, the compiler would inline the delegate's (small) functions directly at each Common.GlobalInterrupts
call site.
Program execution
This final phase of the EM program life-cycle – which represents the transition from build-time to run-time – technically commences when you load the executable main.out
image into target memory and reset the MCU. But as you'll see, run-time contributions from the M concrete modules used within this program won't occur until execution reaches main
.
The path to main
The path actually taken from loading the main.out
file to executing the C/C++ main
function can vary widely from one target environment [MCU + compiler + board] to the next; but fortunately, each distribution of the EM software platform will render this process transparent to the application developer. In practice, each EM distro will leverage much of the tooling infrastructure supporting the underlying MCU – from flash loaders that operate on standard .bin
or .hex
files, to compiler startup files like crt0.s
that manage the transition from MCU reset to C/C++ main
as efficiently as possible.
For the M concrete modules bound within the main.out
image, program run-time actually begins when target execution reaches the C/C++ main
function. Since the underlying compiler's own startup file does little more than prepare data memory and initialize critical CPU registers, more comprehensive startup of the target board and the MCU peripherals still needs to occur prior to calling the top-level ModP.em$run
intrinsic.
The main
function initially calls a C++ rendition of Modr.em$reset
, where Modr
represents the first module to implement this intrinsic found by a top-to-bottom scan of the M modules used in this program; needless to say, this scan occurs at program build-time, not run-time. In practice, some target-specific module included with your EM distribution will assume responsibility for defining the em$reset
intrinsic; higher-level application modules generally avoid (re-)defining this intrinsic.
The main
function will next call C++ renditions of Modi.em$startup
for each Modi
found to implement this intrinsic; here too, a top-to-bottom scan of all M program modules occurs at build-time. Unlike em$reset
, higher-level application modules down to target-specific driver modules will define this intrinsic in order to perform (run-time) initializations not possible during (build-time) execution of em$construct
.
The main
function then calls a C++ rendition of Mods.em$startupDone
, where Mods
represents the first module found to implement this intrinsic through a top-to-bottom scan of all M modules participating in the program. As with em$reset
, your EM distribution will usually take responsibility for defining em$startupDone
– which performs any final hardware setup before the application program assumes control.
The main
function finally calls a C++ rendition of ModP.em$run
, which effectively transfers control to the top-level module of this application. Since embedded applications often execute some form of "run forever" loop – whether explicitly coded within the program or else implicitly managed by some run-time task scheduler – in practice the em$run
intrinsic will not return control back to the calling main
function.
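The build-time scan that determines what main calls can be sketched in JavaScript. The module list below is hypothetical, and the sketch assumes that em$startup calls are issued in the same top-to-bottom order as the scan, which the text above does not state explicitly.

```javascript
// Hypothetical used modules in top-to-bottom order, each listing the
// startup-related intrinsics it implements.
const usedModules = [
  { name: "ModP", intrinsics: ["em$run", "em$startup"] },
  { name: "SoftUart", intrinsics: ["em$startup"] },
  { name: "BoardController", intrinsics: ["em$reset", "em$startupDone"] },
  { name: "Rtc", intrinsics: ["em$startup"] },
];

const first = (mods, fxn) => mods.find((m) => m.intrinsics.includes(fxn));
const all = (mods, fxn) => mods.filter((m) => m.intrinsics.includes(fxn));

// Build-time scan: work out the exact sequence of calls main will make.
function mainCallSequence(mods) {
  const calls = [];
  const reset = first(mods, "em$reset");
  if (reset) calls.push(`${reset.name}.em$reset`);
  for (const m of all(mods, "em$startup")) calls.push(`${m.name}.em$startup`);
  const done = first(mods, "em$startupDone");
  if (done) calls.push(`${done.name}.em$startupDone`);
  calls.push(`${mods[0].name}.em$run`); // the top-level module always runs last
  return calls;
}

const sequence = mainCallSequence(usedModules);
console.log(sequence.join("\n"));
```

Since the scan happens on the host, the generated main can contain a flat list of direct calls, with no run-time table of startup functions to walk.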
Examples of em$startup
Many modules that manage MCU hardware peripherals will define em$startup
– such as Idle and Rtc found in the ti.mcu.cc23xx
package; clearly, this sort of hardware setup must occur at run-time. By extension, portable modules like em.utils/SoftUart which leverage proxies to interact with underlying hardware may likewise rely upon em$startup
to perform some run-time initialization – in this case, initializing a GpioI proxy named TxPin
.
Should the top-level ModP.em$run
intrinsic actually return to main
, control then transfers to a distinguished __halt
function – also generated prior to program compilation – which supervises an orderly shutdown of the target application. If necessary, any program module can explicitly initiate the shutdown sequence at run-time through a special halt
statement that can appear inside EM function definitions.
The __halt
function will first call C++ renditions of Modi.em$shutdown
for each Modi
found to implement this intrinsic; higher-level applications down to target-specific driver modules will define this intrinsic in order to perform run-time finalization prior to halting the processor.
The __halt
function then calls a C++ rendition of Modh.em$halt
, where Modh
represents the first module to implement this intrinsic found by a top-to-bottom scan; each EM distro offers a "default" version of em$halt
, though higher-level modules may (re-)define this intrinsic in some cases.
Should the implementation of em$halt
happen to return to __halt
, program control would fall through to a special block of code that simply spins within an infinite loop.
In cases where something goes "seriously wrong" within the system and execution should terminate more abruptly, EM also supports a special fail
statement that can appear in any function definition. When executed at run-time, fail
immediately transfers control to an implementation of the em$fail
intrinsic – often found within the same module implementing em$halt
; should em$fail
return, the program likewise enters an infinite loop.
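The two termination paths can be contrasted with a small JavaScript model that records which intrinsics would run. The module list is hypothetical, the final infinite loop is represented by a "spin" marker rather than executed, and picking the first implementer of em$fail is an assumption made by analogy with em$halt.

```javascript
// Hypothetical used modules in top-to-bottom order.
const mods = [
  { name: "ModP", intrinsics: ["em$run", "em$shutdown"] },
  { name: "Rtc", intrinsics: ["em$shutdown"] },
  { name: "BoardController", intrinsics: ["em$halt", "em$fail"] },
];

const implementers = (fxn) => mods.filter((m) => m.intrinsics.includes(fxn));

// Orderly shutdown: every em$shutdown, then the first em$halt, then spin.
function haltTrace() {
  const trace = implementers("em$shutdown").map((m) => `${m.name}.em$shutdown`);
  const [halt] = implementers("em$halt");
  if (halt) trace.push(`${halt.name}.em$halt`);
  trace.push("spin");
  return trace;
}

// Abrupt termination: skip em$shutdown and go straight to em$fail.
function failTrace() {
  const [fail] = implementers("em$fail");
  const trace = fail ? [`${fail.name}.em$fail`] : [];
  trace.push("spin");
  return trace;
}

console.log(haltTrace().join(" -> "));
console.log(failTrace().join(" -> "));
```

The difference between the two paths is the finalization step: halt gives every module a chance to wind down, while fail bypasses that step entirely.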
More on startup/shutdown intrinsics
By design, each EM distro relies upon the portable em.utils/BoardController module which centralizes definitions of the singleton intrinsics em$reset
, em$startupDone
, em$halt
, and em$fail
; configuration of the BoardController
module and its dependents typically occurs within the distro's BoardC composite. In those (rare) circumstances where some higher-level module needs to "override" one of these special functions, the higher-level intrinsic definition would likely call the corresponding "base" function within BoardController
.
While we've already directed you to browse selected .em
source files drawn from the EM platform runtime, Chapter 4 concludes our technical overview with a more comprehensive picture of this environment.