Now that we have a way to build data structures from program text, it is time to begin to define the behavior of the program. Before getting into it, I want to clarify that this blog is for the purpose of documenting and tracking my development of this project, rather than formally introducing this language, or providing something that’s easy to follow along with. It’s going to be code + commentary, to provide motivation and organization for the project. In the future, I would like to fully describe the grammar and semantics of the language, but for the time being this is also not an academically rigorous endeavor.
We’ll start by creating some primitive types to handle runtime values.
Data Types
#[derive(derive_more::Display, Debug, PartialEq, Clone)]
pub enum DataType {
Atomic(AtomicType),
Array(ArrayType),
Unit
}
These three types will encapsulate all runtime values. Unit
has no data attached, and it is meant to be like Rust’s unit type, which you can think of a non-value or placeholder, similar to void
.
Atomic
is what you’d expect:
#[repr(C, u8)]
#[derive(derive_more::Display, Debug, Clone, PartialEq)]
pub enum AtomicType {
// Numeric
Int(isize),
Index(usize),
Float(f64),
Complex(num::Complex<f64>),
//Non-Numeric
Character(char),
Boolean(bool),
//Fn
Fn(Callable),
}
With the inclusion, of functions, because we need to be able to have functions a run-time values if we want to have them as the results of expressions, or the callee of another function (currying), or even units in an array of funtions :). Speaking of Arrays, since that is the goal of this language to support them, let’s see how to do that.
Arrays
// Row-major Array
#[derive(ambassador::Delegate, Clone, Hash, Debug, PartialEq, Eq)]
#[delegate(ArrayProps, target = "shape")]
#[delegate(ShapeOps, target = "shape")]
#[repr(C)]
pub struct Array<T> {
pub shape: Shape,
pub data: Vec<T>,
}
A shape and contiguous array, that’s all. In the future, we may need to choose a more opinionated data structure than the trusty and versatile Vec
, but for now it works. Also notice the use of the ambassador crate’s delegate
macro. It’s a very useful procedural macro, that allows us to have mess-free inheritance on struct fields and enum variants. Those two inherited traits looking like this:
pub type ShapeVec = SmallVec<[usize; 4]>; // Will stay on the stack for up to 4 dimensions.
#[derive(Clone, Hash, Debug, PartialEq, Eq)]
pub struct Shape {
s: ShapeVec,
}
#[ambassador::delegatable_trait]
pub trait ArrayProps {
fn shape_slice(&self) -> &[usize];
fn shape(&self) -> &Shape;
fn rank(&self) -> usize;
fn volume(&self) -> usize;
}
#[ambassador::delegatable_trait]
pub trait ShapeOps {
fn reshape(&mut self, newshape: &[usize]) -> Result<()>;
}
With that, it’s possible to create variants of the Array
type for each atomic value type that we have defined:
#[repr(C, u8)]
#[derive(derive_more::Display, ambassador::Delegate, Debug, Clone, PartialEq)]
#[delegate(ArrayProps)]
#[delegate(ShapeOps)]
pub enum ArrayType {
// Numbers
Int(Array<isize>),
Index(Array<usize>),
Float(Array<f64>),
Complex(Array<num::Complex<f64>>),
// Non-Numbers
Character(Array<char>),
Boolean(Array<bool>),
Fn(Array<Callable>),
}
These types you can think of as the ‘return values’ for expressions and functions. For our interpreter, I think that this will be more than sufficient. We could in fact simplify it further. This is an important step because in our language we like things well defined. Perhaps in other implementation languages than Rust, we could be using dynamic values, but this is much better, I think. We are again using the delegate
macro. This will be very useful when implementing operations such as Add
between two arrays. Notice how we have been using the Callable
type to represent functions. I will expand on this later. For now, let’s move on and look at evaluating expressions. If you are reading this and have not seen my previous post on this project, it will be necessary to understand the next section.
Interpreting
Expressions are interpreted by defining the behavior of a function which consumes the parsed expressions, and returns an “atomic” value1 . This is done by defining the Interpret
trait and implementing it for an Expr
:
type InterpretResult = Result<DataType>;
pub trait Interpret {
fn interpret(self, e: &mut DynamicEnv) -> InterpretResult;
}
This requires that we define the Env
type to track bound variables and scopes. I will spare you the details, it is a singly-linked stack2 of mappings3:
┌────────────┐ ┌────────────┐
│ Map │ │ Map │
├────────────┤ ├────────────┤
│ d: <data> │ │ a: <data> │
│ e: <data> │ ────► │ b: <data> │ ─ ─ ─►
│ f: <data> │ │ c: <data> │
│ ... │ │ ... │
└────────────┘ └────────────┘
Next, we will define the trait to evaluate expressions, which takes an Expr
and an environment, and returns a concrete data type value. I have written an explanation of expressions and parsing previously, so see that. The implementation of this trait will walk the syntax tree recursively.
pub type DynamicEnv = Env<Box<DataType>>;
type InterpretResult = Result<DataType>;
pub trait Interpret {
fn interpret(self, e: &mut DynamicEnv) -> InterpretResult;
}
impl Interpret for Expr {
fn interpret(self, e: &mut DynamicEnv) -> InterpretResult
{
match self {
Expr::Frame(node) => node.interpret(e),
Expr::AtomLiteral(node) => node.interpret(e),
Expr::Term(node) => node.interpret(e),
Expr::FnCall(node) => node.interpret(e),
Expr::FnDef(node) => node.interpret(e),
Expr::Let(node) => node.interpret(e),
Expr::ArrayLiteral(_) => todo!(),
}
}
}
Oh, and there’s also the “statement”. Since we are building a functional programming language, there is really only one kind of statement, and it’s the loyal print statement. The Unit
statement is what is returned parsed when there’s an empty line, all else is an expression that evaluates to a non-trivial value.
impl Interpret for Stmt {
fn interpret(self, e: &mut DynamicEnv) -> InterpretResult
{
match self {
Stmt::Expr(node) => node.expression.interpret(e),
Stmt::Print(node) => {
let r = node.expression.interpret(e)?;
// also print type
println!("\n {}", r);
Ok(r)
}
Stmt::Unit(_) => Ok(DataType::Unit),
}
}
}
Each type of expression has different semantics. The simplest to interpret is literal values, and identifiers.
impl Interpret for AtomLiteralNode {
fn interpret(self, _e: &mut DynamicEnv) -> InterpretResult
{
Ok(DataType::from(&self.value))
}
}
impl Interpret for TermNode {
fn interpret(self, e: &mut DynamicEnv ) -> InterpretResult
{
Ok(*e.get(&self.name)?.clone())
}
}
Recall that the DynamicEnv
stores values of type Box<DataType>
, so the stored value is not copied, just the pointer, and then the value can be un-boxed, and that’s how we are able to conform to the borrow rules in this tenuous context. To assign values in the environment, evaluate the let expression:
impl Interpret for LetNode {
fn interpret(self, e: &mut DynamicEnv) -> InterpretResult
{
let init_val = self.initializer.interpret(e)?;
e.put(self.name.clone(), Box::new(init_val));
if let Some(body) = self.region {
body.interpret(e)
} else {
Ok(DataType::Unit)
}
}
}
The conditional here is included because we want to be able to assign values in the REPL without providing the body of the code immediately, so these statements ending with a semicolon i.e. “let a = 2;” work as well as let statements with body as in “let a = 2 in a * a”. When parsing a longer program, the “in” is equivalent to the semicolon and all of the following statements encapsulated in that scope will have access to the variable. for example,
(
let a = 3 in
let b = 5 in (a+b)/2
) + 1
``
the second let evaluates to `(3+5)/2 = 4`, so the first let evaluates to 4 as well, and the entire thing evaluates to 5. Basic functional programming stuff. This is a feature in Rust that you can observe in the code above, and that's how it becomes a feature in our interpreter. Also, we have access to grouping with parentheses, because the semantic structure is captured by the parser.
The only remaining thing that we need to have a fully functional calculator program is to implement the basic primitive operations. These are implemented as special forms of the function call expression, where the `Callee` is parsed infix of the arguments.
```rust
impl Interpret for FnCallNode {
fn interpret(self, e: &mut DynamicEnv ) -> InterpretResult
{
use libludi::types::PrimitiveFuncType;
match self.callee {
Callee::Expression(expr) => {
todo!()
}
Callee::Primitive(primitive_fn) => match primitive_fn {
PrimitiveFuncType::Add => {
assert_eq!(self.args.len(), 2);
let mut args = self.args.into_iter();
let (arg1, arg2) = (args.next().unwrap(), args.next().unwrap());
Add::add(arg1.interpret(e)?, arg2.interpret(e)?)
}
_ => todo!()
}
}
}
}
The best way to these that is to use a separate set of traits for the operations. That way we can use ambassador
to Delegate
the ops element-wise over the array. Each of the binary functions looks exactly the same as Add
, and the Add trait is essentially a re-definition of rust’s std::ops::Add
for each of the atomic data types. For the comparator, it’s slightly different – check whether the arguments are ‘float’ or not, because we can’t check equality on the float type.
Now we can write things like this, as just one example:
> let a = 1+1;
> print a > 2-1
true
For other function calls, we need to use a different to call other functions. We will create the Callable
trait which really just takes the arguments and provides them to the function.
Callee::Expression(expr) => {
let callee_val = expr.interpret(e)?;
let arguments = self
.args
.into_iter()
.map(|a| a.interpret(e))
.collect::<Result<Vec<DataType>>>()?;
match callee_val {
DataType::Atomic(AtomicType::Fn(function)) => {
Callable::call(function, arguments, e)
}
_ => Err(anyhow::anyhow!("Error: expression is not a function")),
}
}
Where Callable
is just a simple trait that does what it sounds like. If we wanted to, we could add several types of objects that can be called, for instance more primitive types that can be loaded as needed by entering different contexts. Most importantly for now, we add the function object type that will store function objects defined in program code, as hinted earlier.
#[derive(PartialEq, Clone)]
pub struct FunctionData {
pub data: FnDefNode,
}
impl FunctionData {
fn params(&self) -> &[Arg] {
&self.data.signature.args
}
fn arity(&self) -> usize {
self.data.signature.args.len()
}
}
pub trait Callable {
fn call(self, arguments: Vec<DataType>, e: &mut DynamicEnv) -> Result<DataType>;
}
impl Callable for FunctionData {
fn call(self, arguments: Vec<DataType>, e: &mut DynamicEnv) -> Result<DataType> {
if arguments.len() != self.arity() {
return Err(anyhow::anyhow!(
"Error: '{}' expected {} arguments, got {}",
"fn_name",
self.arity(),
arguments.len()
));
}
e.push_with(
zip(self.params().into_iter(), arguments.into_iter()).map(|(Arg(name, ty), value)| {
(name.clone(), value.into())
}),
);
let result = self.data.body.interpret(e);
e.pop();
result
}
}
impl Callable for DataType {
fn call(self, arguments: Vec<DataType>, e: &mut DynamicEnv) -> Result<DataType> {
match self {
Self::Atomic(AtomicType::Fn(data)) => data.call(arguments, e),
Self::Array(ArrayType::Fn(_data)) => todo!(),
_ => Err(anyhow::anyhow!(
"Error: this type cannot be called as a function!"
)),
}
}
}
All we need to use these features is to construct our function objects, by wrapping the tree in a new struct. Remember how the fn foo(args) {...}
syntax is just sugar for a let statement combined with an anonymous function. In our simple language, we do not need to have any complex rules about the difference between an anonymous function and a ‘regular’ one.
impl Interpret for FnDefNode {
fn interpret(self, _e: &mut DynamicEnv) -> InterpretResult {
Ok(DataType::Atomic(AtomicType::Fn(FunctionData {
data: self.into(),
})))
}
}
Testing it out:
>> fn inc(x) {x+1}
>> print inc(100)
101
>> let also_this = |x| {x+1}
>> print also_this(101)
102
There it is, it’s just the syntax tree of the function itself getting evaluated, and we are opening up a new scope before interpreting the inner function, and closing it after. There’s an issue with this though – observe:
>> fn foo(a) {a+b}
>> fn bar(b) {foo(1)}
>> print bar(2)
3
I’m leaving it at this for now. Will come back to fix this issue soon.
We just made the AtomicType
, however, in the formal sense, an array is also an ’atom, but it is also multiple. Note to self: take care not get confused by this distinction. In fact, this seems to get at something interesting about the semantics of array languages.
This is actually somewhat tricky, because of Rust’s memory model see this guide