3 min read

You will probably not be eaten by a grue

Michael Droste tweeted

Should I use Stata, R, Matlab, Julia, etc etc for my research? What #econtwitter WON’T tell you is that all of these share a fatal flaw: you can’t play Oregon Trail on them… … At least, until now! Now you can play Oregon Trail (1978) in Stata.

That’s clearly a challenge that any sensible R programmer would ignore. However, it does fit into an idea I’ve been thinking about for a while. If you have a script in SAS or SPSS that reads in and formats a complicated data set, how can you use that in R? One answer is to translate the script to R, but that’s a lot of work – every time. A better answer would be to write either a translator or an interpreter that runs the script.

So, in the Oregon Trail case, rather than following the Python version and Stata version in reimplementing the code, I decided to write an interpreter for a small subset of BASIC and use it to run the original code.

library(OregonTrail)
play()

I have not tested all the BASIC code paths, but I think I have tested all the R code paths, so it’s possible that it works.

Interpreting BASIC

The necessary interpreter is fairly small – it’s almost a finite-state machine; you could think of it as a finite-state machine with an oracle for arithmetic expressions. It’s here. A particular advantage of this approach is that the 1970s use of GOTO in BASIC is trivial to interpret, but a bit of a pain to either translate to R code or understand and re-implement.

The main tricky part was that BASIC uses parentheses both for array access and for function calls, and uses = to test for equality. In order to use R’s eval as the oracle for arithmetic, I had to rewrite BASIC arithmetic expressions into R syntax. This is the same sort of task as rewriting R arithmetic syntax into SQL for my sqlsurvey package and RStudio’s dbplyr.

The basic evaluation loop is this function

step<-function(state){
  thisline<-basic[state$counter]
  thiscmd<-command[state$counter]
  state<-switch(thiscmd,
    GOTO=GOTO(thisline, state),
    GOSUB=GOSUB(thisline, state),
    RETURN=RETURN(thisline, state),
    READ=READ(thisline, state),
    DATA=basic_DATA(thisline, state),
    RESTORE=RESTORE(thisline, state),
    PRINT=basic_PRINT(thisline, state),
    INPUT=INPUT(thisline,state),
    LET=LET(thisline,state),
    IF =basic_IF(thisline, state),
    ON = ON(thisline,state),
    QUIT={state$finished<-TRUE; state},
    END={state$finished<-TRUE; state},
    REM = advance(state),
    DIM = basic_DIM(thisline,state),
    assignment(thisline,state)
  )
        
  state
}

where advance is the function that goes to the next line.

Now you have two problems

There are a lot of regular expressions. Here, working out the target of GOTO

GOTO<-function(thisline, state){
  tolineno<-as.numeric(gsub("^.*GOTO[[:blank:]]([0-9]+)","\\1",thisline)[1])
  codeline<-which(lineno==tolineno)
  if(length(codeline)!=1) 
      stop(paste("GOTO considered harmful:",thisline))
  state$counter<-codeline
  state
}

and much more messily, to work out the target and value of an assignment

assignment<-function(thisline, state){
  expr<-gsub("^[0-9]* ","",thisline)
        
  target<-trimws(gsub("(.+)=.+","\\1",expr))
  target<-gsub("$","_",target,fixed=TRUE)
        
  value_expr<-trimws(gsub(".+=(.+)","\\1",expr))
  value_expr<-    gsub("$","_",value_expr,fixed=TRUE)

  value_expr<-rewrite(parse(text=value_expr)[[1]])
  target_expr<-rewrite(parse(text=target)[[1]])

  state$variables<-eval(bquote(within(state$variables, 
      .(target_expr)<-.(value_expr))))
        
  if(state$debug) print(state$variables)
  advance(state)  
}

Parsing arithmetic

The only part of this that isn’t a finite-state machine is the evaluation of logical and arithmetic expressions, which needs a stack, both for the rewriting and for R’s eval to evaluate the code (as seen in the code for assignment above)

rewrite<-function(expr){
  if (length(expr)==2){
    if (expr[[1]] == as.name("TAB") ||
      expr[[1]] == as.name("RND") ||
      expr[[1]] == as.name("CLK")||
      expr[[1]] == as.name("INT") ||
      expr[[1]] == as.name("(") ||
      expr[[1]] == as.name("-"))
    {
      expr[[2]]<-rewrite(expr[[2]])
      return(expr)
    } else{
      expr[[2]]<-rewrite(expr[[2]])
      expr<-bquote(.(expr[[1]])[.(expr[[2]])])
      return(expr)
    }
  }

  if (length(expr)==1){
    if (expr==as.name("=")) expr<-as.name("==")
    return(expr)
  }
    
    ## length=3
  expr[[1]]<-rewrite(expr[[1]])
  expr[[2]]<-rewrite(expr[[2]])
  expr[[3]]<-rewrite(expr[[3]])
  return(expr)
}