
Build Your Own Programming Language
The need for different types of computer languages is growing rapidly and developers prefer creating domain-specific languages for solving specific application domain problems. Building your own programming language has its advantages. It can be your antidote to the ever-increasing size and complexity of software.
In this book, you'll start with implementing the frontend of a compiler for your language, including a lexical analyzer and parser. The book covers a series of traversals of syntax trees, culminating with code generation for a bytecode virtual machine. Moving ahead, you'll learn how domain-specific language features are often best represented by operators and functions that are built into the language, rather than library functions. We'll conclude with how to implement garbage collection, including reference counting and mark-and-sweep garbage collection. Throughout the book, Dr. Jeffery weaves in his experience of building the Unicon programming language to give better context to the concepts where relevant examples are provided in both Unicon and Java so that you can follow the code of your choice of either a very high-level language with advanced features, or a mainstream language.
By the end of this book, you'll be able to build and deploy your own domain-specific languages, capable of compiling and running programs.
- Build Your Own Programming Language
- Contributors
- About the author
- About the reviewers
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Code in Action
- Download the color images
- Conventions used
- Get in touch
- Share Your Thoughts
- Section 1: Programming Language Frontends
- Chapter 1: Why Build Another Programming Language?
- So, you want to write your own programming language
- Types of programming language implementations
- Organizing a bytecode language implementation
- Languages used in the examples
- Language versus library whats the difference?
- Applicability to other software engineering tasks
- Establishing the requirements for your language
- Case study requirements that inspired the Unicon language
- Unicon requirement #1 preserve what people love about Icon
- Unicon requirement #2 support large-scale programs working on big data
- Unicon requirement #3 high-level input/output for modern applications
- Unicon requirement #4 provide universally implementable system interfaces
- Summary
- Questions
- So, you want to write your own programming language
- Chapter 2: Programming Language Design
- Determining the kinds of words and punctuation to provide in your language
- Specifying the control flow
- Deciding on what kinds of data to support
- Atomic types
- Composite types
- Domain-specific types
- Overall program structure
- Completing the Jzero language definition
- Case study designing graphics facilities in Unicon
- Language support for 2D graphics
- Adding support for 3D graphics
- Summary
- Questions
- Chapter 3: Scanning Source Code
- Technical requirements
- Lexemes, lexical categories, and tokens
- Regular expressions
- Regular expression rules
- Regular expression examples
- Using UFlex and JFlex
- Header section
- Regular expressions section
- Writing a simple source code scanner
- Running your scanner
- Tokens and lexical attributes
- Expanding our example to construct tokens
- Writing a scanner for Jzero
- The Jzero flex specification
- Unicon Jzero code
- Java Jzero code
- Running the Jzero scanner
- Regular expressions are not always enough
- Summary
- Questions
- Chapter 4: Parsing
- Technical requirements
- Analyzing syntax
- Understanding context-free grammars
- Writing context-free grammar rules
- Writing rules for programming constructs
- Using iyacc and BYACC/J
- Declaring symbols in the header section
- Putting together the yacc context-free grammar section
- Understanding yacc parsers
- Fixing conflicts in yacc parsers
- Syntax error recovery
- Putting together a toy example
- Writing a parser for Jzero
- The Jzero lex specification
- The Jzero yacc specification
- Unicon Jzero code
- Java Jzero parser code
- Running the Jzero parser
- Improving syntax error messages
- Adding detail to Unicon syntax error messages
- Adding detail to Java syntax error messages
- Using Merr to generate better syntax error messages
- Summary
- Questions
- Chapter 5: Syntax Trees
- Technical requirements
- Using GNU make
- Learning about trees
- Defining a syntax tree type
- Parse trees versus syntax trees
- Creating leaves from terminal symbols
- Wrapping tokens in leaves
- Working with YACC's value stack
- Wrapping leaves for the parser's value stack
- Determining which leaves you need
- Building internal nodes from production rules
- Accessing tree nodes on the value stack
- Using the tree node factory method
- Forming syntax trees for the Jzero language
- Debugging and testing your syntax tree
- Avoiding common syntax tree bugs
- Printing your tree in a text format
- Printing your tree using dot
- Summary
- Questions
- Section 2: Syntax Tree Traversals
- Chapter 6: Symbol Tables
- Technical requirements
- Establishing the groundwork for symbol tables
- Declarations and scopes
- Assigning and dereferencing variables
- Choosing the right tree traversal for the job
- Creating and populating symbol tables for each scope
- Adding semantic attributes to syntax trees
- Defining classes for symbol tables and symbol table entries
- Creating symbol tables
- Populating symbol tables
- Synthesizing the isConst attribute
- Checking for undeclared variables
- Identifying the bodies of methods
- Spotting uses of variables within method bodies
- Finding redeclared variables
- Inserting symbols into the symbol table
- Reporting semantic errors
- Handling package and class scopes in Unicon
- Mangling names
- Inserting self for member variable references
- Inserting self as the first parameter in method calls
- Testing and debugging symbol tables
- Summary
- Questions
- Chapter 7: Checking Base Types
- Technical requirements
- Type representation in the compiler
- Defining a base class for representing types
- Subclassing the base class for complex types
- Assigning type information to declared variables
- Synthesizing types from reserved words
- Inheriting types into a list of variables
- Determining the type at each syntax tree node
- Determining the type at the leaves
- Calculating and checking the types at internal nodes
- Runtime type checks and type inference in Unicon
- Summary
- Questions
- Chapter 8: Checking Types on Arrays, Method Calls, and Structure Accesses
- Technical requirements
- Checking operations on array types
- Handling array variable declarations
- Checking types during array creation
- Checking types during array accesses
- Checking method calls
- Calculating the parameters and return type information
- Checking the types at each method call site
- Checking the type at return statements
- Checking structured type accesses
- Handling instance variable declarations
- Checking types at instance creation
- Checking types at instance accesses
- Summary
- Questions
- Chapter 9: Intermediate Code Generation
- Technical requirements
- Preparing to generate code
- Why generate intermediate code?
- Learning about the memory regions in the generated program
- Introducing data types for intermediate code
- Adding the intermediate code attributes to the tree
- Generating labels and temporary variables
- An intermediate code instruction set
- Instructions
- Declarations
- Annotating syntax trees with labels for control flow
- Generating code for expressions
- Generating code for control flow
- Generating label targets for condition expressions
- Generating code for loops
- Generating intermediate code for method calls
- Reviewing the generated intermediate code
- Summary
- Chapter 10: Syntax Coloring in an IDE
- Downloading the example IDEs used in this chapter
- Integrating a compiler into a programmer's editor
- Analyzing source code from within the IDE
- Sending compiler output to the IDE
- Avoiding reparsing the entire file on every change
- Using lexical information to colorize tokens
- Extending the EditableTextList component to support color
- Coloring individual tokens as they are drawn
- Highlighting errors using parse results
- Adding Java support
- Summary
- Section 3: Code Generation and Runtime Systems
- Chapter 11: Bytecode Interpreters
- Technical requirements
- Understanding what bytecode is
- Comparing bytecode with intermediate code
- Building a bytecode instruction set for Jzero
- Defining the Jzero bytecode file format
- Understanding the basics of stack machine operation
- Implementing a bytecode interpreter
- Loading bytecode into memory
- Initializing the interpreter state
- Fetching instructions and advancing the instruction pointer
- Instruction decoding
- Executing instructions
- Starting up the Jzero interpreter
- Writing a runtime system for Jzero
- Running a Jzero program
- Examining iconx, the Unicon bytecode interpreter
- Understanding goal-directed bytecode
- Leaving type information in at runtime
- Fetching, decoding, and executing instructions
- Crafting the rest of the runtime system
- Summary
- Questions
- Chapter 12: Generating Bytecode
- Technical requirements
- Converting intermediate code to Jzero bytecode
- Adding a class for bytecode instructions
- Mapping intermediate code addresses to bytecode addresses
- Implementing the bytecode generator method
- Generating bytecode for simple expressions
- Generating code for pointer manipulation
- Generating bytecode for branches and conditional branches
- Generating code for method calls and returns
- Handling labels and other pseudo-instructions in intermediate code
- Comparing bytecode assembler with binary formats
- Printing bytecode in assembler format
- Printing bytecode in binary format
- Linking, loading, and including the runtime system
- Unicon example bytecode generation in icont
- Summary
- Questions
- Chapter 13: Native Code Generation
- Technical requirements
- Deciding whether to generate native code
- Introducing the x64 instruction set
- Adding a class for x64 instructions
- Mapping memory regions to x64 register-based address modes
- Using registers
- Starting from a null strategy
- Assigning registers to speed up the local region
- Converting intermediate code to x64 code
- Mapping intermediate code addresses to x64 locations
- Implementing the x64 code generator method
- Generating x64 code for simple expressions
- Generating code for pointer manipulation
- Generating native code for branches and conditional branches
- Generating code for method calls and returns
- Handling labels and pseudo-instructions
- Generating x64 output
- Writing the x64 code in assembly language format
- Going from native assembler to an object file
- Linking, loading, and including the runtime system
- Summary
- Questions
- Chapter 14: Implementing Operators and Built-In Functions
- Implementing operators
- Asking whether operators imply hardware support and vice versa
- Adding String concatenation to intermediate code generation
- Adding String concatenation to the bytecode interpreter
- Adding String concatenation to the native runtime system
- Writing built-in functions
- Adding built-in functions to the bytecode interpreter
- Writing built-in functions for use with the native code implementation
- Integrating built-ins with control structures
- Developing operators and functions for Unicon
- Writing operators in Unicon
- Developing Unicon's built-in functions
- Summary
- Questions
- Implementing operators
- Chapter 15: Domain Control Structures
- Knowing when you need a new control structure
- Defining what a control structure is
- Reducing excessive redundant parameters
- Scanning strings in Icon and Unicon
- Scanning environments and their primitive operations
- Eliminating excessive parameters via a control structure
- Rendering regions in Unicon
- Rendering 3D graphics from a display list
- Specifying rendering regions using built-in functions
- Varying graphical levels of detail using nested rendering regions
- Creating a rendering region control structure
- Summary
- Questions
- Knowing when you need a new control structure
- Chapter 16: Garbage Collection
- Appreciating the importance of garbage collection
- Counting references to objects
- Adding reference counting to Jzero
- Generating code for heap allocation
- Modifying the generated code for the assignment operator
- Considering the drawbacks and limitations of reference counting
- Marking live data and sweeping the rest
- Organizing heap memory regions
- Traversing the basis to mark live data
- Reclaiming live memory and placing it into contiguous chunks
- Summary
- Questions
- Chapter 17: Final Thoughts
- Reflecting on what was learned from writing this book
- Deciding where to go from here
- Studying programming language design
- Learning about implementing interpreters and bytecode machines
- Acquiring expertise in code optimization
- Monitoring and debugging program executions
- Designing and implementing IDEs and GUI builders
- Exploring references for further reading
- Studying programming language design
- Learning about implementing interpreters and bytecode machines
- Acquiring expertise in native code and code optimization
- Monitoring and debugging program executions
- Designing and implementing IDEs and GUI builders
- Summary
- Section 4: Appendix
- Appendix: Unicon Essentials
- Running Unicon
- Using Unicon's declarations and data types
- Declaring different kinds of program components
- Using atomic data types
- Organizing multiple values using structure types
- Evaluating expressions
- Forming basic expressions using operators
- Invoking procedures, functions, and methods
- Iterating and selecting what and how to execute
- Generators
- Debugging and environmental issues
- Learning the basics of the UDB debugger
- Environment variables
- Preprocessor
- Function mini-reference
- Selected keywords
- Assessments
- Chapter 1
- Chapter 2
- Chapter 3
- Chapter 4
- Chapter 5
- Chapter 6
- Chapter 7
- Chapter 8
- Chapter 11
- Chapter 12
- Chapter 13
- Chapter 14
- Chapter 15
- Chapter 16
- Why subscribe?
- Other Books You May Enjoy
- Packt is searching for authors like you
- Share Your Thoughts
- Tytuły: Build Your Own Programming Language
- Autor: Clinton L. Jeffery
- Tytuł oryginału: Build Your Own Programming Language
- ISBN Ebooka: 9781800200333, 9781800200333
- Data wydania: 2021-12-31
- Identyfikator pozycji: e_2t54
- Kategorie:
- Wydawca: Packt Publishing