A technical tale of NodeSecure - Chapter 2

Thomas.G - Jun 6 '22 - - Dev Community

Hello ๐Ÿ‘‹,

I'm back at writing for a new technical article on NodeSecure. This time I want to focus on the SAST JS-X-Ray ๐Ÿ”ฌ.

I realized very recently that the project on Github was already more than two years old. It's amazing how time flies ๐Ÿ˜ต.

It's been a long time since I wanted to share my experience and feelings about AST analysis. So let's jump in ๐Ÿ˜‰

๐Ÿ’ƒ How it started

When I started the NodeSecure project I had almost no experience ๐Ÿค with AST (Abstract Syntax Tree). My first time was on the SlimIO project to generate codes dynamically with the astring package (and I had also looked at the ESTree specification).

One of my first goals for my tool was to be able to retrieve the dependencies in each JavaScript file contained within an NPM tarball (By this I mean able to retrieve any dependencies imported in CJS or ESM).

I started the subject a bit naively ๐Ÿ˜ and very quickly I set myself a challenge to achieve with my AST analyser:

function unhex(r) {
   return Buffer.from(r, "hex").toString();
}

const g = Function("return this")();
const p = g["pro" + "cess"];

const evil = p["mainMod" + "ule"][unhex("72657175697265")];
evil(unhex("68747470")).request
Enter fullscreen mode Exit fullscreen mode

The goal is to be able to output accurate information for the above code. At the time I didn't really know what I was getting into ๐Ÿ˜‚ (But I was passionate about it and I remain excited about it today).

I thank Targos who at the time submitted a lot of code and ideas.

To date the SAST is able to follow this kind of code without any difficulties ๐Ÿ˜Ž... But it wasn't always that simple.

๐Ÿค Baby steps

One of the first things I learned was to browse the tree. Even for me today this seems rather obvious, but it wasn't necessarily so at the time ๐Ÿ˜….

I discovered the package estree-walker from Rich Harris which was compatible with the EStree spec. Combined with the meriyah package this allows me to convert a JavaScript source into an ESTree compliant AST.

import { readFile } from "node:fs/promises";

import { walk } from "estree-walker";
import * as meriyah from "meriyah";

export async function scanFile(location: string) {
  const strToAnalyze = await readFile(location, "utf-8");

  const { body } = meriyah.parseScript(strToAnalyze, {
    next: true, loc: true, raw: true, module: true
  });

  walk(body, {
    enter(node) {
      // Skip the root of the AST.
      if (Array.isArray(node)) {
        return;
      }

      // DO THE WORK HERE
    }
  });
}
Enter fullscreen mode Exit fullscreen mode

I also quickly became familiar with the tool ASTExplorer which allows you to analyze the tree and properties for a specific code.

nodesecure

As a beginner, you can be quickly scared by the size and complexity of an AST. This tool is super important to better cut out and focus on what is important.

I also had fun re-implementing the ESTree Specification in TypeScript. It helped me a lot to be more confident and comfortable with different concepts that were unknown to me until then.

At the beginning of 2021 I also had the opportunity to do a talk for the French JS community (it's one more opportunity to study).

๐Ÿ˜ซ MemberExpression

JavaScript member expression can be quite complicated to deal with at first. You must be comfortable with recursion and be ready to face a lot of possibilities.

Here is an example of possible code:

const myVar = "test";
foo.bar["hel" + "lo"].test[myVar]();
Enter fullscreen mode Exit fullscreen mode

nodesecure

Computed property, Binary expression, Call expression etc. The order in which the tree is built seemed unintuitive to me at first (and I had a hard time figuring out how to use the object and property properties).

Since i created my own set of AST utilities including getMemberExpressionIdentifier.

๐Ÿš€ A new package (with its own API)

When NodeSecure was a single project the AST analysis was at most a few hundred lines in two or three JavaScript files. All the logic was coded with if and else conditions directly in the walker ๐Ÿ™ˆ.

nodesecure

To evolve and maintain the project, it became necessary to separate the code and make it a standalone package with its own API ๐Ÿ‘€.

I wrote an article at the time that I invite you to read. It contains some nice little explanations:

The thing to remember here is that you probably shouldn't be afraid to start small and grow into something bigger later. Stay pragmatic.

Easy to write, hard to scale ๐Ÿ˜ญ

It's easy to write a little prototype, but it's really hard to make it scale when you have to handle dozens or hundreds of possibilities. It requires a mastery and understanding of the language that is just crazy ๐Ÿ˜ต. This is really what makes creating a SAST a complicated task.

For example, do you know how many possibilities there are to require on Node.js? In CJS alone:

  • require
  • process.mainModule.require
  • require.main.require

I probably forget some ๐Ÿ˜ˆ (as a precaution I also trace methods like require.resolve).

But as far as I'm concerned, it's really what I find exciting ๐Ÿ˜. I've learned so much in three years. All this also allowed me to approach the language from an angle that I had never experienced or seen ๐Ÿ‘€.

Probes

On JS-X-Ray I brought the notion of "probe" into the code which will collect information on one or more specific node. The goal is to separate the AST analysis into lots of smaller pieces that are easier to understand, document and test.

Very far from perfection ๐Ÿ˜ž. However, it is much better than before and the team is now helping me to improve all this (by adding documentation and tests).

It was for JS-X-Ray 3.0.0 and at the time i have written the following article (which includes many more details if you are interested).

VariableTracer

This is one of the new killer feature coming to JS-X-Ray soon. A code able to follow the declarations, assignment, destructuration, importating of any identifiers or member expression.

In my experience being able to keep track of assignments has been one of the most complex tasks (and I've struggled with it).

This new implementation/API will offer a new spectrum of tools to develop really cool new features.

const tracer = new VariableTracer().trace("crypto.createHash", {
  followConsecutiveAssignment: true
});

// Use this in the tree walker
tracer.walk(node);
Enter fullscreen mode Exit fullscreen mode

This simple code will allow us, for example, to know each time the method createHash is used. We can use this for information purposes, for example to warn on the usage of a deprecated hash algorithm like md5.

Here an example:

const myModule = require("crypto");

const myMethodName = "createHash";
const callMe = myModule[myMethodName];
callMe("md5");
Enter fullscreen mode Exit fullscreen mode

The goal is not necessarily to track or read malicious code. The idea is to handle enough cases because developers use JavaScript in many ways.

We can imagine and implement a lot of new scenarios without worries ๐Ÿ˜.

By default we are tracing:

  • eval and Function
  • require, require.resolve, require.main, require.mainModule.require
  • Global variables (global, globalThis, root, GLOBAL, window).

โœจ Conclusion

Unfortunately, I could not cover everything as the subject is so vast. One piece of advice I would give to anyone starting out on a similar topic would be to be much more rigorous about documentation and testing. It can be very easy to get lost and not know why we made a choice X or Y.

Thanks for reading this new technical article. See you soon for a new article (something tells me that it will arrive soon ๐Ÿ˜).

. . . . . . . . . . . . . . . . . . . . .
Terabox Video Player