unModified()

Break stuff. Now.

Finally Used Generics

And why it took a while for me to know how to use them

July 18, 2020

I write code in JavaScript, PHP, and several other languages. But most of them are "scripting languages", which are usually dynamically typed. That means no static types, and definitely no generics. So when someone says generics are "for code reuse and reduction", it's hard to visualize how exactly generics achieve that.

Until yesterday.

I was working on some text-parsing code. It works by progressively passing the string through a set of parsers, recording the structures it finds along the way. In my current implementation, I have two overarching functions: parseBlocks, which parses for "block" structures, and parseInlines, which parses for "inline" structures.
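To make "a set of parsers" concrete, here is a minimal sketch of what individual parsers might look like. Everything in it is hypothetical and invented for illustration (the `Inline` shape with its `kind`, `content` and `length` fields, and the `parseEmphasis` and `parseText` functions): each parser inspects the start of the remaining text and either returns a structure, including how many characters it consumed, or `null` when it doesn't match.

```typescript
// Hypothetical Inline structure: what was found and how many characters it covers.
interface Inline {
  kind: "emphasis" | "text"
  content: string
  length: number // characters consumed from the input
}

// Hypothetical parser: matches *emphasis* at the very start of the input.
function parseEmphasis(input: string): Inline | null {
  const match = /^\*([^*]+)\*/.exec(input)
  if (!match) return null
  return { kind: "emphasis", content: match[1], length: match[0].length }
}

// Hypothetical fallback parser: consumes plain text up to the next "*".
function parseText(input: string): Inline | null {
  const match = /^[^*]+/.exec(input)
  if (!match) return null
  return { kind: "text", content: match[0], length: match[0].length }
}

// Try each parser in order; the first non-null result wins.
const inlineParsers: Array<(d: string) => Inline | null> = [parseEmphasis, parseText]
const firstMatch = inlineParsers.reduce<Inline | null>((v, parse) => v || parse("*hi* there"), null)
// firstMatch is { kind: "emphasis", content: "hi", length: 4 }
```

Ordering matters in this sketch: the more specific parser goes first, and the catch-all text parser last.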

The following TypeScript-like pseudocode illustrates what both functions look like:

function parseInlines(text: string, parsers: Array<(d: string) => Inline | null>): Array<Inline> {
  let components: Array<Inline> = []
  let index: number = 0

  while (index < text.length) {
    const current: string = text.substr(index, 80)
    // Try each parser on the next (up to) 80 characters; the first non-null result wins
    const chunk: Inline | null = parsers
      .reduce<Inline | null>((v, parse) => v || parse(current), null)

    if (chunk) {
      components.push(chunk)
      index += chunk.length
    } else {
      throw new Error(`Unknown chunk found at ${current}`)
    }
  }

  return components
}

function parseBlocks(text: string, parsers: Array<(d: string) => Block | null>): Array<Block> {
  let components: Array<Block> = []
  let index: number = 0

  while (index < text.length) {
    const current: string = text.substr(index, 80)
    // Try each parser on the next (up to) 80 characters; the first non-null result wins
    const chunk: Block | null = parsers
      .reduce<Block | null>((v, parse) => v || parse(current), null)

    if (chunk) {
      components.push(chunk)
      index += chunk.length
    } else {
      throw new Error(`Unknown chunk found at ${current}`)
    }
  }

  return components
}

Now you might think this is redundant code. You're right, it is very redundant. But like I said, coming from dynamic languages, it's hard to imagine how to refactor two functions that deal with two different types. If only this were JavaScript, I could simply drop the types and condense them into one function:

function parse(text, parsers) {
  let components = []
  let index = 0

  while (index < text.length) {
    const current = text.substr(index, 80)
    // Whichever parser recognizes the text first provides the chunk
    const chunk = parsers.reduce((v, parse) => v || parse(current), null)

    if (chunk) {
      components.push(chunk)
      index += chunk.length
    } else {
      throw new Error(`Unknown chunk found at ${current}`)
    }
  }

  return components
}

This was when I got my aha moment.

The first code example cared too much about whether it was parsing a Block or an Inline. In the code above, dropping the explicit types made both functions identical, so a single function could handle both cases. How can we care about types but, at the same time, not care about types?

That's exactly what generics do. Generics are placeholder types that let us avoid hardcoding concrete types while still providing the same guarantees as strict typing. That is, a value of generic type T will always be a T, whatever concrete type T ends up being.
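A tiny illustration of that guarantee, separate from the parser: the hypothetical `firstOrDefault` below never looks inside its items, so it can be written once for any `T`, yet the compiler still knows that strings in means a string out and numbers in means a number out. This is only a sketch of the idea, not anything from the parsing code.

```typescript
// T is a placeholder: the function never inspects the items, it only hands one back.
function firstOrDefault<T>(items: Array<T>, fallback: T): T {
  return items.length > 0 ? items[0] : fallback
}

// The compiler infers T from the arguments and carries it through to the result.
const word: string = firstOrDefault(["block", "inline"], "none")
const count: number = firstOrDefault([3, 5, 8], 0)

// Mixing types is rejected at compile time, e.g.:
// firstOrDefault(["block"], 42) // error: "block" and 42 can't both be the same T
```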

Refactoring the JavaScript version back into strictly typed form, this time using generics, gives us the following function (or something like it):

function parse<T extends { length: number }>(text: string, parsers: Array<(d: string) => T | null>): Array<T> {
  let components: Array<T> = []
  let index: number = 0

  while (index < text.length) {
    const current: string = text.substr(index, 80)
    // T is constrained to have a length, so we know how far each chunk advances the index
    const chunk: T | null = parsers.reduce<T | null>((v, parse) => v || parse(current), null)

    if (chunk) {
      components.push(chunk)
      index += chunk.length
    } else {
      throw new Error(`Unknown chunk found at ${current}`)
    }
  }

  return components
}

In the example above, T takes the place of both Block and Inline. The code no longer cares about either concrete type. But the compiler still guarantees that a T will always be a T: each of the parsers provided will return a T, and the items in the array returned by parse will also be T, regardless of what concrete type T ends up being in actual use.
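To see the compiler doing that work, here is a self-contained sketch that calls one generic `parse` with two different element types. It restates `parse` with two adjustments a real TypeScript compiler needs (parsers return `null` when they don't match, and `T` is constrained to have a `length` so the loop can advance). The `Block` and `Inline` shapes and the four toy parsers are hypothetical, made up for this example.

```typescript
// Generic parse: works for any T that reports how many characters it consumed.
function parse<T extends { length: number }>(
  text: string,
  parsers: Array<(d: string) => T | null>
): Array<T> {
  const components: Array<T> = []
  let index = 0

  while (index < text.length) {
    const current = text.slice(index, index + 80)
    // The first parser that recognizes the start of `current` wins.
    const chunk = parsers.reduce<T | null>((v, p) => v || p(current), null)

    if (chunk) {
      components.push(chunk)
      index += chunk.length
    } else {
      throw new Error(`Unknown chunk found at ${current}`)
    }
  }

  return components
}

// Hypothetical Block and Inline shapes, for illustration only.
interface Block { kind: "line" | "blank"; text: string; length: number }
interface Inline { kind: "word" | "space"; text: string; length: number }

// Hypothetical block parsers: a non-empty line, or a lone newline.
const parseLine = (d: string): Block | null => {
  const m = /^([^\n]+)\n?/.exec(d)
  return m ? { kind: "line", text: m[1], length: m[0].length } : null
}
const parseBlankLine = (d: string): Block | null =>
  d.charAt(0) === "\n" ? { kind: "blank", text: "", length: 1 } : null

// Hypothetical inline parsers: a run of non-whitespace, or a run of whitespace.
const parseWord = (d: string): Inline | null => {
  const m = /^\S+/.exec(d)
  return m ? { kind: "word", text: m[0], length: m[0].length } : null
}
const parseSpace = (d: string): Inline | null => {
  const m = /^\s+/.exec(d)
  return m ? { kind: "space", text: m[0], length: m[0].length } : null
}

// Same function, two concrete types: T is inferred as Block, then as Inline.
const blocks = parse("Hello world\n\nBye", [parseLine, parseBlankLine])
const inlines = parse(blocks[0].text, [parseWord, parseSpace])
// blocks: line "Hello world", blank, line "Bye"
// inlines: word "Hello", space " ", word "world"
```

Nothing in the call sites mentions `Block` or `Inline` explicitly; the types flow in from the parser lists, which is exactly the "care about types without hardcoding them" trade the post describes.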

Conclusion

In summary:

  • Extracting logic to common functions is logic abstraction.
  • Using generics in place of concrete types is type abstraction.

So there's that, my first adventure into generics. It's kind of cool, but very, very situational. To genericize a function, you need to know that it behaves identically regardless of the concrete type. Get this wrong, or force generics onto everything, and code can become unreadable and unmaintainable fast.
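One way to check whether a function really behaves identically regardless of type is to ask what it needs from its input. The hypothetical `longest` below only ever reads `.length`, so it can be generic over anything that has one; TypeScript expresses that requirement with an `extends` constraint, and the result keeps its concrete type. A sketch, not a rule:

```typescript
// longest only ever reads .length, so it behaves identically for any T that has one.
function longest<T extends { length: number }>(a: T, b: T): T {
  return a.length >= b.length ? a : b
}

const label: string = longest("block", "inline") // strings in, string out
const numbers: number[] = longest([1, 2], [3]) // arrays in, array out

// A type without .length is rejected at compile time:
// longest(10, 20) // error: number has no 'length' property
```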


PS: IDE tooltips often just print what's written in doc blocks. This is fine when you're dealing with concrete types, but it's very annoying with genericized functions, especially ones declared on interfaces. Not only do you not see the actual implementation, you're also thrown an alphabet soup of generic types that almost always make no sense.

Hopefully IDEs will become smart enough to statically analyze the code and replace generic types in tooltips with concrete types on the fly. That would be a big help, especially to those who rely heavily on tooltips and autocompletion.