How I made my package 67x faster (and more functional at the same time)

Author: Mattia Micheletta Merlin; Date: 25/12/2025

This summer I created a Julia package that solves indefinite integrals (more on that in my other blog post) that was great but really slow. This blog post is a (quite technical but understandable) description of how I improved the "time to the first integral" from ~14 minutes to ~12 seconds, while making it capable of solving more integrals.

First, you must know that this system works using a rule-based approach. This means checking the input expression to integrate against a long list of integration rules (>3400). It was all sunshine and roses except that the rule mechanism for matching the input expression against the rules, that was implemented in SymbolicUtils.jl, was never designed for handling so many rules, and so the performance was not great. First of all, loading the package took ages (~8 min on a laptop) because this rule mechanism created many functions for each rule (more on that later). Secondly, after loading the package the time to execute the first integral was again ages (~6 min) because the Julia compiler needed to do stuff (more on that later). After that first integral then, everything was fast, but the initial wait made the package basically unusable. I thought about improving it during the summer, but I was busy actually building the package and didn't have time to worry about the infrastructure it was based on. But then in October I had an idea...

How the previous rule matching system worked (in a nutshell)

You can imagine a symbolic expression like a tree. For example the expression \(y*exp(y^2-1)+1\) can be represented as:

julia> tree_print(y*exp(y^2-1)+1)
└─+ (2 arguments)
  ├─1
  └─* (2 arguments)
    ├─exp (1 arguments)
    │ └─+ (2 arguments)
    │   ├─-1
    │   └─^ (2 arguments)
    │     ├─y
    │     └─2
    └─y

The previous system to recognize if an input expression matched a rule, worked by creating a function for the top node of the tree of the symbolic expression of the rule, that called other functions created for the children nodes, let them be both end leaves (symbols, numbers) or other operations. Those functions then called other functions created for their children nodes and so on. So just creating a rule was a computationally expensive task involving many nested functions.

On top of this, Julia being a JIT (just-in-time) compiled language, when you call a function for the first time, the compiler needs to check what type of data it returns, so it can optimize future calls to that function. This type inference process can be really long if the function is complex, as was in my case because of all the nested calls to other functions created for the children nodes of the tree. This needed to be done for each rule.

The improvements

My idea was instead to represent the rules just as Julia Exprs, which is a Julia data type used to represent Julia code itself, basically a parsed string of code, that is also represented as a tree.

julia> ans = Meta.parse("1+(1+y)^2")
:(1 + (1 + y) ^ 2)

julia> dump(ans)
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol +
    2: Int64 1
    3: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol ^
        2: Expr
          head: Symbol call
          args: Array{Any}((3,))
            1: Symbol +
            2: Int64 1
            3: Symbol y
        3: Int64 2

So at initialization no function needs to be created. Then I use a recursive function (only one for each rule) that traverses both trees at the same time (rule and input expression), until a difference is found or the whole tree is traversed. This approach is much simpler and surprisingly much faster! This effectively solved the type inference time as well, because there is now only one function to be compiled. Here you can see all the details of the implementation.

On top of that, I improved the recursive traversal of the tree to be able to match expressions more thoroughly. I won't go into too much detail here, but for example if you have a rule like \((a+bx)^n*(c+dx)^m\), you want it to match also expressions like \((1+2x)^2/(2+3x)^4\) with \(m=-4\), but as you can imagine their trees are completely different! The simple fact that equivalent mathematical expressions have completely different tree representations caused me so many hassles...

Results

I timed a simple test script:

using Symbolics, SymbolicIntegration
@variables x
result = integrate(asinh(x), RuleBasedMethod(verbose=true))
println(result)

With the old system I had:

 julia --project=. prova.jl  850.31s user 72.69s system 111% cpu 13:50.71 total

With the new system I have:

 julia --project=. prova.jl  11.95s user 0.49s system 101% cpu 12.244 total

That is a 67x speedup.