My Week in LLVM - Clone a Function
For my current research project, I’m using LLVM to harden C programs. One aspect of the project is to partition the program data into two disjoint sets. To enforce isolation between the disjoint sets, I add some instrumentation to every function. The instrumentation is different depending on which set the function uses, but there are some cases where I want two versions of a given function (one for each of the two sets). This leads to my problem of the week: how to clone a function in LLVM.
In this post, I’m not just going to jump to the answer. Instead, I’ll walk through the steps I followed, as an example that might help people who are newer to LLVM.
# Step 1: Google It
Whenever I face a problem with LLVM, the first thing I do is Google it. Most of the time, there isn’t a StackOverflow question or a blog post that perfectly fits my problem, but sometimes there is. Always try Google. You might get lucky!
Here is what I get when I Google “llvm clone function”:
In the first result, I found a function called CloneFunction. It looks pretty promising:
There are a couple of issues, though:
- The comment says “… without embedding the function into another module,” but I definitely want my function to be embedded in this module.
- What the heck is a ValueToValueMapTy?
- What the heck is a ClonedCodeInfo?
Let’s put the last two questions aside for now. How do I get my function cloned into this module? I Googled and grepped the LLVM source, but couldn’t find any function that cloned a function and took a Module reference as a parameter. Then I found CloneFunctionInto, which takes an already-created function (NewFunc) and copies the original function’s code into it.
Using this, I can clone one function into another, so I just need to make sure NewFunc is a function in the current module.
# Step 2: Grep for It
At this point I’m pretty sure I want to use CloneFunctionInto, but I don’t know what a ValueToValueMapTy is. The ClonedCodeInfo parameter defaults to nullptr, so I’m just going with that. To figure out how to use any LLVM type, my go-to tactic is to find somewhere in the LLVM source code that uses the type and adapt it for my purposes.
Unfortunately for me, ValueToValueMapTy is used in a lot of places, so I grepped around for a long time. Eventually, I had the bright idea that cloning a function is kind of like inlining a function into an empty function, so I looked in InlineFunction.cpp and I found this snippet:
This is what I’m looking for, but instead of mapping the callee’s formal parameters to the actual arguments passed at the call site, I want to map my original function’s formal parameters to my new function’s formal parameters.
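A toy model helps show what the map is for. The C++ below is deliberately not LLVM: `Value`, `ValueMap`, and `copyBody` are hypothetical names, and `std::map` stands in for LLVM's own `ValueMap`-based `ValueToValueMapTy`. The keys are values from the original body, and the entries are what should replace them in the copy. The same copy routine serves both inlining and cloning; only the seeding differs. Inlining seeds each formal parameter with the actual argument from the call site, while cloning seeds each old formal parameter with the new function's matching parameter. Every instruction copied afterwards is also added to the map, so later uses of it are redirected to the copy.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy IR (NOT LLVM): a Value is an argument, an instruction, or any other
// value; instructions list the values they use as operands.
struct Value {
  std::string name;
  std::vector<Value*> operands;  // empty for arguments
};

// Stand-in for llvm::ValueToValueMapTy:
// key = value in the ORIGINAL body, mapped = value to use in the copy.
using ValueMap = std::map<const Value*, Value*>;

// Seeding differs between the two uses of the same copy routine:
//   inlining: vmap[formal]    = actualArgAtCallSite;
//   cloning:  vmap[oldFormal] = newFormal;
//
// Copy `body` into `out`, rewriting every operand through `vmap`. Operands
// with no entry are left untouched. Each copied instruction is recorded in
// `vmap`, so later instructions that use it point at the copy instead.
void copyBody(const std::vector<std::unique_ptr<Value>>& body,
              std::vector<std::unique_ptr<Value>>& out, ValueMap& vmap) {
  for (const auto& inst : body) {
    auto copy = std::make_unique<Value>(Value{inst->name, {}});
    for (Value* op : inst->operands) {
      auto it = vmap.find(op);
      copy->operands.push_back(it != vmap.end() ? it->second : op);
    }
    vmap[inst.get()] = copy.get();
    out.push_back(std::move(copy));
  }
}
```

Note the direction: the original function's values are the keys. As far as I can tell, `CloneFunctionInto` asserts (in debug builds) that every argument of the old function already has an entry in the map, so the argument-mapping loop has to run before the call.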
# Step 3: Write My Solution
Now I’m ready to write my solution.
I ensured that my new function was in this module by creating it with Module::getOrInsertFunction. I gave the new function the same type as the original, and the same name with “_cloned” appended.
Then a for-loop walks over the old function’s arguments and the new function’s arguments in lockstep, mapping each old argument to the corresponding new one in a ValueToValueMapTy instance.
I set ModuleLevelChanges to true, even though I’m not 100% sure what that does.
For the Returns parameter, I made a new vector of return instructions using:
Currently, I do not store this vector anywhere, but I should. I’ll need to find all the return instructions later in my pass, but that’s a job for another day.
For the NameSuffix parameter, I passed “_cloned” just so I can always tell if the values are from a cloned function or not. In the future I should use a smarter suffix.
For the other parameters, I just left the default value.
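Putting the steps together, here is a toy C++ sketch of the same recipe. It models LLVM with a tiny hypothetical IR (`Module`, `Function`, `Value`, `FunctionType`), so it compiles without LLVM. `Module::getOrInsertFunction` here is only an analogue of the real method, and `cloneFunction` is a hypothetical helper name. The steps mirror the ones above: create an empty `<name>_cloned` function of the same type in the same module, map each old argument to the new one, copy the body while remapping operands and appending the suffix to names, and collect the cloned return instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy IR (NOT LLVM): just enough structure to mirror the cloning recipe.
struct Value {
  std::string name;
  std::string opcode;            // "arg", "add", "ret", ...
  std::vector<Value*> operands;  // empty for arguments
};

struct FunctionType {            // stand-in for llvm::FunctionType
  std::string ret;
  std::vector<std::string> params;
};

struct Function {
  std::string name;
  FunctionType type;
  std::vector<std::unique_ptr<Value>> args;
  std::vector<std::unique_ptr<Value>> insts;
};

struct Module {
  std::map<std::string, std::unique_ptr<Function>> functions;

  // Hypothetical analogue of Module::getOrInsertFunction: return the function
  // with this name, or create an empty one of the given type if absent.
  Function* getOrInsertFunction(const std::string& name, const FunctionType& type) {
    std::unique_ptr<Function>& slot = functions[name];
    if (!slot) {
      slot = std::make_unique<Function>();
      slot->name = name;
      slot->type = type;
      for (std::size_t i = 0; i < type.params.size(); ++i)
        slot->args.push_back(std::make_unique<Value>(Value{"", "arg", {}}));
    }
    return slot.get();
  }
};

// Hypothetical helper: clone `old` into "<name>_cloned" in module `m`.
// Cloned "ret" instructions are appended to `returns` (cf. the Returns vector).
Function* cloneFunction(Module& m, const Function& old, std::vector<Value*>& returns) {
  const std::string suffix = "_cloned";
  Function* nf = m.getOrInsertFunction(old.name + suffix, old.type);
  // Assumes "<name>_cloned" is not already defined with a body or other type.
  assert(nf->insts.empty() && nf->args.size() == old.args.size());

  // Stand-in for ValueToValueMapTy: old value -> new value.
  std::map<const Value*, Value*> vmap;
  for (std::size_t i = 0; i < old.args.size(); ++i) {
    nf->args[i]->name = old.args[i]->name;
    vmap[old.args[i].get()] = nf->args[i].get();
  }

  // Copy the body, remapping operands and suffixing non-empty names.
  for (const auto& inst : old.insts) {
    std::string newName = inst->name.empty() ? std::string() : inst->name + suffix;
    auto copy = std::make_unique<Value>(Value{newName, inst->opcode, {}});
    for (Value* op : inst->operands) {
      auto it = vmap.find(op);
      copy->operands.push_back(it != vmap.end() ? it->second : op);
    }
    vmap[inst.get()] = copy.get();
    if (copy->opcode == "ret") returns.push_back(copy.get());
    nf->insts.push_back(std::move(copy));
  }
  return nf;
}
```

In the LLVM API of early 2016, the corresponding real call is `CloneFunctionInto(NewF, F, VMap, /*ModuleLevelChanges=*/true, Returns, "_cloned")`, with `Returns` being something like a `SmallVector<ReturnInst*, 8>`. If I'm reading the `Cloning.h` comment on `CloneFunction` from that era right, setting `ModuleLevelChanges` to false promises that the map holds no non-identity `GlobalValue` mappings, and debug info metadata is then not cloned. Newer releases replace the `bool` with a `CloneFunctionChangeType` value, so check the header for your LLVM version.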
# Conclusion
That’s how I clone functions in LLVM. The full code will be open sourced as soon as my paper is accepted :).
As always, tips and corrections are welcome. You can contact me on Twitter.
My new blog post on how I solved a problem in LLVM this week: https://t.co/nt1a3T4DYQ
— Scott Carr (@ScottCarr) January 24, 2016