IDA processor module

Writing a disassembler is a tedious task. You have to decode the opcode, interpret the meaning of the operands and, finally, print the instruction correctly. Fortunately, you can count on IDA to provide the rest: modules that map the executable, a colorful GUI, control flow graphs and so on. In this article, I'll share my feedback on developing an IDP module for IDA.

Introduction

Even though IDA supports plenty of processors, you can sometimes stumble upon an unsupported architecture and have to do the dirty job yourself. The aim of this article is to give an overview of the development of a processor module for IDA (IDP). However, I'm not an expert and there's no documentation at all (only samples in the SDK). So, if you have specific questions, feel free to contact the IDA support: Igor and Ilfak answer really quickly (more quickly than my own mum). And if you find errata, feel free to leave a comment. :)

Disassembly process

IDA uses 3 steps to disassemble an instruction:

Analyze (ana)

In this step, the IDP module has to fill a global structure of type insn_t named cmd. Basically, this structure contains the instruction id, the size of the instruction, and information about its operands. Since IDA can only remember the type of 2 operands per instruction, if your target processor handles more operands than that, you have to use them wisely.
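
To make the analyze step more concrete, here is a minimal sketch for a made-up 2-byte encoding. ToyInsn, ToyOperand and toy_ana are hypothetical stand-ins written for this article, not SDK types: a real module fills cmd and its operands instead. The only thing borrowed from the real callback is the convention of returning the instruction size, or 0 when the bytes can't be decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the SDK structures; all names are illustrative only.
enum ToyOpType : uint8_t  { TOY_VOID, TOY_REG, TOY_IMM };
enum ToyItype  : uint16_t { TOY_NULL, TOY_MOV, TOY_ADD };

struct ToyOperand {
  ToyOpType type  = TOY_VOID;
  uint32_t  value = 0;          // register number or immediate
};

struct ToyInsn {
  ToyItype   itype = TOY_NULL;  // instruction id
  uint16_t   size  = 0;         // length in bytes
  ToyOperand ops[2];
};

// Made-up encoding: byte 0 is the opcode (1: mov, 2: add),
// byte 1 holds the destination register (high nibble) and a 4-bit immediate (low nibble).
// Returns the instruction size, or 0 if the bytes cannot be decoded.
inline int toy_ana(const uint8_t* bytes, size_t len, ToyInsn& out) {
  out = ToyInsn{};
  if (len < 2) return 0;
  switch (bytes[0]) {
    case 1:  out.itype = TOY_MOV; break;
    case 2:  out.itype = TOY_ADD; break;
    default: return 0;          // unknown opcode
  }
  out.ops[0].type = TOY_REG; out.ops[0].value = bytes[1] >> 4;
  out.ops[1].type = TOY_IMM; out.ops[1].value = bytes[1] & 0x0F;
  out.size = 2;
  return out.size;
}
```

Decoding the bytes 02 3A with this sketch yields an add with register 3 as first operand and the immediate 10 as second operand.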

Emulation (emu)

The term emulation doesn't mean you must define the whole semantics of every instruction. But you have to provide the instruction behavior expected by IDA:

  • code cross-references with ua_add_cref: fl_F (flow), fl_JN/fl_JF (jump near or far) and/or fl_CN/fl_CF (call near or far),
  • data cross-references with ua_add_dref: dr_O (offset), dr_R (read), dr_W (write), and so on,
  • segment registers, see [2],
  • stack analysis, see [3].

This information is required to explore as much of the executable as possible (the bluer the navigation bar, the better).
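
The code cross-reference part can be pictured with the following sketch. It is only a model: ToyXref, ToyDecoded, the FEAT_* bits and toy_emu are names I made up for the illustration, and a real emu callback would call ua_add_cref instead of filling a vector. The idea is that every instruction reports where execution may go next.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of what emu reports; none of these names come from the SDK.
enum ToyXrefKind { XREF_FLOW, XREF_JUMP, XREF_CALL };
enum ToyFeature : unsigned { FEAT_STOP = 1, FEAT_JUMP = 2, FEAT_CALL = 4 };

struct ToyXref { uint32_t from, to; ToyXrefKind kind; };

struct ToyDecoded {
  uint32_t ea;       // address of the instruction
  uint32_t size;     // length in bytes
  unsigned feature;  // combination of ToyFeature bits
  uint32_t target;   // branch target, meaningful for jumps and calls only
};

// Collect the code cross-references created by one instruction.
inline std::vector<ToyXref> toy_emu(const ToyDecoded& insn) {
  std::vector<ToyXref> xrefs;
  if (insn.feature & FEAT_JUMP) xrefs.push_back({insn.ea, insn.target, XREF_JUMP});
  if (insn.feature & FEAT_CALL) xrefs.push_back({insn.ea, insn.target, XREF_CALL});
  // Execution falls through to the next instruction unless this one
  // stops the flow (unconditional jump, return, ...).
  if (!(insn.feature & FEAT_STOP))
    xrefs.push_back({insn.ea, insn.ea + insn.size, XREF_FLOW});
  return xrefs;
}
```

With this model a conditional jump produces two references (the target and the fall-through), while a return produces none.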

Output (out)

The output step makes your IDP fill a string that contains color tags (color_t), which enable IDA to display a colorful disassembly. Usually, you start by reserving a local buffer and registering it with init_output_buffer; then you can use out_tagon and out_tagoff to specify colors, or helper functions like out_keyword, out_symbol... Once the buffer is filled, you tell IDA it's done by calling term_output_buffer, setting gl_comm = 1 and calling MakeLine. To tell the truth, I had to check the samples for this one...
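
If the idea of a tagged string is unclear, the sketch below shows the principle with a toy line builder. The tag bytes, the color codes and the ToyLine class are assumptions made for this example and are not the SDK's definitions; as far as I understand, the real helpers work along the same lines by wrapping each piece of text between an "on" tag and an "off" tag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical tag bytes and color codes, chosen for this sketch only.
const char TAG_ON  = '\x01';
const char TAG_OFF = '\x02';
enum ToyColor : char { COL_INSN = 'i', COL_REG = 'r', COL_NUM = 'n' };

class ToyLine {
public:
  void tag_on(ToyColor c)  { buf_ += TAG_ON;  buf_ += static_cast<char>(c); }
  void tag_off(ToyColor c) { buf_ += TAG_OFF; buf_ += static_cast<char>(c); }
  // Emit text wrapped between an "on" tag and the matching "off" tag.
  void colored(ToyColor c, const std::string& text) {
    tag_on(c); buf_ += text; tag_off(c);
  }
  void plain(const std::string& text) { buf_ += text; }
  const std::string& str() const { return buf_; }
private:
  std::string buf_;
};

// Remove the tags again to get the text as the user would read it.
inline std::string strip_tags(const std::string& s) {
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == TAG_ON || s[i] == TAG_OFF) { ++i; continue; }  // skip tag and color byte
    out += s[i];
  }
  return out;
}
```

Building "add r3, 10" with three colored pieces gives a buffer that is 12 bytes longer than the visible text, since each piece costs two 2-byte tags.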

Notify

This step is not required at all, however it can improve the disassembly.

  • loader_elf_machine avoids the infamous Undefined or unknown machine type warning message for ELF executables.
  • is_basic_block_end is useful to handle delay-slot or VLIW.
  • set_compiler defines compiler information and allows IDA to set a function prototype.

You can find all the notifications in idp.hpp; search for idp_notify.

Processor definition

The processor_t structure is very important for an IDP module: it exposes all the implemented features, so it must be filled carefully.

Architecture definitions

  • id is a unique identifier for your IDP. It MUST be >= 0x8000 and, since this field is signed, don't use a negative value (like I did...) or you'll get an obscure error message from IDA.
  • flag describes the specific features implemented in your IDP. For instance, if you plan to handle branch delay slots, you must set PR_DELAYED; if you don't, IDA won't ask your IDP to handle them.
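
Since the error message for a bad id is obscure, one option is to catch the mistake at compile time. This is just a sketch: the constant names and the id value are hypothetical, and the 0x8000 limit is the one stated in this article.

```cpp
#include <cassert>
#include <cstdint>

// Lower bound for custom processor ids, as stated in the text above.
const int32_t TOY_FIRST_CUSTOM_ID = 0x8000;

// Hypothetical id for the module being written.
const int32_t TOY_PROCESSOR_ID = 0x8000 + 0x123;

constexpr bool toy_is_valid_custom_id(int32_t id) {
  return id >= TOY_FIRST_CUSTOM_ID;  // also rejects negative values, the field is signed
}

// Fails at build time instead of failing when IDA loads the module.
static_assert(toy_is_valid_custom_id(TOY_PROCESSOR_ID),
              "processor id must be >= 0x8000");
```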

Callbacks

  • notify, u_ana, u_emu, u_out are the callbacks described previously; they must be set in this structure.
  • is_switch is the callback in which you fill a switch_info_ex_t structure. See [1].
    • startea the beginning of the switch (usually jump <register>)
    • ncases the number of cases
    • jumps the jump table, more precisely an array of labels
    • defjump the default case
  • regsNum and regNames expose the number and names of registers. In some architectures, registers can be grouped (e.g. ax with ah and al). In this case, one is tempted to use a bitfield and ignore regNames (after all, you control the output process). This approach prevents IDA from recognizing register operands, so it will be unable to show you the list of registers.
  • regFirstSreg, regLastSreg, segreg_size, regCodeSreg, regDataSreg describe the segment registers, including the code segment and data segment ones. At first glance, it seems these fields are only useful for the ds and cs registers, the x86 way. Not so: for instance, the ARM IDP uses this feature to let the user define the T bit (Thumb mode). This feature is also very useful to implement a gp register (Global Pointer), and I think it could be used to implement MBC (Memory Bank Control), present in most old video game platforms. See the Global Pointer register section [2] for further information.
  • is_sp_based and create_func_frame are useful if you plan to implement stack analysis. The first callback tells IDA whether the stack offset is relative (sp) or absolute (fp). The second one lets the IDP call add_frame to specify the size of the stack frame. See the Stack analysis section [3] for further information.
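
For grouped registers, an alternative to the bitfield trick is to give every name its own index and keep the grouping in a side table. The sketch below shows the idea with x86-like names; the enum, both tables and toy_find_reg are hypothetical and only illustrate how a flat name table can stay complete.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical register file: every addressable name gets its own index,
// sub-registers included, so one flat name table describes all of them.
enum ToyReg { R_AX, R_AL, R_AH, R_BX, R_BL, R_BH, R_COUNT };

static const char* const kToyRegNames[R_COUNT] = {
  "ax", "al", "ah", "bx", "bl", "bh"
};

struct ToyRegLayout { ToyReg parent; unsigned bit_offset; unsigned bit_width; };

// The grouping lives in a side table instead of being packed into the operand.
static const ToyRegLayout kToyRegLayout[R_COUNT] = {
  { R_AX, 0, 16 }, { R_AX, 0, 8 }, { R_AX, 8, 8 },
  { R_BX, 0, 16 }, { R_BX, 0, 8 }, { R_BX, 8, 8 },
};

// Name to index lookup, returns -1 if the name is unknown.
inline int toy_find_reg(const char* name) {
  for (int i = 0; i < R_COUNT; ++i)
    if (std::strcmp(kToyRegNames[i], name) == 0) return i;
  return -1;
}
```

The operand then stores a plain index such as R_AH, and the output code can still print the right name without any special case.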

Special features

[1]

Switch detection

If the compiler is able to generate optimized switches (jump tables), you should definitely implement this feature. To do so, you need to recognize: the switch start (e.g. jump <reg0>), the jump table (e.g. lea <reg0>, [<jump_table>+<reg1>*4]), the number of cases (e.g. cmp <reg1>, <ncases>) and the default case (e.g. jae <default_case>).

ea_t StartEa = cmd.ea;
ea_t JmpTbl = ...;
ushort CasesNo = ...;
ea_t DefaultCase = ...;

// code your algorithm here

pSwitchInfoEx->startea = StartEa;
pSwitchInfoEx->ncases  = CasesNo;
pSwitchInfoEx->jumps   = JmpTbl;
pSwitchInfoEx->defjump = DefaultCase;
pSwitchInfoEx->set_jtable_element_size(4); // usually it's sizeof(void*) in the targeted processor

// this part is more or less ripped from arc IDP
setFlbits(cmd.ea, FF_JUMP);
set_switch_info_ex(StartEa,  pSwitchInfoEx);
create_switch_table(StartEa, pSwitchInfoEx);
create_switch_xrefs(StartEa, pSwitchInfoEx);
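
The recognition part itself depends heavily on the target, but its shape can be sketched on a toy model. Everything below is hypothetical (SwInsn, ToySwitch, toy_is_switch and the four-instruction pattern); a real is_switch callback would decode the previous instructions from cmd.ea and would have to cope with reordered or interleaved instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, already decoded instructions of a made-up ISA.
enum SwOp { OP_CMP_IMM, OP_JAE, OP_LEA_TABLE, OP_JMP_REG, OP_OTHER };

struct SwInsn {
  uint32_t ea;
  SwOp     op;
  int      reg;    // register operand (destination for lea), -1 if unused
  uint32_t value;  // immediate, branch target or table address
};

struct ToySwitch {
  uint32_t startea = 0, jumps = 0, defjump = 0;
  uint16_t ncases = 0;
};

// Expected shape:
//   cmp  rI, ncases
//   jae  default
//   lea  rT, [table + rI*4]
//   jmp  rT                  <- idx points here
// Returns true and fills sw only if every piece is found.
inline bool toy_is_switch(const std::vector<SwInsn>& code, size_t idx, ToySwitch& sw) {
  if (idx < 3 || idx >= code.size()) return false;
  const SwInsn& jmp = code[idx];
  const SwInsn& lea = code[idx - 1];
  const SwInsn& jae = code[idx - 2];
  const SwInsn& cmp = code[idx - 3];
  if (jmp.op != OP_JMP_REG || lea.op != OP_LEA_TABLE) return false;
  if (lea.reg != jmp.reg) return false;                    // the table entry must feed the jump
  if (jae.op != OP_JAE || cmp.op != OP_CMP_IMM) return false;
  if (cmp.value == 0 || cmp.value > 0xFFFF) return false;  // ncases must fit in 16 bits
  sw.startea = jmp.ea;
  sw.jumps   = lea.value;
  sw.defjump = jae.value;
  sw.ncases  = static_cast<uint16_t>(cmp.value);
  return true;
}
```

Returning false as soon as one piece is missing keeps the detection conservative, which is better than creating a bogus jump table.
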
[2]

Global Pointer register

The following piece of code is required to expose the gp register to the user, who can then use ALT+G to define a custom value for a specific instruction or SHIFT+F8 for a whole segment.

// register values
enum XXX_Registers
{
// ...
XXX_rVcs, XXX_GlobalPointer, XXX_RegisterNo,
XXX_Reg_Invalid
};

// register names
static char const* s_Registers[] =
{
// ...
"", "gp",
};

extern "C" EXPORT processor_t LPH = {
// ...
XXX_rVcs, XXX_GlobalPointer, 4, // regFirstSreg, regLastSreg, segreg_size
XXX_rVcs, XXX_GlobalPointer,   // regCodeSreg, regDataSreg
// ...
};

To retrieve the defined value, one can use the function getSR, which returns a sel_t (uint32).

uval_t Dst = rOprd.addr;
if (rOprd.specflag1 & XXX_OPRD_GP_REL)
{
  sel_t Sel = getSR(cmd.ea, XXX_GlobalPointer);
  if (Sel != BADSEL) // getSR returns a selector, so compare against BADSEL
    Dst += Sel;
}
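
To picture what happens behind this lookup, here is a toy model of a register whose value is defined at some addresses and stays in effect until the next definition. That behavior, along with ToySregMap, TOY_BADSEL and toy_resolve, is an assumption made for the sketch; the real bookkeeping is done by IDA.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical "no value defined" marker for this sketch.
const uint32_t TOY_BADSEL = 0xFFFFFFFFu;

class ToySregMap {
public:
  // Define the register value starting at address ea.
  void set(uint32_t ea, uint32_t value) { changes_[ea] = value; }
  // Value in effect at ea: the last definition at or before ea.
  uint32_t get(uint32_t ea) const {
    auto it = changes_.upper_bound(ea);
    if (it == changes_.begin()) return TOY_BADSEL;
    return std::prev(it)->second;
  }
private:
  std::map<uint32_t, uint32_t> changes_;
};

// Turn a gp-relative offset into an address, leaving it alone if gp is unknown.
inline uint32_t toy_resolve(const ToySregMap& gp, uint32_t ea,
                            uint32_t offset, bool gp_relative) {
  if (!gp_relative) return offset;
  uint32_t base = gp.get(ea);
  return base == TOY_BADSEL ? offset : base + offset;
}
```
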
[3]

Stack analysis

Stack analysis can be really hard to implement since it requires knowing the semantics of instructions (most of the time add/sub/lea/mov). The aim is to track the increments and decrements of the sp register (Stack Pointer) using add_auto_stkpnt2, and to flag the right immediate values as stack offsets with ua_stkvar2 and op_stkvar. Don't forget to ask IDA nicely whether you can manipulate these data by calling may_trace_sp and may_create_stkvars.
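
The bookkeeping behind sp tracking can be sketched without the SDK. In the toy model below, StkInsn, the 4-byte word size and both functions are hypothetical; in a real module each non-zero delta would be reported to IDA from the emu callback rather than accumulated in a vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical instruction classes that touch the stack pointer.
enum StkOp { STK_SUB_SP, STK_ADD_SP, STK_PUSH, STK_POP, STK_NONE };

struct StkInsn { StkOp op; int32_t imm; };

const int32_t TOY_WORD = 4;  // assumed size of a pushed value

// Change of sp caused by one instruction (negative: the stack grows).
inline int32_t toy_sp_delta(const StkInsn& insn) {
  switch (insn.op) {
    case STK_SUB_SP: return -insn.imm;
    case STK_ADD_SP: return  insn.imm;
    case STK_PUSH:   return -TOY_WORD;
    case STK_POP:    return  TOY_WORD;
    default:         return 0;
  }
}

// sp value, relative to the function entry, after each instruction.
inline std::vector<int32_t> toy_trace_sp(const std::vector<StkInsn>& body) {
  std::vector<int32_t> sp;
  int32_t cur = 0;
  for (const StkInsn& insn : body) {
    cur += toy_sp_delta(insn);
    sp.push_back(cur);
  }
  return sp;
}
```

A well-formed function ends with a relative sp of 0, which makes a handy sanity check while you are debugging the deltas.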

Compilation

Here you can use the easy way (use the Makefile in the sample folder) or the hard way. I picked the very hard way by trying to write a CMake script. CMake is able to generate build systems for multiple environments. I find it more convenient to write code using the Visual Studio IDE and to be able to debug the code by attaching to the idaq.exe process. I also wanted to port my IDP to Linux, but since I was not able to fully support this platform, I won't talk about it (maybe I'll update it later...).

# Only designed for Windows, Linux / 32-bit / processor

if (NOT DEFINED IDA_SDK_DIR)
  message(FATAL_ERROR "You must define IDA_SDK_DIR")
endif()

set(IDA_DEFINITIONS -D__NT__ -D__IDP__)
set(IDA_INCLUDE_DIRS ${IDA_SDK_DIR}/include)
set(IDA_LIB_TYPE x86_win_vc_32)
find_library(IDA_LIBRARY
  NAMES "ida.lib"
  HINTS "${IDA_SDK_DIR}/lib/${IDA_LIB_TYPE}"
  NO_DEFAULT_PATH
  )
set(IDA_LIBRARIES ${IDA_LIBRARY})

set(IDA_EXTENSION ".w32")

function(make_idp TARGET DESCRIPTION)
  add_custom_command(
    TARGET ${TARGET}
    POST_BUILD
    COMMAND ${IDA_SDK_DIR}/bin/mkidp.exe ARGS $<TARGET_FILE:${TARGET}> "${DESCRIPTION}"
    VERBATIM
    )
endfunction(make_idp)

To use it, copy this script (FindIDP.cmake) into a folder <project_root>/cmake and write this code:

cmake_minimum_required(VERSION 2.8)
project(XXX)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CMAKE_SOURCE_DIR}/cmake)
find_package(IDP)

# add your source code here

add_definitions(${IDA_DEFINITIONS})
include_directories(${IDA_INCLUDE_DIRS})

add_library(xxx_idp SHARED ...)
target_link_libraries(xxx_idp ${IDA_LIBRARIES})

set_target_properties(xxx_idp
  PROPERTIES
  PREFIX ""
  SUFFIX ${IDA_EXTENSION}
  OUTPUT_NAME xxx
)

make_idp(xxx_idp "XXX: xxx")

The last command, make_idp, is very important. If the compiled DLL doesn't embed the IDA meta-information, it won't be loaded at all. Also make sure the description includes the ':' character.

Acknowledgements

  • Igor Skochinsky / Ilfak Guilfanov
  • ch0k0bn / kamino / noutoff
