Tuesday, November 19, 2019

[Linker Script Tutorial] $3 - Simple Linker Script Commands

In this section we describe the simple linker script commands.



1 Setting the Entry Point

The first instruction to execute in a program is called the entry point. You can use the ENTRY linker script command to set the entry point. The argument is a symbol name:
ENTRY(symbol)
There are several ways to set the entry point. The linker will set the entry point by trying each of the following methods in order, and stopping when one of them succeeds:

  • the ‘-e’ entry command-line option;
  • the ENTRY(symbol) command in a linker script;
  • the value of a target-specific symbol, if it is defined; For many targets this is start, but PE- and BeOS-based systems for example check a list of possible entry symbols, matching the first one found.
  • the address of the first byte of the ‘.text’ section, if present;
  • The address 0.

2 Commands Dealing with Files

Several linker script commands deal with files.
INCLUDE filename
Include the linker script filename at this point. The file will be searched for in the current directory, and in any directory specified with the -L option. You can nest calls to INCLUDE up to 10 levels deep.
You can place INCLUDE directives at the top level, in MEMORY or SECTIONS commands, or in output section descriptions.
INPUT(filefile, …)
INPUT(file file …)
The INPUT command directs the linker to include the named files in the link, as though they were named on the command line.
For example, if you always want to include subr.o any time you do a link, but you can’t be bothered to put it on every link command line, then you can put ‘INPUT (subr.o)’ in your linker script.
In fact, if you like, you can list all of your input files in the linker script, and then invoke the linker with nothing but a ‘-T’ option.
In case a sysroot prefix is configured, and the filename starts with the ‘/’ character, and the script being processed was located inside the sysroot prefix, the filename will be looked for in the sysroot prefix. Otherwise, the linker will try to open the file in the current directory. If it is not found, the linker will search through the archive library search path. The sysroot prefix can also be forced by specifying = as the first character in the filename path, or prefixing the filename path with $SYSROOT. See also the description of ‘-L’ in Command-line Options.
If you use ‘INPUT (-lfile)’, ld will transform the name to libfile.a, as with the command-line argument ‘-l’.
When you use the INPUT command in an implicit linker script, the files will be included in the link at the point at which the linker script file is included. This can affect archive searching.
GROUP(filefile, …)
GROUP(file file …)
The GROUP command is like INPUT, except that the named files should all be archives, and they are searched repeatedly until no new undefined references are created. See the description of ‘-(’ in Command-line Options.
AS_NEEDED(filefile, …)
AS_NEEDED(file file …)
This construct can appear only inside of the INPUT or GROUP commands, among other filenames. The files listed will be handled as if they appear directly in the INPUT or GROUP commands, with the exception of ELF shared libraries, that will be added only when they are actually needed. This construct essentially enables --as-needed option for all the files listed inside of it and restores previous --as-needed resp. --no-as-needed setting afterwards.
OUTPUT(filename)
The OUTPUT command names the output file. Using OUTPUT(filename) in the linker script is exactly like using ‘-o filename’ on the command line (see Command Line Options). If both are used, the command-line option takes precedence.
You can use the OUTPUT command to define a default name for the output file other than the usual default of a.out.
SEARCH_DIR(path)
The SEARCH_DIR command adds path to the list of paths where ld looks for archive libraries. Using SEARCH_DIR(path) is exactly like using ‘-L path’ on the command line (see Command-line Options). If both are used, then the linker will search both paths. Paths specified using the command-line option are searched first.
STARTUP(filename)
The STARTUP command is just like the INPUT command, except that filename will become the first input file to be linked, as though it were specified first on the command line. This may be useful when using a system in which the entry point is always the start of the first file.


3 Commands Dealing with Object File Formats

A couple of linker script commands deal with object file formats.
OUTPUT_FORMAT(bfdname)
OUTPUT_FORMAT(defaultbiglittle)
The OUTPUT_FORMAT command names the BFD format to use for the output file (see BFD). Using OUTPUT_FORMAT(bfdname) is exactly like using ‘--oformat bfdname’ on the command line (see Command-line Options). If both are used, the command line option takes precedence.
You can use OUTPUT_FORMAT with three arguments to use different formats based on the ‘-EB’ and ‘-EL’ command-line options. This permits the linker script to set the output format based on the desired endianness.
If neither ‘-EB’ nor ‘-EL’ are used, then the output format will be the first argument, default. If ‘-EB’ is used, the output format will be the second argument, big. If ‘-EL’ is used, the output format will be the third argument, little.
For example, the default linker script for the MIPS ELF target uses this command:
OUTPUT_FORMAT(elf32-bigmips, elf32-bigmips, elf32-littlemips)
This says that the default format for the output file is ‘elf32-bigmips’, but if the user uses the ‘-EL’ command-line option, the output file will be created in the ‘elf32-littlemips’ format.
TARGET(bfdname)
The TARGET command names the BFD format to use when reading input files. It affects subsequent INPUT and GROUP commands. This command is like using ‘-b bfdname’ on the command line (see Command-line Options). If the TARGET command is used but OUTPUT_FORMAT is not, then the last TARGET command is also used to set the format for the output file. See BFD.


4 Assign alias names to memory regions

Alias names can be added to existing memory regions created with the MEMORY command. Each name corresponds to at most one memory region.
REGION_ALIAS(alias, region)
The REGION_ALIAS function creates an alias name alias for the memory region region. This allows a flexible mapping of output sections to memory regions. An example follows.
Suppose we have an application for embedded systems which come with various memory storage devices. All have a general purpose, volatile memory RAM that allows code execution or data storage. Some may have a read-only, non-volatile memory ROM that allows code execution and read-only data access. The last variant is a read-only, non-volatile memory ROM2 with read-only data access and no code execution capability. We have four output sections:
  • .text program code;
  • .rodata read-only data;
  • .data read-write initialized data;
  • .bss read-write zero initialized data.
The goal is to provide a linker command file that contains a system independent part defining the output sections and a system dependent part mapping the output sections to the memory regions available on the system. Our embedded systems come with three different memory setups AB and C:
SectionVariant AVariant BVariant C
.textRAMROMROM
.rodataRAMROMROM2
.dataRAMRAM/ROMRAM/ROM2
.bssRAMRAMRAM
The notation RAM/ROM or RAM/ROM2 means that this section is loaded into region ROM or ROM2 respectively. Please note that the load address of the .data section starts in all three variants at the end of the .rodata section.
The base linker script that deals with the output sections follows. It includes the system dependent linkcmds.memory file that describes the memory layout:
INCLUDE linkcmds.memory

SECTIONS
  {
    .text :
      {
        *(.text)
      } > REGION_TEXT
    .rodata :
      {
        *(.rodata)
        rodata_end = .;
      } > REGION_RODATA
    .data : AT (rodata_end)
      {
        data_start = .;
        *(.data)
      } > REGION_DATA
    data_size = SIZEOF(.data);
    data_load_start = LOADADDR(.data);
    .bss :
      {
        *(.bss)
      } > REGION_BSS
  }
Now we need three different linkcmds.memory files to define memory regions and alias names. The content of linkcmds.memory for the three variants AB and C:
A
Here everything goes into the RAM.
MEMORY
  {
    RAM : ORIGIN = 0, LENGTH = 4M
  }

REGION_ALIAS("REGION_TEXT", RAM);
REGION_ALIAS("REGION_RODATA", RAM);
REGION_ALIAS("REGION_DATA", RAM);
REGION_ALIAS("REGION_BSS", RAM);
B
Program code and read-only data go into the ROM. Read-write data goes into the RAM. An image of the initialized data is loaded into the ROM and will be copied during system start into the RAM.
MEMORY
  {
    ROM : ORIGIN = 0, LENGTH = 3M
    RAM : ORIGIN = 0x10000000, LENGTH = 1M
  }

REGION_ALIAS("REGION_TEXT", ROM);
REGION_ALIAS("REGION_RODATA", ROM);
REGION_ALIAS("REGION_DATA", RAM);
REGION_ALIAS("REGION_BSS", RAM);
C
Program code goes into the ROM. Read-only data goes into the ROM2. Read-write data goes into the RAM. An image of the initialized data is loaded into the ROM2 and will be copied during system start into the RAM.
MEMORY
  {
    ROM : ORIGIN = 0, LENGTH = 2M
    ROM2 : ORIGIN = 0x10000000, LENGTH = 1M
    RAM : ORIGIN = 0x20000000, LENGTH = 1M
  }

REGION_ALIAS("REGION_TEXT", ROM);
REGION_ALIAS("REGION_RODATA", ROM2);
REGION_ALIAS("REGION_DATA", RAM);
REGION_ALIAS("REGION_BSS", RAM);
It is possible to write a common system initialization routine to copy the .data section from ROM or ROM2 into the RAM if necessary:
#include <string.h>

extern char data_start [];
extern char data_size [];
extern char data_load_start [];

void copy_data(void)
{
  if (data_start != data_load_start)
    {
      memcpy(data_start, data_load_start, (size_t) data_size);
    }
}


5 Other Linker Script Commands

There are a few other linker scripts commands.
ASSERT(expmessage)
Ensure that exp is non-zero. If it is zero, then exit the linker with an error code, and print message.
Note that assertions are checked before the final stages of linking take place. This means that expressions involving symbols PROVIDEd inside section definitions will fail if the user has not set values for those symbols. The only exception to this rule is PROVIDEd symbols that just reference dot. Thus an assertion like this:
  .stack :
  {
    PROVIDE (__stack = .);
    PROVIDE (__stack_size = 0x100);
    ASSERT ((__stack > (_end + __stack_size)), "Error: No room left for the stack");
  }
will fail if __stack_size is not defined elsewhere. Symbols PROVIDEd outside of section definitions are evaluated earlier, so they can be used inside ASSERTions. Thus:
  PROVIDE (__stack_size = 0x100);
  .stack :
  {
    PROVIDE (__stack = .);
    ASSERT ((__stack > (_end + __stack_size)), "Error: No room left for the stack");
  }
will work.
EXTERN(symbol symbol …)
Force symbol to be entered in the output file as an undefined symbol. Doing this may, for example, trigger linking of additional modules from standard libraries. You may list several symbols for each EXTERN, and you may use EXTERN multiple times. This command has the same effect as the ‘-u’ command-line option.
FORCE_COMMON_ALLOCATION
This command has the same effect as the ‘-d’ command-line option: to make ld assign space to common symbols even if a relocatable output file is specified (‘-r’).
INHIBIT_COMMON_ALLOCATION
This command has the same effect as the ‘--no-define-common’ command-line option: to make ld omit the assignment of addresses to common symbols even for a non-relocatable output file.
FORCE_GROUP_ALLOCATION
This command has the same effect as the ‘--force-group-allocation’ command-line option: to make ld place section group members like normal input sections, and to delete the section groups even if a relocatable output file is specified (‘-r’).
INSERT [ AFTER | BEFORE ] output_section
This command is typically used in a script specified by ‘-T’ to augment the default SECTIONS with, for example, overlays. It inserts all prior linker script statements after (or before) output_section, and also causes ‘-T’ to not override the default linker script. The exact insertion point is as for orphan sections. See Location Counter. The insertion happens after the linker has mapped input sections to output sections. Prior to the insertion, since ‘-T’ scripts are parsed before the default linker script, statements in the ‘-T’ script occur before the default linker script statements in the internal linker representation of the script. In particular, input section assignments will be made to ‘-T’ output sections before those in the default script. Here is an example of how a ‘-T’ script using INSERT might look:
SECTIONS
{
  OVERLAY :
  {
    .ov1 { ov1*(.text) }
    .ov2 { ov2*(.text) }
  }
}
INSERT AFTER .text;
NOCROSSREFS(section section …)
This command may be used to tell ld to issue an error about any references among certain output sections.
In certain types of programs, particularly on embedded systems when using overlays, when one section is loaded into memory, another section will not be. Any direct references between the two sections would be errors. For example, it would be an error if code in one section called a function defined in the other section.
The NOCROSSREFS command takes a list of output section names. If ld detects any cross references between the sections, it reports an error and returns a non-zero exit status. Note that the NOCROSSREFS command uses output section names, not input section names.
NOCROSSREFS_TO(tosection fromsection …)
This command may be used to tell ld to issue an error about any references to one section from a list of other sections.
The NOCROSSREFS command is useful when ensuring that two or more output sections are entirely independent but there are situations where a one-way dependency is needed. For example, in a multi-core application there may be shared code that can be called from each core but for safety must never call back.
The NOCROSSREFS_TO command takes a list of output section names. The first section can not be referenced from any of the other sections. If ld detects any references to the first section from any of the other sections, it reports an error and returns a non-zero exit status. Note that the NOCROSSREFS_TO command uses output section names, not input section names.
OUTPUT_ARCH(bfdarch)
Specify a particular output machine architecture. The argument is one of the names used by the BFD library (see BFD). You can see the architecture of an object file by using the objdump program with the ‘-f’ option.
LD_FEATURE(string)
This command may be used to modify ld behavior. If string is "SANE_EXPR" then absolute symbols and numbers in a script are simply treated as numbers everywhere. See Expression Section.



No comments:

Post a Comment

Back to Top