<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="https://solson.me/">
  <id>https://solson.me/</id>
  <title>Scott Olson's Blog</title>
  <updated>2021-06-02T00:00:00Z</updated>
  <link rel="alternate" href="https://solson.me/" type="text/html"/>
  <link rel="self" href="https://solson.me/feed.xml" type="application/atom+xml"/>
  <author>
    <name>Scott Olson</name>
    <uri>https://solson.me</uri>
  </author>
  <icon>https://www.gravatar.com/avatar/ce3e1d408e61ccb7c977364bd15092aa?size=32</icon>
  <entry>
    <id>tag:solson.me,2021-06-02:/2021/06/02/c-arrays-are-not-pointers.html</id>
    <title type="html">C Arrays Are Not Pointers: An ELF's Perspective</title>
    <published>2021-06-02T00:00:00Z</published>
    <updated>2021-06-02T00:00:00Z</updated>
    <link rel="alternate" href="https://solson.me/2021/06/02/c-arrays-are-not-pointers.html" type="text/html"/>
    <content type="html">&lt;p&gt;This is a short response to &lt;a href="https://lovesegfault.com/posts/c-arrays-are-not-pointers/"&gt;lovesegfault’s post of the same
name&lt;/a&gt;, which I
recommend as the clearest explainer I’ve seen for this subtle detail of C.&lt;/p&gt;

&lt;p&gt;I wanted to explain my own favourite distinction between pointers and arrays.
For expediency, I will assume general knowledge of C and
linkers/loaders/&lt;a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format"&gt;ELF&lt;/a&gt;.
Let’s start with a naive approach to defining a global string constant:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-c"&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;kMyStringMut&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If we run &lt;code&gt;clang -c foo.c -o foo.o &amp;amp;&amp;amp; objdump -x foo.o&lt;/code&gt; on this file, we can
observe two problems:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SYMBOL TABLE:
0000000000000000 l    d  .rodata.str1.1 0000000000000000 .rodata.str1.1
0000000000000000 g     O .data  0000000000000008 kMyStringMut

RELOCATION RECORDS FOR [.data]:
OFFSET           TYPE              VALUE
0000000000000000 R_X86_64_64       .rodata.str1.1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A quick explanation:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
&lt;code&gt;.rodata.str1.1&lt;/code&gt; contains the string’s actual bytes.&lt;/li&gt;
  &lt;li&gt;
&lt;code&gt;kMyStringMut&lt;/code&gt; is an 8-byte slot for a 64-bit address.&lt;/li&gt;
  &lt;li&gt;The relocation records ensure &lt;code&gt;kMyStringMut&lt;/code&gt; will contain the address of
&lt;code&gt;.rodata.str1.1&lt;/code&gt; at runtime.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But what is wrong with this picture? First of all, &lt;code&gt;kMyStringMut&lt;/code&gt; has been
placed in &lt;code&gt;.data&lt;/code&gt;. That isn’t read-only! We forgot to include the top-level
&lt;code&gt;const&lt;/code&gt; for the global variable itself. It’s not a constant at all, and any code
in the program could reassign it with &lt;code&gt;kMyStringMut = myNewString&lt;/code&gt;. Let’s fix
that:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-c"&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;kMyStringIndirect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;SYMBOL TABLE:
0000000000000000 l    d  .rodata.str1.1 0000000000000000 .rodata.str1.1
0000000000000000 g     O .data.rel.ro   0000000000000008 kMyStringIndirect

RELOCATION RECORDS FOR [.data.rel.ro]:
OFFSET           TYPE              VALUE
0000000000000000 R_X86_64_64       .rodata.str1.1
&lt;/code&gt;&lt;/pre&gt;

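&lt;p&gt;There we go! It’s in &lt;code&gt;.data.rel.ro&lt;/code&gt;, which &lt;em&gt;is&lt;/em&gt; read-only as intended (more
precisely, the loader write-protects it after applying relocations, when the binary is
linked with RELRO).&lt;/p&gt;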
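&lt;p&gt;To make the difference concrete, here is a minimal sketch of the reassignment the
first declaration permits. &lt;code&gt;clobber_string&lt;/code&gt; is a hypothetical function made up for
illustration. The commented-out line performs the same assignment against
&lt;code&gt;kMyStringIndirect&lt;/code&gt;, which a conforming C compiler must diagnose because that
variable itself is const-qualified.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-c"&gt;const char* kMyStringMut = "hello world";
const char* const kMyStringIndirect = "hello world";

/* Hypothetical function: compiles cleanly, because only the pointed-to
   bytes are const, not the pointer variable kMyStringMut itself. */
void clobber_string(void) {
    kMyStringMut = "goodbye world";
}

/* The same assignment through the top-level const is a constraint
   violation, so it stays commented out:
   void clobber_indirect(void) { kMyStringIndirect = "goodbye world"; } */&lt;/code&gt;&lt;/pre&gt;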

&lt;p&gt;But, wait a minute, why does the ELF object file even &lt;em&gt;store&lt;/em&gt; a pointer to the
string’s bytes, and not just the bytes themselves? Because &lt;em&gt;C arrays are not
pointers&lt;/em&gt;. Let’s switch to an array:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-c"&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;kMyString&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code&gt;SYMBOL TABLE:
0000000000000000 g     O .rodata        000000000000000c kMyString
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Finally! It’s in &lt;code&gt;.rodata&lt;/code&gt;, which is read-only, and the symbol directly labels
the string bytes, without any indirection through ELF relocations.&lt;/p&gt;
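&lt;p&gt;The difference is also visible from C itself. Applied to the array, &lt;code&gt;sizeof&lt;/code&gt;
yields 12: eleven characters plus the terminating NUL, matching the
&lt;code&gt;000000000000000c&lt;/code&gt; size in the symbol table above. Applied to the pointer, it yields
the size of an address, which is 8 on the x86-64 target shown here. This is a minimal sketch.
&lt;code&gt;array_size&lt;/code&gt; and &lt;code&gt;pointer_size&lt;/code&gt; are hypothetical helper names chosen for
illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-c"&gt;const char kMyString[] = "hello world";
const char* const kMyStringIndirect = "hello world";

/* sizeof the array: every byte of the string, including the NUL. */
int array_size(void) {
    return (int)sizeof kMyString;
}

/* sizeof the pointer: just the address slot (8 bytes on x86-64). */
int pointer_size(void) {
    return (int)sizeof kMyStringIndirect;
}&lt;/code&gt;&lt;/pre&gt;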

&lt;p&gt;Of course, when you use &lt;code&gt;kMyString&lt;/code&gt; in the program, the compiler will emit references
to this ELF symbol, and the linker/loader system will ensure they resolve to the eventual
runtime address of &lt;code&gt;kMyString&lt;/code&gt;. But this is true of &lt;em&gt;any&lt;/em&gt; global variable, and
is not unique to arrays at all. What the array saves is the extra pointer object and its
relocation. Arrays provide the most direct, least foot-gun-prone global string constants.&lt;/p&gt;
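&lt;p&gt;Code that expects a pointer still accepts the array. In most expressions (&lt;code&gt;sizeof&lt;/code&gt;
being a notable exception), an array converts to a pointer to its first element at the
point of use. This is a minimal sketch. &lt;code&gt;count_os&lt;/code&gt; and &lt;code&gt;count_os_in_constant&lt;/code&gt;
are hypothetical helpers invented for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-c"&gt;const char kMyString[] = "hello world";

/* Hypothetical consumer that only knows about pointers:
   counts the 'o' characters in a NUL-terminated string. */
static int count_os(const char* s) {
    int n = 0;
    for (; *s != '\0'; s++) {
        if (*s == 'o') {
            n++;
        }
    }
    return n;
}

int count_os_in_constant(void) {
    /* kMyString converts to a pointer to kMyString[0] here. Its address
       comes straight from the kMyString symbol; there is no separate
       pointer object to read it from. */
    return count_os(kMyString);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Existing APIs that take &lt;code&gt;const char*&lt;/code&gt; keep working unchanged when a pointer constant is
switched to an array.&lt;/p&gt;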

&lt;p&gt;In summary:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-c"&gt;&lt;span class="c1"&gt;// Mutable pointer to immutable bytes.&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;kMyStringMut&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Immutable pointer to immutable bytes.&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;kMyStringIndirect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Immutable bytes.&lt;/span&gt;
&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;kMyString&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hello world"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
</content>
  </entry>
</feed>

