CITS3007 lab 3 (week 4) – Permissions and setuid programs – solutions
It’s recommended you complete this lab in pairs, if possible, and discuss your results with your partner.
Programs and commands in this lab are targeted at a Linux
environment. Ideally, this should be the standard CITS3007 development
environment, but they should work in any environment, based on a recent
Linux distribution, in which you can obtain root
privileges.
setuid and Unix permissions
This section and the following ones introduce the Unix access control system, including setuid permissions.
(You may want to refer back to this lab worksheet in future labs, when we look at how setuid programs can be exploited.)
The setuid
(“set user identity”) facility allows normal,
non-root users to run a program as if it were being run by
another user. This allows users to do things like change their password.
Hashes of
passwords are stored in the /etc/shadow
file,1
which is only readable and writeable by root
. (Run
ls -al /etc/shadow
to see the file’s permissions – what are
they?)
A non-root user can run the command passwd
to change
their password – so how does this work, if that user cannot read or
write the /etc/shadow
file?
The answer is that passwd
is a setuid
program, and when a setuid
program is run, it assumes the
privileges of the owner of the executable file, rather than (as
is usually the case) the privileges of the user executing the command.
Specifically, since the /usr/bin/passwd
executable is owned
by root
, that means that when a normal user runs that
executable, it will run with the root user’s permissions. The diagram
below depicts the interactions when a user (“alice
”)
invokes the passwd
program, and permissions of the two
files involved (/usr/bin/passwd
and
/etc/shadow
).
Let’s take a look at the permissions of the passwd
executable by running the following:
$ ls -al `which passwd`
You should see something like this:
-rwsr-xr-x 1 root root 68208 Mar 14 16:26 /usr/bin/passwd
The “s
” in the first 3 characters of the permissions
indicates that this program has the setuid
feature
enabled.
An example of a non-setuid
program, on the
other hand, is less
(and most other programs you commonly
run). We’ll confirm that if we try to use less
to read a
file that only root
can read, it won’t work. Try running
the following commands:
$ ls -al /etc/sudoers
$ less /etc/sudoers
The output of the ls
command should look something like
this:
-r--r----- 1 root root 798 Mar 16 15:22 /etc/sudoers
Like the /etc/shadow
file, the /etc/sudoers
file contains critical operating system information, and thus can’t be
read or written to by most users. (It is a configuration file used by
the sudo
command, which can be used to temporarily run
any command as root
.) In fact, as the file
permissions show, only the root
user can read the file, so
when you try to read it with less
, you get a permission
error.2
How to read the output of the ls
command
The man ls
documentation unfortunately doesn’t fully
explain the listing format used by ls
– see the
documentation of the GNU
coreutils package for a fuller explanation, or this
guide for a short tutorial.
In brief: the first group of 10 characters (starting with
“-r
”) indicates who can access the file, and how they can
access it. Every file on a Unix-like system has associated with it a set
of flags (binary options), and the last 9 characters in that
group show what they are. The meaning of the characters is as
follows:
The first character isn’t a flag – it indicates the type of file.
A directory is shown as “d
”, and a symbolic link as “l”.
Other file types are shown using other characters, as outlined in the
GNU coreutils documentation.
The remaining 9 characters should be read in groups of three. The first group indicates permissions available to the file owner, the second, permissions available to the file group, and the third, permissions available to everyone else.
If someone has read, write and execute permissions, the group of
three will be the characters “rwx”; if they only have a
subset of those permissions, some of the characters will be replaced by
a hyphen.
Thus, a file where the user has read, write and execute permissions,
but everyone else only has read permissions, will look like this:
-rwxr--r--
.
For some programs, the “x
” in the first group may be
replaced by “s
”, as we shall see – this tells the operating
system that it has the “setuid
” feature enabled, and when
run, its effective permissions are those of the owner of the
file (rather than of the user who started the process).
Programs that use features like setuid
are essential on
Unix systems, but also can be very dangerous: if they are not carefully
written, their elevated privileges can have unexpected consequences. For
instance, consider a normal program being run by a non-root user. The
program might contain a vulnerability which allows that user’s
confidential data to be read by a third party. This obviously isn’t
ideal, but is at least restricted to the user who invoked the program.
If a setuid
program contains a similar vulnerability,
however, then it could potentially allow all users’
confidential data to be read, because when run the program will have
root
’s permissions.
setuid programs
Consider the following commands:
"chsh"
"su"
"sudo"
What do they do? (Read the man
pages for the commands
for details.) How can you find out where the executable files for those
commands are located? (Hint: check out man which
.) List the
permissions for the executables using ls -al
. What do they
have in common? Can you explain, in each case, why those permissions
would be needed?
Sample solutions
The commands all have their setuid
bit set. This is
needed because:
in the case of chsh, it changes a user’s login shell, and that
information is stored in /etc/passwd, which is only
writeable by root (in other words, the reason is much the same as for
passwd); or,
in the case of su and sudo, they allow commands to be
run with root permissions.
For each of these commands, copy the executable to your working
directory using the cp
command, and run ls -al
on the copy. What permissions does the copy have? Try running the copied
sudo
and chsh
commands (give chsh
the password vagrant
, and suggest /bin/sh
as a
new shell). What happens?
You can add the setuid
feature to the copied programs by
running the command chmod u+s ./SOME_PROGRAM
(inserting the
name of the copied program as appropriate). However, the programs will
still be owned by you – what happens if you try to run them?
Sample solutions
If you copy one of the setuid
programs, the copy no
longer has its setuid
bit set. By default, the
cp
command assumes we wish to copy the file
content, but not file properties like whether the
setuid
bit is set.3
Even if you do set the setuid
bit on the files,
using chmod
, though – due to the way they have been coded,
none of the programs will function properly without
both (a) having the setuid
bit set, and
(b) being owned by root
.
This is a safety precaution built into the programs: if the programs detect they are being used in a way that isn’t intended, they abort execution.
When a program needs to have the setuid
feature enabled,
it is best practice to use the lowest possible level of elevated
privilege, and to use the elevated privilege level for as short a time
as possible. Once the privileges have been used for whatever purpose
they were needed for, the program should relinquish them – it does so by
calling the setuid()
function (or one of a number of
closely related functions4). You can read about the
setuid()
function at man 2 setuid
– but its
use is complicated, and we will elaborate more on it in the next
section.
Relinquishing privileges is especially important for long-running
processes. For instance, a web server might require higher permissions
than normal early in the life of the process (e.g. to read configuration
files, or listen for connections on port 80, which normally only
root
can do) but should drop those
permissions once they are no longer needed – otherwise the potential
exists for them to be exploited.
Using as few privileges as are needed, for as short a time as is needed, is known as the “Principle of Least Privilege”: read more about it in the Unix Secure Programming HOWTO on minimizing privileges.
Suppose we are creating a set of programs which manipulate a human
resources database. The database file is located at
/var/hr/hr.db
and is owned by root. One of our programs is
called hr_db_amend
, and is a setuid program also owned by
root.
Is this in line with good security practice, as explained in the Unix Secure Programming HOWTO? If not, what should we do instead?
Sample solution
This is not in line with best practice.
Our hr_db_amend
program will run as root
,
when it could be given far fewer privileges, which would lessen the
impact if we happen to make a mistake when coding it.
A better approach would be, for instance, to create a user called
hr
, and have that user ID own the database file
/var/hr/hr.db
and the hr_db_amend
executable.
We could still make our program setuid; but now it would run as user
hr
, giving it access to the HR database, but not to
important system configuration files like /etc/passwd
and
/etc/shadow
.
An example of a Unix program which creates specialized user accounts, just for the purpose of reading or writing files owned by those accounts, is crontab, part of the “cron” software package.
Crontab allows users to create recurring “tasks” which will be run
periodically on a Unix system. The files defining these tasks are stored
in the directory /var/spool/cron/crontabs
. The Crontab
program needs to meet the following requirements:
users need to be able to create and modify files in the /var/spool/cron/crontabs
directory; BUT
we don’t want to make /var/spool/cron/crontabs
world-readable and writable.
The solution is to make the /var/spool/cron/crontabs
directory group-owned by a group called crontab
, and for
the /usr/bin/crontab
program to be a setgid
program and be also group-owned by crontab
.5
(setgid
is like setuid
, but it means a program
runs with different group permissions, instead of different
user permissions.)
By listing the details of /var/spool/cron/crontabs
and
/usr/bin/crontab
, you should be able to confirm that this
is the case.
Because of the security vulnerabilities often introduced by
setuid
and setgid
programs, a blog post by
Konstantin Ivanov here
suggests removing (or partly disabling) many of the less commonly used
setuid
and setgid
programs – they introduce
potential vulnerabilities for little advantage.
Unix systems keep track of two facts about a running process – the process’s “real user ID” (that is, the user ID of the user who created the process), and its “effective user ID” (which represents the permissions the process is currently acting with, and might be different to the real user ID).
For most programs, the two are exactly the same. For
setuid
programs, however, the operating system sets the
real user ID (abbreviated “rUID”) as per usual, but sets the effective
user ID (abbreviated “eUID”) to the owner of the executable file being
run. (You might think of the “real” user as being the user who actually
invoked the command, and the “effective” user as being the user whose
privileges are currently being used.)
The Linux C functions getuid
and geteuid
are used to determine at runtime what the
rUID and eUID are. This is how a program like su
can tell
whether it is being run as a setuid, root-owned program – its eUID
should be 0. (Linux also has something called a “saved
user ID”, used for the situation where a program mostly
needs to run as root, but temporarily needs to run some actions as a
non-privileged user. However, we will leave discussion of it until a
later lab.)
Compile the following program:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main() {
    uid_t ruid = getuid();
    uid_t euid = geteuid();
    printf("real uid is: %d\n", ruid);
    printf("effective uid is: %d\n", euid);
    return EXIT_SUCCESS;
}
Compare the results it gives when run with the results it gives after you change its owner to root and make it a setuid program, and make sure you understand why you see the results you do.
So, we know that we should use elevated privileges for as short a time as possible, and then relinquish them. How do we do that?
The Linux C function seteuid
6 lets us change our eUID; to drop
privileges, we just change our eUID so that it’s the same as our
rUID.
However, there are some pitfalls to be aware of when relinquishing privileges: read the Software Engineering Institute’s web page on relinquishing permissions for an explanation of some of the issues.
One of the most common pitfalls is not checking the return value
from setuid
-related functions that can fail. Not
checking the return values of functions that can fail is bad practice in
any C program, but is even more severe for setuid
programs – because if we’re not checking return values, then we don’t
know what privileges our program is currently operating with, and could
easily do something both dangerous and unintended.
Can the function getuid()
fail? What about
geteuid()
? What about setresuid()
?
Sample solutions
The getuid()
and geteuid()
functions cannot
fail, but setresuid()
can.
This is documented in the manual pages for those functions.
In our hr_db_amend
program described above, we initially run with elevated
privileges.
Most users on the system cannot read or write the
/var/hr/hr.db
file, but because we are running as the
hr
user, we can.
Our hr_db_amend
program contains the following code,
executed shortly after the program has started running:
// open DB file for reading and writing
fd = open("/var/hr/hr.db", O_RDWR);
if (fd == -1) {
    printf("Cannot open /var/hr/hr.db\n");
    exit(EXIT_FAILURE);
}
// now that we have a file descriptor,
// privileges are no longer needed - relinquish them
setuid(getuid()); // getuid() returns the real uid
Does this code contain potential security flaws? If so, how could it be changed to remove them?
Sample solution
It does contain security flaws. The setuid
function (or
seteuid
, a variant of it found on many Unix-like operating
systems) can fail, and if it fails, then we haven’t successfully
relinquished privileges. In case of a failure, we must immediately
abort execution; otherwise, we’ll be performing actions which were
intended to be performed by a non-privileged user, but will be
performing them with elevated privileges (those of the hr user, in this
example). This is highly prone to being exploited by
attackers.
We should check the return value from setuid
, and if it
is non-zero, we should abort.
(In fact, we should always check the return value from any function call that can fail. All our subsequent code has presumably been written on the assumption that the function call succeeded; if it didn’t, then our assumption is wrong, and we have no idea what the actual current state of the system is.)
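A minimal sketch of a checked version might look like the following. The helper name relinquish_privileges is hypothetical, and the sketch only adds the missing checks around the setuid(getuid()) call from the exercise; it does not deal with the saved user ID, which is left for a later lab.

```c
#include <sys/types.h>
#include <unistd.h>

/* Tries to make the effective user ID equal to the real user ID.
 * Returns 0 on success, or -1 if the call failed or the result
 * could not be confirmed. Hypothetical helper for illustration;
 * the caller is expected to abort on a non-zero result. */
int relinquish_privileges(void) {
    uid_t ruid = getuid();      /* getuid() cannot fail */

    if (setuid(ruid) != 0)      /* setuid() can fail: check it */
        return -1;

    /* Confirm that the change actually took effect. */
    if (geteuid() != ruid)
        return -1;

    return 0;
}
```

A caller would treat any non-zero result as fatal, for example by printing an error message and calling exit(EXIT_FAILURE) before doing any further work.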
We will experiment further with setuid
programs, and
code for using and relinquishing privileges, in future labs.
The following questions and exercises are intended to improve your understanding of filesystems and access control on Unix-like systems. You may be able to answer them based on what we have covered in class (or on your background knowledge of Unix systems), or you might need to experiment or conduct some research to work out the answer.
Is it possible to create a symbolic link to a symbolic link? Why or why not?
Solution
Yes, it’s possible to create a symbolic link (“symlink”, for short)
to a symbolic link in Unix-like operating systems. You can verify this
just by trying it out: create a file (say, “somefile.txt
”),
create a symlink to it (for instance, with the command
ln -s somefile.txt mylink
), and then create a symlink to
that symlink.
Creating a symlink to a symlink just adds an extra level of indirection.
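The same experiment can be carried out from C using the symlink() function. The sketch below assumes it is given an existing, empty, writable directory; the function name symlink_chain_demo and the link name mylink2 are invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Creates dir/somefile.txt, dir/mylink -> somefile.txt and
 * dir/mylink2 -> mylink, then reads the file through mylink2.
 * Returns 0 if the text read back matches what was written,
 * and -1 otherwise. Hypothetical helper for illustration. */
int symlink_chain_demo(const char *dir) {
    char file[4096], link1[4096], link2[4096];
    snprintf(file, sizeof file, "%s/somefile.txt", dir);
    snprintf(link1, sizeof link1, "%s/mylink", dir);
    snprintf(link2, sizeof link2, "%s/mylink2", dir);

    FILE *out = fopen(file, "w");
    if (out == NULL)
        return -1;
    fputs("foo\n", out);
    if (fclose(out) != 0)
        return -1;

    /* A relative link target is resolved relative to the directory
     * that contains the link. */
    if (symlink("somefile.txt", link1) != 0)
        return -1;
    if (symlink("mylink", link2) != 0)   /* a link to a link */
        return -1;

    /* Opening mylink2 follows both links to reach somefile.txt. */
    char buf[16] = {0};
    FILE *in = fopen(link2, "r");
    if (in == NULL)
        return -1;
    char *got = fgets(buf, sizeof buf, in);
    fclose(in);
    if (got == NULL)
        return -1;

    return strcmp(buf, "foo\n") == 0 ? 0 : -1;
}
```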
Suppose file_b
is a symbolic link to
file_a
. What effect will setting the permissions of
file_b
have on whether users can read or write the
contents? What about changing the owner or group-owner of
file_b
?
Solution
You can experiment by creating a file yourself (e.g. by running the
command “echo 'foo' > file_a
”), creating a symlink to
that file (e.g. by running the command
“ln -s file_a file_b
”), and then listing and attempting to
alter the permissions of file_b
.
If you try this, and display the permissions of file_b
(by, for instance, running ls -al
), you should see an entry
like the following:
lrwxrwxrwx 1 vagrant vagrant 6 Mar 11 01:54 file_b -> file_a
The symlink is listed as granting all permissions to all
users; and if you try changing the permissions, you’ll see that what you
end up doing is actually changing the permissions of the target
file (file_a
).
The reason for this is that symbolic links are treated specially by
the operating system – the permissions lrwxrwxrwx
are the
only permissions a symlink can ever have.
We can change the owner and group-owner of a symbolic link, distinct from its target, but (a) it’s a little tricky to do so, and (b) it will have no effect on which users can read or write from the target file.
If we try invoking the chown
command on a symlink,
then by default, it will change the ownership of the target
file, not the link. This is explained in the man page for the
chown
command, man chown
,
which says that we can pass the options --dereference
and
--no-dereference
to chown
. The first option is
also the default behaviour, and means that we’ll actually be changing
ownership of the target. The second option,
--no-dereference
, is needed if we want to change ownership
of the symlink itself.
However, if you do so (e.g. by running
“sudo chown --no-dereference root:root file_b
”), you’ll
discover that it makes no difference to who can read or write the file.
That’s determined by the owner (and group) of the target file.
What type of filesystem is used for the root partition of the CITS3007 standard development environment (SDE)? How can we find out?
Solution
One way of finding out the filesystem type is to use the
df
command (see man df
), which is
normally used for showing disk space usage. If you pass the options
“-T
” or “--print-type
” to df
, it
will show not just how much space is in use on each file system, but
what the filesystem type is for each of them.
Running the command “df -hT
” within the CITS3007 SDE
will produce output similar to the following:
vagrant@cits3007-ubuntu2004:~$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 193M 0 193M 0% /dev
tmpfs tmpfs 48M 940K 47M 2% /run
/dev/vda3 ext4 124G 5.1G 112G 5% /
tmpfs tmpfs 237M 0 237M 0% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs tmpfs 237M 0 237M 0% /sys/fs/cgroup
/dev/vda1 ext4 456M 202M 220M 48% /boot
tmpfs tmpfs 48M 0 48M 0% /run/user/1000
The filesystem type for the root partition “/
” is shown to
be “ext4
” – this is a member of the “EXT”
(“EXTended File System”) family of filesystems which
are common on Linux systems.
There are many other ways for displaying a filesystem’s type besides
using df
. If we know the device file that represents the
root partition (/dev/vda3
, from the listing above), we can
show its filesystem type using the blkid
command
(“locate/print block device attributes”) – see man blkid
.
Running blkid
on the CITS3007 SDE will produce output like
the following:
vagrant@cits3007-ubuntu2004:~$ blkid
/dev/vda1: UUID="a69580c8-ef0b-49b8-b6de-9e4d72d3ea10" TYPE="ext4" PARTUUID="a1614753-01"
/dev/vda2: UUID="41ca73ed-c076-41a1-88c4-9f3f6b2233ac" TYPE="swap" PARTUUID="a1614753-02"
/dev/vda3: UUID="61ca90f5-3323-4d55-a8ee-c1dd2a72f267" TYPE="ext4" PARTUUID="a1614753-03"
Running the mount
command will also display the
filesystem type of all mounted filesystems.
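From within a C program, the Linux-specific statfs() function reports a numeric filesystem identifier in its f_type field. The sketch below is only an illustration: the helper name is_ext_family is made up, and the value 0xEF53 is the magic number shared by ext2, ext3 and ext4, so this check cannot tell those three apart.

```c
#include <sys/vfs.h>

/* Magic number shared by the ext2, ext3 and ext4 filesystems. */
#define EXT_FAMILY_MAGIC 0xEF53

/* Returns 1 if `path` lives on an ext2/ext3/ext4 filesystem,
 * 0 if it lives on some other filesystem, and -1 if statfs()
 * failed (for instance, because the path does not exist).
 * Hypothetical helper for illustration. */
int is_ext_family(const char *path) {
    struct statfs sfs;
    if (statfs(path, &sfs) != 0)
        return -1;
    return sfs.f_type == EXT_FAMILY_MAGIC ? 1 : 0;
}
```

On the CITS3007 SDE, calling is_ext_family("/") would be expected to return 1, given the df output shown earlier; in other environments (containers, for example) the root filesystem may well be of a different type.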
Does the Windows operating system support symbolic links? Is it possible to create a symbolic link on a USB “thumb drive” usable by Windows?
Solution
Modern versions of Windows do support symbolic links (see the Wikipedia page on “Symbolic link”).
To use symbolic links, they have to be supported by both the filesystem, and the operating system – whether you can create a symbolic link on a USB thumb drive would depend on what filesystem the drive has been formatted with, and what operating system (and version) you are accessing it with.
Many thumb drives use variants of a simple filesystem called FAT (short for “File Allocation Table”), originally developed in 1977. FAT is also sometimes used for the boot partition of desktop computers. FAT does not support symbolic links, so typically, you can’t create a symlink on a thumb drive, even if you have it mounted to a Unix operating system (you’ll typically get an “Operation not permitted” error, if you try it). Some thumb drives instead use a Microsoft-designed filesystem called NTFS, modern versions of which do support symbolic links.
The Windows OS has supported symbolic links since version 6.0 (Windows Vista, released in 2007).
Can you find out: how does your computer store passwords? Does it use a hashing algorithm, and if so, which one? Where on disk are passwords (or their hashes) stored?
Solution
The exact answer will depend on your operating system and version. If it is Linux, we have covered the answer already.
On Windows, passwords are stored as part of the registry, in a database file called the Security Account Manager, and the exact algorithm used for hashing varies from version to version of Windows – at the time of writing, an NTLM hash is typical.
On MacOS, the location of password hashes varies from version to
version, but
as of version 10.7, they are in
/var/db/dslocal/nodes/Default/users/${USER}.plist
(where
USER
is a user ID), and the hash function used is
PBKDF2-HMAC-SHA512. You can read more about how MacOS password hashes
can be encrypted and broken here.
(Note that if experimenting with password cracking, you
must only do so with your own passwords and on your own computer – it is
illegal and unethical to use other people’s data or computers without
permission.)
On Moodle, under the section “Week 4 – string handling”, is a set of exercises on using C’s string handling functions to write a safe “path-construction” function.
Complete the exercises, using your development environment to help test your code. If you aren’t able to complete the exercises during your scheduled lab time, you should work through them in your own time.
Sample solution
The following is a sample solution for the
would_wrap_around()
function:
/** Return true when the sum of dirlen, filelen, extlen and 3
  * would exceed the maximum possible value of a size_t; otherwise,
  * return false.
  */
bool would_wrap_around(size_t dirlen, size_t filelen, size_t extlen) {
    size_t res = dirlen;
    res += filelen;
    if (res < dirlen || res < filelen)
        return true;

    size_t old_res = res;
    res += extlen;
    if (res < old_res || res < extlen)
        return true;

    old_res = res;
    res += 3;
    // compare against the value just added (3), not extlen
    if (res < old_res || res < 3)
        return true;

    return false;
}
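An alternative way of phrasing the check, sketched below, is to avoid wrapping altogether by subtracting from SIZE_MAX rather than adding. The name would_wrap_around_alt is hypothetical, and this is one possible approach rather than the expected answer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when dirlen + filelen + extlen + 3 would exceed
 * SIZE_MAX. Instead of adding and then testing for wrap-around,
 * it tracks how much "room" is left below SIZE_MAX, so none of
 * the arithmetic here can itself wrap. */
bool would_wrap_around_alt(size_t dirlen, size_t filelen, size_t extlen) {
    size_t room = SIZE_MAX - 3;   /* reserve space for '/', '.' and NUL */

    if (dirlen > room)
        return true;
    room -= dirlen;

    if (filelen > room)
        return true;
    room -= filelen;

    return extlen > room;
}
```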
Comments on the would_wrap_around() code:
After adding dirlen and filelen, the code checks whether the result,
dirlen + filelen, is less than dirlen or less than filelen. But actually, it
suffices to see if the result is less than dirlen alone.
(Can you see why?)
The following is a sample solution for the
make_pathname()
function:
char* make_pathname(const char *dir, const char *fname, const char *ext) {
    size_t dirlen = strlen(dir);
    size_t filelen = strlen(fname);
    size_t extlen = strlen(ext);
    if (would_wrap_around(dirlen, filelen, extlen))
        return NULL;

    // include terminating NUL
    size_t pathbuf_size = dirlen + filelen + extlen + 3;
    char *path = malloc(pathbuf_size);
    if (!path)
        return NULL;

    memcpy(path, dir, dirlen);
    path[dirlen] = '/';
    memcpy(path + dirlen + 1, fname, filelen);
    path[dirlen + filelen + 1] = '.';
    // copy extlen + 1 bytes so that the terminating NUL of ext is included
    memcpy(path + dirlen + filelen + 2, ext, extlen + 1);
    return path;
}
Some comments on the code:
When working on a C project, it can be helpful if your
development team has a convention for distinguishing lengths of
strings (which don’t include in their count the terminating
NUL
) from sizes of buffers (which always
should include in their count any required NUL
characters).
This can help make code reviews easier, and ensure mistakes aren’t made by using one sort of length where the other would be appropriate. (Even better would be to configure your team’s static analysis programs to distinguish the two, and enforce rules about how each is used.)
The code above uses names like dirlen
,
filelen
etc for string lengths, but
pathbuf_size
for a buffer.
When you submit C code as part of an assessment, your code will be assessed not only for correctness, but for style and clarity.
This means making appropriate use of C library functions. The following are examples of solutions which don’t make appropriate use of C library functions:
Solutions which don’t make use of C library functions at all –
for instance, using a for
loop instead of
memcpy
to copy characters from the arguments to
path
:
for (size_t i = 0; i < dirlen; i++) {
    path[i] = dir[i];
}
Writing a for
loop makes the solution needlessly
complex, harder to read than necessary, and more prone to error. If a
known number of bytes need to be copied from one location in memory to
another, then memcpy
is the standard C idiom for doing
so.
Solutions which use strcat
to build up the
path
instead of memcpy
.
For instance:
strcpy(path, dir);
strcat(path, "/");
strcat(path, fname);
strcat(path, ".");
strcat(path, ext);
Although less problematic than the for loop approach above, this is
needlessly inefficient. Each call to strcat
will result in
path
being iterated over again.
Solutions which use snprintf
to write into
path
– these are creating additional points of failure in
the function. They are discussed further below.
Calling make_pathname
We could write a main
routine which allows us to invoke
make_pathname
from the command-line, as follows:
int main(int argc, char **argv) {
    // skip program name
    argc--;
    argv++;

    if (argc != 3) {
        fprintf(stderr, "Expected 3 args\n");
        exit(EXIT_FAILURE);
    }

    // See note: poor validation!
    char* dir = argv[0];
    char* fname = argv[1];
    char* ext = argv[2];

    char* path = make_pathname(dir, fname, ext);
    if (path != NULL) {
        printf("path is: '%s'\n", path);
        free(path);
    }
    else
        printf("couldn't allocate enough memory\n");
    return EXIT_SUCCESS;
}
Some comments on the code:
In the lines that assign dir, fname and ext, our program is taking input from a potentially untrusted user via the elements of the argv array.
We can be sure that those elements are valid null-terminated
strings – they aren’t NULL
pointers, nor do they run off
past the end of the memory segment – because the language standard
assures us so.
But we can’t be sure they satisfy the other preconditions of
make_pathname
(e.g. the file and extension not containing
slashes or dots).
So that will mean that when we call make_pathname
, we
could be violating its preconditions, and the returned string might not
be what we expect and might not be safe to use.
Furthermore, if we pass that string on to other functions, we can render ourselves vulnerable to a path traversal attack (see https://owasp.org/www-community/attacks/Path_Traversal).
For instance, suppose the dir
argument is known to be a
(system-controlled, trusted) path to a directory where web pages are
stored (“/path/to/webdir
”, say), and the fname
and ext
arguments are taken from an (untrusted) browser
request and specify an HTML file to serve up.
If we haven’t validated the contents of the fname
and
ext
arguments, an attacker could set them to
"."
and "/"
, respectively. Then the result of
make_pathname
would be “/path/to/webdir/../
”,
which represents the parent directory of
webdir
.
We have thus potentially given the attacker the ability to read files from outside the directory they should have access to, which could include configuration files and other sensitive data.
We will look in more detail at input validation and path traversal attacks in future labs.
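As a rough illustration of the kind of check that is missing, the sketch below rejects any fname or ext value that is empty or contains a slash or a dot. The function name is_safe_component is an assumption made for this example, and real input validation usually needs more than this (for instance, limits on length and on which characters are allowed).

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `s` is non-empty and contains neither '/' nor '.'.
 * Hypothetical helper: a caller might apply it to the fname and ext
 * arguments before passing them to make_pathname. `s` must be a
 * valid NUL-terminated string. */
bool is_safe_component(const char *s) {
    if (s[0] == '\0')
        return false;
    /* strpbrk returns NULL when none of the listed characters occur */
    return strpbrk(s, "/.") == NULL;
}
```

With this check in place, the attacker-supplied values "." and "/" from the example above would both be rejected before any path is built.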
An alternative implementation of make_pathname
:
char* make_pathname(const char* dir, const char* fname, const char* ext) {
    size_t dirlen = strlen(dir);
    size_t filelen = strlen(fname);
    size_t extlen = strlen(ext);
    if (would_wrap_around(dirlen, filelen, extlen))
        return NULL;

    // +3 for '/', '.', and '\0'
    size_t totalLength = dirlen + filelen + extlen + 3;
    char* result = malloc(totalLength);
    if (result == NULL)
        return NULL;

    int written = snprintf(result, totalLength, "%s/%s.%s", dir, fname, ext);
    // snprintf returns a negative value on error
    if (written < 0 || (size_t) written != totalLength - 1) {
        free(result);
        return NULL; // error occurred
    }
    return result;
}
Comments on this implementation:
This implementation uses snprintf
– many C
programmers would regard memcpy
and array-element
assignment as the simpler solution, though, because when called
correctly, memcpy
can’t fail. (memcpy
is basically just a for
loop which copies from source
to destination.)
snprintf
on the other hand is a complicated function.
There’s no obvious reason why it should fail, if our logic is
correct; but nevertheless, we can only be certain that it worked if the
number of bytes written was exactly equal to what we expected
(totalLength - 1
). So we’ve added in an additional
possibility for failure that wasn’t in the original specification – the
spec said we would only return NULL
if it wasn’t
possible to allocate enough memory, but now we’re opening up the
possibility for other causes of error.
(Technically, if we’re failing for some reason other than “can’t
allocate enough memory”, we should probably abort()
rather
than return NULL
, because we can’t fulfil the promise we
made in the function specification.)
(Challenge questions in the lab worksheets are aimed at students who already have a good knowledge of C and operating systems – they are not compulsory to complete.)
On Linux, is it possible to write a setuid program in Python? Why or why not? Is it advisable?
Sample solution
In current versions of Linux (since at least kernel version 3.0), it’s not (straightforwardly) possible for a Python program to be setuid. You can try it by writing the following Python program, and making it owned by root, executable, and with its setuid bit enabled:
When run, this script will simply print out your normal user ID, not 0.
Why is this? One reason is that the file is a script, and isn’t being executed in the same way as a binary executable. When asked to execute a program, the kernel inspects the start of a file to determine what sort of program it is being asked to run.7
The first 4 bytes of a binary executable on Linux will be
"\x7fELF"
, indicating that it’s an ELF-format
executable, but scripts will start with the characters
"#!"
, indicating that they are intended to be supplied to
an interpreter to be run – in the case of Python scripts, the
interpreter will end up being /usr/bin/python3
.
So for Python scripts, the machine-code instructions which are loaded
into memory and executed by the processor don’t come from the script
(they couldn’t – the script contains plain text, not machine code), but
from the Python interpreter binary executable at
/usr/bin/python3
. That file doesn’t have its setuid bit
enabled, so it doesn’t run as root. (The script file is just a data file
which the Python interpreter opens, reads, and interprets as Python
instructions.)
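The header inspection described above can be approximated in a few lines of C. This is only a sketch of the idea, not the kernel's actual code; the names exec_kind and classify_header are invented for the example.

```c
#include <stddef.h>
#include <string.h>

enum exec_kind { KIND_ELF, KIND_SCRIPT, KIND_UNKNOWN };

/* Looks at the first bytes of a file (already read into `buf`,
 * `len` bytes long) and guesses what sort of program it is. */
enum exec_kind classify_header(const unsigned char *buf, size_t len) {
    /* ELF files begin with the byte 0x7f followed by "ELF" */
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len >= 4 && memcmp(buf, elf_magic, sizeof elf_magic) == 0)
        return KIND_ELF;

    /* scripts begin with "#!" followed by the interpreter's path */
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return KIND_SCRIPT;

    return KIND_UNKNOWN;
}
```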
You could deliberately set the Python interpreter to be a setuid program, if you wanted; but that would be very unwise, as now all Python programs would run as root.
Is there any other way one could run a Python script as a setuid
program? Yes, there is: you could create a custom interpreter designed
to run Python scripts that have their setuid bit enabled. Your
interpreter would need to be owned by root and be a setuid program
itself, and scripts intended to be run by it would start with
#!/path/to/my/custom-interpreter
.
The custom interpreter would need to:
call seteuid to assume the privileges of the owner of the script; and
use the execve system call (see man execve) to start the normal interpreter (e.g. /usr/bin/python3, for Python scripts) running, with the script supplied to it as an argument.
The custom interpreter would also need to be carefully coded to avoid the possibility of injection attacks and a wide range of other problems. Examples of custom interpreters which aim to do this can be found here and here. (Note: the safety of the code in these custom interpreters has not been vetted – use of it is at your own risk.)
Would a custom interpreter like this be a good idea? It’s doubtful. Programs like the Python interpreter are very large and complex, were not designed with running at elevated privileges in mind, and contain innumerable places in their code where security vulnerabilities could lurk.
Programs written in C have the disadvantage of not being memory-safe;8 but (if crafted carefully) they have the advantage that they can be kept extremely small and simple, with a very limited number of locations which need to be reviewed for security issues.
The issues raised by setuid
scripts are discussed more
in this
StackExchange answer.
To be more precise, what’s stored in
/etc/shadow
is a hashed
and salted version of the password.↩︎
If you look at the listing of /etc/sudoers
carefully, you may notice that not even root
has permission to write to the /etc/sudoers
file! So
you might ask: how can the file ever be modified? The answer is that
on Unix systems, whenever the access control system asks “Can user X
perform action Y?” – if the user is root
, the answer is
always “yes”, no matter what the action is.
In other words, root
can basically do anything, no
matter what the file permissions say. Try this: create a root-owned file
by running sudo touch myfile
, then
sudo chmod 000 myfile
, then ls -al myfile
.
You should see that no user can read or write myfile
.
But then run the command
echo hello | sudo tee -a myfile
. (We use
tee -a
to append text to the file.)
If you inspect the contents of myfile
(e.g. by using
sudo less
), you’ll see we did successfully append
text to it, even though root
didn’t have write permissions
to the file, and we were able to read from it, even though
root
didn’t have read permissions. So in that case – why
ever set specific permissions for root
-owned files at all?
The answer is that we could just leave those permissions “blank” if we
wanted – it’s just convention to set them up as we do.↩︎
The logic of the cp
command (at least, the
GNU implementation of the command which we are using – see here
for the source code) is not always very intuitive.
As noted above, by default, cp
doesn’t copy file
metadata at all. If you add the --preserve
flag, it’ll
usually copy the file metadata (including the
setuid
bit) – but if the source file is owned by a
different user from you, it won’t copy the setuid
bit. (It
is possible to persuade the cp
command to do so,
though it doesn’t seem to be well-documented – as a challenge you might
like to find out how.)
As a result, for purposes like backing up a filesystem, we
often instead use programs with better-documented and more intuitive
behaviour, like tar and rsync.↩︎
For more information about setuid
-related
functions, you should read Chen et al, “Setuid
demystified” (PDF; 11th USENIX Security Symposium, 2002), which is
part of the recommended readings for week 3.
Chen et al point out that Linux has a whole gamut of
setuid
-related functions – setuid()
,
seteuid()
, setreuid()
,
setresuid()
, and more – and it’s not obvious which one
should be used. The authors point out that one of those
functions has much clearer semantics and documentation than the others –
read the paper to find out which.↩︎
The version of cron being used in the CITS3007 SDE is
called “Vixie cron”, and is
widely used on Unix systems. The use of setgid
isn’t part
of the original “Vixie cron” source code, however – it was introduced as
an improvement to the original Vixie cron in the Debian
distribution of Linux in 2003.↩︎
Or one of the other setuid
-related
functions, discussed in Chen et al, “Setuid
demystified” (PDF; 11th USENIX Security Symposium, 2002).↩︎
This is done by the function search_binary_handler
in the kernel. You can find more discussion of the process here, here,
here
and here.↩︎
A number of systems programming languages have been
developed which have better memory safety than C – for instance, Cyclone, ATS, and Rust. As yet, however, none of them
has completely supplanted the role of C and C++ in systems programming.
(If you have an interest in memory-safe systems programming languages,
then this blog
post by Jonathan Goodwin examining the influence of Cyclone on later
languages, including Rust, may be of interest.)↩︎