Systems, Security and Data Analysis (SYSDA) week 1 lecture and lab notes



Binary data

1s and 0s, also known as ons and offs
High and low
Eight bits make a byte, for example 00010100
In turn, 1 Kbyte is 1024 bytes
1 Mbyte is 1,048,576 bytes (1024 × 1024), and so on

Words-
Units of data whose size is fixed only for a specific processor type. Think 32-bit or 64-bit: a word is 32 bits on a 32-bit processor and 64 bits on a 64-bit processor


Hexadecimal

Also known as hex or “base16”

16 digits are written:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
Each digit corresponds directly to a pattern of four binary bits


Example time
01101101 split into two groups of 4 bits is 0110 1101
Which in hexadecimal is 6D

Many tools display data in hex, think FTK

Numbers in hex
Note that an 8-bit byte can also be interpreted as a number between 0 and 255 (decimal)

Note that
10001011010100111100 (binary)
    = 570684 (decimal)
    = 8B53C (hex)

“0x” is sometimes used to show that something is in hex, for example 0x8B53C


Endianness

This is the order in which the bytes representing a longer number are stored. Think little-endian versus big-endian, as seen in FTK

The hex value 00 08 B5 3C is stored differently depending on endianness
A little-endian computer stores it backwards, as 3C B5 08 00
A big-endian computer stores it as 00 08 B5 3C


Strings and character encoding
This is how textual information is represented in computer bytes
The best known is ASCII, which encodes each Western letter into a single byte
Unicode is a more recent standard that allows for many more alphabets, but may need more than one byte per character

Newer encodings try to preserve backwards compatibility

Below is an ASCII table

Unicode
Assigns a unique sequence number to any character, regardless of its alphabet
This is a numeric value, called a code point, between 0 and 10FFFF (hexadecimal, base 16)

The code points are grouped into planes. Planes are subdivided into blocks of variable size, each containing the characters of one alphabet or a group of related alphabets

Unicode allocation

Below is an image of the Basic Multilingual Plane


Note: a sequence of Unicode code points is normally broken down further into units that may be smaller than a code point
The size of the unit characterises the three encoding forms

UTF-32
The most straightforward is UTF-32, in which each unit is 32 bits, so one unit can hold any code point

UTF-16
Breaks Unicode characters into 16-bit units
A single unit is not large enough to represent all possible Unicode code points, but is large enough for most common alphabets

Code points that are too large are split into two consecutive units

UTF-8
Breaks characters into 8-bit units
The number of units per character is variable; for European texts, for example, it normally uses 8 or 16 bits per character


Software
Assembly language-
Used by system programmers

Two instructions in Intel assembler
MOV EBX, EAX
ADD EBX, 4

First one copies the contents of register EAX to register EBX
Second increases value in register EBX by 4

Higher level languages
Assembly language is a step up from machine code, but not by much

C
Example of a system programming language
Through data types called pointers, C allows direct access to physical memory addresses

Hello world in C
#include <stdio.h>

/* Print a greeting and exit successfully. */
int main(int argc, char *argv[]) {
    printf("hello world\n");
    return 0;
}

From C to machine code

Hello world is an example of what is generally called source code
In compiled languages like C or C++, this source code is converted directly to machine code by another system program, called a compiler

Works as seen below



Further reading on this is available at:

Andrew S. Tanenbaum, “Modern Operating Systems”, 4th Edition, Pearson, 2014 (MOS)
    See section 1.8 on the C programming language.
Brian Carrier, “File System Forensic Analysis”, 2005 (FSFA)
    See chapter 2.










