Systems, Security and Data Analysis (SYSDA) week 1 lecture and lab notes
Binary data
1s and 0s, aka ons and offs
High and low
Eight bits make a byte, for example 00010100
In turn, 1 Kbyte is 1024 bytes
1 Mbyte is 1,048,576 bytes, and so on
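As a small illustration of the byte 00010100 above, here is a minimal C sketch that turns a string of eight bit characters into the number it represents. The function name bits_to_byte and the two constants are made up for this example.

```c
#include <stdint.h>

/* Interpret a string of up to eight '0'/'1' characters as one byte,
   reading the most significant bit first. (Hypothetical helper.) */
uint8_t bits_to_byte(const char *bits) {
    uint8_t value = 0;
    for (int i = 0; i < 8 && bits[i] != '\0'; i++) {
        /* Shift the bits seen so far left and append the new one. */
        value = (uint8_t)((value << 1) | (bits[i] == '1'));
    }
    return value;
}

/* 2^10 bytes in a Kbyte and 2^20 bytes in a Mbyte. */
enum { BYTES_PER_KBYTE = 1 << 10, BYTES_PER_MBYTE = 1 << 20 };
```

Calling bits_to_byte("00010100") should give 20 in decimal.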
Words
Units of data whose size is fixed only for a specific processor type: a word is 32 bits on a 32-bit processor and 64 bits on a 64-bit processor
Hexadecimal
16 digits are written:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
Each digit corresponds directly to a pattern of four
binary bits
Example time
01101101 split into two groups of 4 bits is 0110 1101
Which in hexadecimal is 6D
Many tools display data in hex, think FTK
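The split into two groups of four bits can be sketched in C as below. This is only an illustration, and the name byte_to_hex is made up here.

```c
#include <stdint.h>

/* Convert one byte to two hex digits by splitting it into two
   4-bit groups. `out` needs room for 3 chars. (Hypothetical helper.) */
void byte_to_hex(uint8_t value, char out[3]) {
    static const char digits[] = "0123456789ABCDEF";
    out[0] = digits[value >> 4];    /* high four bits, e.g. 0110 -> '6' */
    out[1] = digits[value & 0x0F];  /* low four bits,  e.g. 1101 -> 'D' */
    out[2] = '\0';
}
```

For the byte 01101101 (0x6D) this should leave the text 6D in out.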
Numbers in hex
Note that an 8-bit byte can also be interpreted as a
number between 0 and 255 (decimal)
Note that
10001011010100111100 (binary)
= 570684 (decimal)
= 8B53C (hex)
A “0x” prefix is sometimes used to show that a number is in hex, for
example 0x8B53C
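To check the conversion above, a minimal sketch using the standard C library function strtoul, which parses text in a given base. The wrapper names parse_binary and parse_hex are made up for this example.

```c
#include <stdlib.h>

/* Parse a number written in binary (base 2). */
unsigned long parse_binary(const char *text) {
    return strtoul(text, NULL, 2);
}

/* Parse a number written in hex (base 16). With base 16, strtoul
   also accepts an optional "0x" or "0X" prefix. */
unsigned long parse_hex(const char *text) {
    return strtoul(text, NULL, 16);
}
```

Both parse_binary("10001011010100111100") and parse_hex("0x8B53C") should return 570684.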
Endianness
This is the order in which the bytes representing a longer
number are stored, think little-endian and big-endian in FTK
A hex value of 00 08 B5 3C would be stored differently
A little-endian computer stores it backwards, least significant byte first: 3C B5 08 00
A big-endian computer stores it as 00 08 B5 3C
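The two byte orders for the value 00 08 B5 3C can be sketched in C as follows. The function names are made up for this example, and the code writes the bytes out explicitly rather than relying on the byte order of the machine it runs on.

```c
#include <stdint.h>

/* Store a 32-bit value most significant byte first (big-endian). */
void store_big_endian(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Store a 32-bit value least significant byte first (little-endian). */
void store_little_endian(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)value;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}
```

For the value 0x0008B53C the first function should give 00 08 B5 3C and the second 3C B5 08 00.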
Strings and character encoding
This is how textual information is represented in
computer bytes
The best-known encoding is ASCII, which encodes each Western letter
in a single byte
Unicode is a more recent one that allows for many more alphabets,
but needs more than one byte per character
Unicode tries to stay backwards compatible with ASCII
Below is an ASCII table
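To see the one-byte-per-letter idea in code, a minimal C sketch that copies out the byte values of a string. The name ascii_codes is made up here, and the sketch assumes the compiler uses ASCII for its character set, which is the case on common desktop platforms.

```c
#include <stddef.h>

/* Copy the byte value of each character in `text` into `codes`,
   up to `max` entries. Returns how many were copied.
   (Hypothetical helper; assumes an ASCII execution character set.) */
size_t ascii_codes(const char *text, unsigned char *codes, size_t max) {
    size_t n = 0;
    while (text[n] != '\0' && n < max) {
        codes[n] = (unsigned char)text[n];
        n++;
    }
    return n;
}
```

For the text "Hi" this should give the two bytes 48 69 in hex (72 and 105 in decimal).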
Unicode
Assigns a unique sequence number (a code point) to any character,
regardless of its alphabet
This is a numeric value between 0 and 10FFFF in hexadecimal (base 16)
The code points are grouped into 17 planes of 65,536 code points each
Planes are subdivided into blocks which have variable
size, each containing characters of one alphabet or a group of related alphabets
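Each Unicode plane holds 65,536 (hex 10000) code points, so the plane a character sits in can be worked out by dividing its code point by 65,536. A minimal C sketch, with the made-up name unicode_plane:

```c
#include <stdint.h>

/* Return the plane (0 to 16) that a code point belongs to,
   or -1 if the value is above the Unicode maximum of 0x10FFFF.
   (Hypothetical helper.) */
int unicode_plane(uint32_t code_point) {
    if (code_point > 0x10FFFF) {
        return -1;
    }
    /* Shifting right by 16 bits is the same as dividing by 0x10000. */
    return (int)(code_point >> 16);
}
```

The letter A (code point 41 in hex) is in plane 0, while code point 1F600 is in plane 1.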
Unicode allocation
Below is an image of the Basic Multilingual Plane
Note: a sequence of Unicode code points is normally broken
down further into units that may be smaller than a code point
The size of the unit characterises the three encoding forms
UTF-32
The most straightforward is UTF-32, in which the units have
size 32 bits, so every code point fits in a single unit
UTF-16
Breaks Unicode characters into 16-bit units
Not large enough to represent all possible Unicode code points
but large enough for most common alphabets
Splits the ones that are too large into two consecutive
units (a surrogate pair)
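The splitting into two units can be sketched in C as below. This follows the standard UTF-16 surrogate pair scheme; the function name utf16_encode is made up for this example.

```c
#include <stdint.h>

/* Encode one code point as UTF-16. Returns the number of 16-bit
   units written (1 or 2), or 0 if the code point is not valid.
   (Hypothetical helper.) */
int utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;  /* out of range, or reserved for surrogates */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;  /* fits in a single unit */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* bottom 10 bits */
    return 2;
}
```

The euro sign (code point 20AC) fits in one unit, while code point 1F600 should come out as the two units D83D DE00.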
UTF-8
Breaks characters into 8-bit units
Variable length: for European texts, for example, it will
normally use 8 or 16 bits per character
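The variable length of UTF-8 can be seen in a minimal C encoder sketch. It follows the standard UTF-8 bit layout; the function name utf8_encode is made up for this example.

```c
#include <stdint.h>

/* Encode one code point as UTF-8. Returns the number of bytes
   written (1 to 4), or 0 if the code point is not valid.
   (Hypothetical helper.) */
int utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    if (cp < 0x80) {              /* ASCII range: one byte, unchanged */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {             /* two bytes, 16 bits in total */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {           /* three bytes */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));   /* four bytes */
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}
```

The letter A stays as the single byte 41, while é (code point E9) should become the two bytes C3 A9.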
Software
Assembly language
Used by system programmers
Two instructions in Intel assembler
MOV EBX, EAX
ADD EBX, 4
The first one copies the contents of register EAX to register
EBX
The second increases the value in register EBX by 4
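The effect of the two instructions can be modelled in C. This is only a sketch of what they do, not real assembler: the struct registers and the function name are made up here, and two 32-bit variables stand in for the registers.

```c
#include <stdint.h>

/* Hypothetical model of the two 32-bit registers used above. */
struct registers {
    uint32_t eax;
    uint32_t ebx;
};

/* Apply the effect of the two instructions to the model. */
void run_two_instructions(struct registers *r) {
    r->ebx = r->eax;  /* MOV EBX, EAX: copy EAX into EBX */
    r->ebx += 4;      /* ADD EBX, 4: increase EBX by 4 */
}
```

Starting with EAX holding 10, EBX should end up holding 14 and EAX should be unchanged.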
Higher-level languages
Assembly language is a step up from machine code, but not
by too much
C
Example of a system programming language
Through data types called pointers, C allows direct
access to physical memory addresses
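A minimal sketch of what a pointer does: it holds the address of a value, and the value can be read or changed through it. The function names are made up for this example.

```c
/* Change the value stored at the address held in `p`. */
void add_four(int *p) {
    *p += 4;  /* *p means "the int at the address p points to" */
}

/* Read the second element of an array using pointer arithmetic. */
int second_element(const int *array) {
    return *(array + 1);  /* same as array[1] */
}
```

If x holds 1, then after add_four(&x) it should hold 5, because the function was given the address of x rather than a copy of its value.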
Hello world in C
#include <stdio.h>

int main(int argc, char *argv[]) {
    /* Print the greeting followed by a newline. */
    printf("hello world\n");
    return 0;
}
From C to machine code
Hello world is an example of what is generally called source code
In compiled languages like C or C++, this source code is
converted directly to machine code by another system program,
called a compiler
This works as seen below
Further reading on this is available at:
Andrew S. Tanenbaum, “Modern Operating Systems”, 4th Edition, Pearson, 2014 (MOS). See section 1.8 on the C programming language.
Brian Carrier, “File System Forensic Analysis”, 2005 (FSFA). Chapter 2.



