\documentclass[a4paper,10pt]{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[top=1.5in,bottom=1.5in,text=6.375 true in]{geometry}
\usepackage{amsmath}
\usepackage{url}
\usepackage{color}
\usepackage{ulem}
\usepackage{program}
\usepackage{graphicx}
\usepackage{algorithmicx}
\usepackage{algpseudocode}

\title{\textbf{Algorithms and Datastructures\\Assignment 1}}
\author{Magnus Aspling (880414-0039)\\Elina Comstedt (880914-0026)}
\date{November 30, 2011\\\hspace{1cm}\\\textbf{Revision 2}\\\today}
\setlength{\parindent}{0pt}
\setlength{\parskip}{10pt}

\begin{document}
\maketitle
\newpage
\tableofcontents
\newpage
\section{Introduction}

\subsection{The Algorithm from the textbook: SELECT}
You will send a list of integers and a number \textbf{i} into SELECT. The list should have \textbf{n} integers and the number should satisfy \textbf{i}>0. The algorithm will go through the list in five steps and then return the value of the \textbf{i}-th smallest place of the list.
\begin{enumerate}
\item Divide the list into groups of five elements, the last group may have less than five depending on the length of the list.
\item Sort the smaller groups internally using a sorting algorithm and then pick out the median.
\item Now take all the medians and puts them into a new list. Call SELECT again recursivly and send in the new list, ask for the median $(floor(list length / 2))$. When returning, you will have the ''median of medians''. 
\item Take the median back to the sorted groups of five and sorts them using the their median against the ''median of medians'', called x. K is the number of objects to the left and under the median, the median also counts in there. So all the other objects are n-k. 
\item Finding i:
	\begin{itemize}
	\item 	If i is equal to k, return x. Then the i-th smallest element is x. 
	\item   If i is smaller than k, call SELECT with k as the new list and run through all the steps again. 
	\item 	If i is larger than k, call SELECT with n-k as the new list. 
	\end{itemize}
\end{enumerate}

\subsection{Our take on the Algorithm: SELECT2}
Do note that steps 1-4 is essentially the same as SELECT. But that we choose a range rather then iterating over choosing a single number.

Input: Send in a list of integers \textbf{n}, a starting value \textbf{R1} and stopping value \textbf{R2}. The algorithm should return a sorted list between the \textbf{R1}-th smallest number and the \textbf{R2}-th smallest number.
\begin{enumerate}
\item Divide the list into groups of five elements, the last group may have less than five depending on the length of the list.
\item Sort the smaller groups internally using a sorting algorithm (selection sort on a list of 5 is a constant of 6 comparisons, therefore the total cost of this step is $6*(n/5)=\Theta (n)$) and then pick out the median.
\item Now take all the medians and puts them into a new list. Call SELECT again recursivly and send in the new list, the starting value and the stopping value should both be  $(floor(list length / 2))$ (the median). When returning, you will have the ''median of medians''. 
\item Take the median back to the sorted groups of five and sorts them using the their median against the ''median of medians'', called x. K is the number of objects to the left and under the median, the median also counts in there. So all the other objects are n-k. The list of numbers smaller or equal to the median is hereby named \textbf{LS (Left Side)} and the opposite \textbf{RS (Right Side)}
\item Finding R2:
	\begin{itemize}
	\item If R2 is equal to k then the median must be the rightmost number we would need, hence RS is discarded. 
	\item If R2 is larger then the median, SELECT2 is called with RS, with the starting value of 1 and stopping value of $R2-length(LS)$. The returned list is now the new RS.
	\item If R2 is smaller then the median, SELECT2 is called with LS, the starting value of 1 and the stopping value of R2. The returned list is now the new LS.
	\end{itemize} 
\item Finding R1: 
	\begin{itemize}
	\item If R1 is equal to k then the median must be the leftmost number we would need, hence every number in LS except for the median is discarded. 
	\item If R1 is larger then the median, SELECT2 is called with RS, with the starting value and stopping value of $R1-length(LS)$, the returned list is the new LS. 
	\item If R1 is smaller then the median, SELECT2 is called with LS, with the starting value R1 and stopping value $length(LS)$, the returned list is the new LS
	\end{itemize}
\item Merge the two list, and you have a list with the R1-th smallest element to the R2-th. Due to the nature of recursion in this algorithms, we can garantee this list is sequencial.
\end{enumerate}

\section{Implementation and analysis}
\subsection{Pseudocode}
\begin{algorithmic}[1]
\Procedure{SELECT2}{$lst,R1,R2$} \Comment{Our Selection Algorithm}
	\State //Sort Into 5-groups
	\State numGroups := lst/5 \Comment{Number of 5-groups}
	\For{$i:=0$ to $numGroups-1$}
		\If{$i*5+5<length(lst)$}
			selectionsort(lst, i*5 to i*5+5) \Comment{Sort group of five.}
		\Else
			selectionsort(lst, i*5 to end) \Comment{Sort group of less then five.}
		\EndIf
	\EndFor
	\State //Create Median List
	\State medianList := emptylist \Comment{Prepare list of Medians}
	\For{$i:=0$ to $numGroups-1$}
		\If{$i*5+5<length(lst)$}
			append(medianList, i*5+2) \Comment{Appends median to list}
		\Else
			append(medianList, i*5+((length(lst)-i*5)/2) \Comment{Appends median of last list}
		\EndIf
	\EndFor
	\State //Get Median of Medians
	\If {length(medianList) = 1}
		median := medianList[0] \Comment{Only one item, only one choice.}
	\Else
		median := SELECT2(medianList,length(medianList)/2,length(medianList)/2)[0]
	\EndIf
	\State // Create Partition Lists
	\State LS := emptylist
	\State RS := emptylist
	\ForAll{groups}
		\If {median > groupMedian}
			LS := [:groupMedian], RS := [groupMedian:]
		\ElsIf {median = groupMedian}
			LS := [:groupMedian+1], RS := [groupMedian+1:]
		\ElsIf {median < groupMedian}
			LS := [:]
		\EndIf
	\EndFor

	\State // Find R2
	\If {R2 = median}
		RS := emptylist
	\ElsIf {R2 > median}
		RS := SELECT2(RS,1,R2-length(LS))
	\ElsIf {R2 < median}
		LS := SELECT2(LS, 1, R2)
	\EndIf

	\State // Find R1
	\If {R1 = median}
		LS := median
	\ElsIf {R1 > median}
		LS := SELECT2(RS, R1-length(LS), R1-length(LS))
	\ElsIf {R1 < median}
		LS := SELECT2(LS, R1, length(LS))
	\EndIf
	\State // Merge Lists
	\State ReturnList := LS + RS
	\State return ReturnList
\EndProcedure
\end{algorithmic}
\subsection{Analysis}
\subsubsection{Running Cost}
	\begin{enumerate}
	\item Divide into groups $\rightarrow \Theta(n)$
	\item Sort the groups internally $\rightarrow \Theta(n)$
	\item Find median of medians $\rightarrow T(n/5)$
	\item Partition by median $\rightarrow \Theta(n)$
	\item Find R2 $\rightarrow T(3n/4)$
	\item Find R1 $\rightarrow T(3n/4)$
	\item Merge $\rightarrow \Theta(1)$
	\end{enumerate}
	\textbf{Summary:} Total Cost $\Theta(n)$
\subsubsection{Partition Size}
	Our hypotheis is that the partition size of 5 is the optimal, since the number of comparisons for a partition of 5 is low (6 with selection sort). 
	
	A smaller number would not benifit signifiantly when looking at sorting, but would raise the amount of medians, leading to a slower operation. 
	
	A higher partition would help if the sorting is particularly fast (as in newer computers). But all in all, 5 is a good number if you consiter running the algorithm on slower computers.
\subsubsection{Testing the hypothesis}
\textbf{SELECT} was tested with a randomized list of \textbf{1000000} items, picking out the \textbf{30th} to \textbf{35th} smallest element in the list, through partition sizes 1 to 10. Please note that the list was randomized at the begining and then static throughout the rest of the experiment (so that a fair comparison could be made). Each partition size was tested \textbf{5 times} and the \textbf{mean value} was then calculated.
\begin{center}
\begin{tabular}{|c|c|}
\hline
\textbf{Partition Size} & \textbf{Time (Seconds)} \\\hline
1 & infinity \\
2 & 308.2889 \\
3 & 129.9471 \\
4 & 129.8778 \\
5 & 142.6425 \\
6 & 156.7710 \\
7 & 165.7765 \\
8 & 157.1610 \\
9 & 157.3149 \\
10 & 161.3780 \\\hline

\end{tabular}
\end{center}

It seems our prediction was incorrect and the partition size of 4 would be optimal for this situation.
\section{Results}
On the paper SELECT2 is faster then SELECT. But due to a programming error, in combination with not running the algorithm in-place. The buffers get filled when run on a number 100.000 or higher. Due to this, we could only sort smaller lists while maintaining linearity. It's still notable that on a large select, SELECT2 is way faster then SELECT.

SELECT2 was put to test against two sorting algorithms, HEAPSORT and SELECTION SORT. On a fairly disavantageous number of 1.000.000 (since forementioned error), HEAPSORT was by far the fastest, with a 0.36 sec runtime, followed by SELECT2 with 30.0 sec and SELECTION SORT with 648.3 sec. However, to preform linearly, SELECT2 would have to had run the algorithm in about 3 sec. Displaying the error.
\section{Conclusions}
Even though HEAPSORT cost $\theta(n log n)$ and ours cost $\theta(n)$. It would take enormous numbers before being able to reap benifits of that growth, since the constants in play are so large.

Another thing to note is that even though SELECT2 run faster then SELECT on a large selection, it uses a larger recursion depth then SELECT, which might become a problem on smaller systems.
\end{document}
