programming assignment kd trees

public class Point2D implements Comparable<Point2D> { public Point2D(double x, double y) // construct the point (x, y) public double x() // x-coordinate public double y() // y-coordinate public double distanceTo(Point2D that) // Euclidean distance between two points public double distanceSquaredTo(Point2D that) // square of Euclidean distance between two points public int compareTo(Point2D that) // for use in an ordered symbol table public boolean equals(Object that) // does this point equal that object? public void draw() // draw to standard draw public String toString() // string representation }
public class RectHV { public RectHV(double xmin, double ymin, // construct the rectangle [xmin, xmax] x [ymin, ymax] double xmax, double ymax) // throw an IllegalArgumentException if (xmin > xmax) or (ymin > ymax) public double xmin() // minimum x-coordinate of rectangle public double ymin() // minimum y-coordinate of rectangle public double xmax() // maximum x-coordinate of rectangle public double ymax() // maximum y-coordinate of rectangle public boolean contains(Point2D p) // does this rectangle contain the point p (either inside or on boundary)? public boolean intersects(RectHV that) // does this rectangle intersect that rectangle (at one or more points)? public double distanceTo(Point2D p) // Euclidean distance from point p to closest point in rectangle public double distanceSquaredTo(Point2D p) // square of Euclidean distance from point p to closest point in rectangle public boolean equals(Object that) // does this rectangle equal that object? public void draw() // draw to standard draw public String toString() // string representation }
public class PointSET { public PointSET() // construct an empty set of points public boolean isEmpty() // is the set empty? public int size() // number of points in the set public void insert(Point2D p) // add the point to the set (if it is not already in the set) public boolean contains(Point2D p) // does the set contain point p? public void draw() // draw all points to standard draw public Iterable<Point2D> range(RectHV rect) // all points that are inside the rectangle (or on the boundary) public Point2D nearest(Point2D p) // a nearest neighbor in the set to point p; null if the set is empty public static void main(String[] args) // unit testing of the methods (optional) }
      insert (0.7, 0.2)       insert (0.5, 0.4)       insert (0.2, 0.3)       insert (0.4, 0.7)       insert (0.9, 0.6)       --> insert (0.8, 0.1) --> -->
private class Node { private Point2D p; // the point private Value value; // the symbol table maps the point to this value private RectHV rect; // the axis-aligned rectangle corresponding to this node private Node lb; // the left/bottom subtree private Node rt; // the right/top subtree }
private Value get(Node x, Key key) { if (x == null) return null; int cmp = key.compareTo(x.key); if (cmp < 0) return get(x.left, key); else if (cmp > 0) return get(x.right, key); else return x.value } private Value overlyComplicatedGet(Node x, Key key) { if (x == null) return null; int cmp = key.compareTo(x.key); if (cmp < 0) { if (x.left == null) return null; else return overlyComplicatedGet(x.left, key); } else if (cmp > 0) { if (x.right == null) return null; else return overlyComplicatedGet(x.right, key); } else return x.value }

Composing and testing a complex data structure from scratch, again. 1

Learning Goals

Compose algorithms via iterative improvement.

Algorithms can be tricky to implement. One way we can manage the complexity of an algorithm is to develop it in stages, starting from something simple that works and then progressing to more and more complex optimizations while ensuring the program works at every step along the way. This way, we can maintain correctness as we add complexity to our programs.

Encapsulate complex behavior beneath a simple interface.

The best abstractions hide complex behavior beneath a simple interface. Programs often gradually grow more and more complex, so a key challenge of implementing complex data structures is in managing this complexity while maintaining the illusion of a simple interface.

Table of contents

After spending a lecture on k-d trees, it is now time for you to implement your own! We’ll be using this later in the assignment for HuskyMaps , our map server. To get you started, we’ve provided a Point class with some helpful methods to calculate distances. Using this class, you’ll create two implementations of the PointSet nearest-neighbor-finding interface: a basic one using a naive linear search, and a faster-but-more-complex k-d tree implementation.

Getting the Assignment

To get the skeleton (starter) code for the homework, open your CSE 373 IntelliJ project. In the menu bar, tap on the VCS item, then hover over the Git dropdown, tap the Pull… menu item when it’s revealed, and pull the skeleton repository to get the assignment.

Before you begin the assignment, you might find the following visual resources helpful:

If you end up incorporating any resources into your own code, be sure to cite them!

Unsure where to start?

Read over Point and begin implementing NaivePointSet per the spec.

We provide a class, Point , that represents a location with an x and y coordinate. You may not modify this class. The key methods in this class are:

PointSet API

The PointSet<T extends Point> interface represents a set of such points. The generic type parameter T represents the exact type of the points in the set; extends Point indicates that the class only supports types that extend the functionality of Point . (The only class that matches this description in this assignment is Point itself, but this parameter will become useful later in the HuskyMaps assignment.)

NaivePointSet

Before you create your faster KDTreePointSet class, first create a simple-but-slow linear-time implementation. The goal here is to provide an alternative, albeit slower, solution that you can use to easily verify results from your k-d tree’s nearest method.

NaivePointSet will only be tested for correctness, not efficiency.

For reference, our solution for this class is 32 lines total from top to bottom. You should not spend significantly more than half an hour to complete this part.

KDTreePointSet

Now, implement the KDTreePointSet class. Your nearest methods should achieve O ( log ⁡ N ) \mathcal{O}(\log N) O ( lo g N ) runtime on real-world data (i.e., you don’t need to worry about the absolute worst-case runtime).

Your constructor may assume that its input has random ordering (i.e., is not sorted in any way). In particular, do not randomly shuffle the given points, since that makes it difficult to test your class reliably. Instead, we’ve included a static factory method createAfterShuffling to do the shuffling before calling your constructor—you can call this to defend against sorted data, while still maintaining testability by writing tests that call the constructor directly.

While k-d trees can theoretically work for any number of dimensions, your implementation will only work for the 2-dimensional case where our points have only x and y coordinates.

About a quarter of the points for this homework will be allocated towards evaluating the correctness of KDTreePointSet . The remainder (minus the points for NearestPointSet ) will be allocated towards evaluating the efficiency of KDTreePointSet , but these tests will only provide credit if the implementation is fully correct.

We only provide a few tests for this homework, and most of the scoring done by the autograder will be based on large randomized tests that don’t provide meaningful feedback.

Creating a test suite that spans all possible k-d tree edge cases is significantly more difficult than previous assignments due to the complexity of the problem we are solving—there’s a whole extra dimension to deal with. Since it’s tedious (or even infeasible) to enumerate all edge cases, we instead turn to randomized testing, a lazier style of testing than unit testing, and we recommend that you do the same if you decide to write additional tests.

To verify the correctness of our k-d tree implementation, we can generate a large number of random doubles to construct random points for our point sets, as well as target points to query nearest . Then, we can simply compare the results of KDTreePointSet.nearest to the results to NaivePointSet.nearest (which is simpler and thus easier to unit test normally).

Of course, this method of testing is not useful for debugging: if you discover any bugs this way, you’ll likely have a difficult time reproducing them in a smaller unit tests since your test case will be both large and randomly changing on every execution. To address the second issue, you should seed your random number generator to produce fixed test data. We suggest you use the Java Random class to generate random coordinate values:

However, this won’t change the fact that your test case may be too large to effectively use the debugger. Often, it may just be easiest to take a break and come back later to re-read your code with a fresh mindset.

Josh Hug. 2019. Project 2AB: Extrinsic PQ and KDTree. In CS 61B: Data Structures, Spring 2019. https://sp19.datastructur.es/materials/proj/proj2ab/proj2ab   ↩

Data Structures and Algorithms - Fall 2015

Image

NOTE: Part of the objective for this project is to mimic what it's like to work on a software solution at a company. Typically, you will not be handed pseudo-code for every operation and subroutine you are supposed to program, but will instead be given a specification for a program--what inputs it should take and what outputs it should produce. You will then have to go figure out how to make it work.

For this project write-up, we are doing a bit of a hybrid approach. We are giving you some information on the kD-tree data structure (which you will be coding), but not everything you will need to know to code it. You are responsible for finding good additional information as needed in order to complete the project. Piazza discussions are especially encouraged.

For this reason, it is also important to read the entire project description and the provided code before you start working. Do not write a single line of code until you are sure you understand what is expected of you and how it fits into the framework of the code we've already given you. There is a lot to read, so make sure you start early!

Learning Objectives:

Getting Started

First download and decompress one of the starter files:

This PA has two major parts, and so these archives contain a large number of files. You will need to create Makefiles for both parts. Here is an overview of all the files:

The bst/ folder

This folder contains the bst.h/c files and the files needed for the bst/main tester program.

The kdtree/ folder

COMING SOON: The rangefighter/ folder

We will release this to you in the week after Thanksgiving. It contains the actual game code for rangefighter , which uses your kD-tree implementation as its main data structure.

Part 1: Range Finding in Binary Search Trees

You have seen in class how binary search trees (BSTs) are useful for supporting a find(x) operation that determines whether a value x is stored in the tree. Consider the following tree:

Let's recall what happens when we execute the command find(6) . First find(x) looks at the root node to check if the value at the root node is equal to what we are looking for. It is not (6 != 8). Thus we need to recurse on one or the other of the subtrees. Because we have a BST, since 6 < 8 , if 6 is to be found in the tree, it will have to be in the left subtree of 8. We thus search recursively on the left-subtree of 8 for 6.

Now let's generalize this idea a little. Instead of finding a specific value x , what if we wanted to find any value in a given range? For instance, what if we wanted to retrieve all values stored in the tree between 3 and 11? How would we implement this operation?

This operation is called range searching . We do not want to determine whether a particular value exists or not--as in the case of find(x) --but instead we want to find all values in the tree within the given range [3, 11] .

To support this operation, we perform a recursive procedure that is very similar to the find(x) operation. Again, we start at the root node 8 and ask the question, "is 8 in our range [3, 11] ?" The answer is, of course, yes, so we output 8. We then need to determine whether to continue searching recursively in the left subtree, the right-subtree, or both .

This is the main difference between the find operation and the range_search operation: in the find operation, there is always only one subtree we need to search in: either the left tree or the right. Now, however, there are three possible cases: 1) the range we are searching for can only possibly contain points from the left subtree, 2) it can only possibly contain points in the right subtree, or 3) it may possibly contain points from both. The third case arises precisely when the value of the current node is contained within the range. So in our case, because 8 itself falls inside the range [3, 11] , both the left and the right subtrees may contain points that fall within the range. This means that we need to continue the search recursively in both subtrees.

So what does that look like? Well, let's first recurse on the left subtree. Since 4 is in [3, 11] we output the 4 and again we have to recurse on both the subtrees of 4. We first recurse on its left, 2, which is not in the range, so we do not output it. Since 2 has no children execution returns to 4 and we recurse down the right subtree. Here, 6 is in the range [3, 11] , so we output 6. (If you are following along at home, we have now produced 8, 4, and 6 as output.) We have now finished with all branches of 4 and execution returns back to the recursive call at 8.

We now continue searching on the right child, 12. In this case 12 is not in the range [3, 11] . In fact, all possible values in the range [3, 11] are less than 12. This means that only the left-subtree of 12 can possibly contain any values that fall within [3, 11] . So instead of recursing on both the left and right subtrees, this time we will only continue our search in the left subtree of 12. When we recurse there, we find that 10 is in our range, and so we output it. We have now exhausted all subtrees we needed to search and the process terminates, having outputted 8, 4, 6, and 10, which are exactly those nodes of the tree with values in the range [3, 11] .

Note that these values were not retrieved in order; however, we could easily modify this process to retrieve the values in order by changing the range_search function to use an inorder traversal instead of a preorder traversal.

Building a BST

Recall from class that the efficiency of operations on trees is often bound up with the height of the tree, and that is certainly the case here. In fact, the running time of the range search operation can be shown to be O (log  n  +  k ) on a BST with n nodes, where k denotes the number of elements in the tree that falls within the range. This is a little different than running times you have seen before, because the running time O (log  n  +  k ) is related to both the input size n and the output size k . This is called an output-sensitive run time.

To make sure that we get the log  n term in the running time, our tree needs to be balanced. Suppose we have an array items of n integers we want to put in a BST. How can we build such a balanced tree? Here's a recursive idea. First select the median of the input array and make this the root node. Then build the left subtree from all the items in the array that are less than the median, and build the right subtree from all the items in the array that are greater than the median. We can do this reasonably efficiently by presorting the array. The steps look like this:

Task 1: Implement a Range Search BST

Your first task is to implement a binary search tree (BST) structure equipped with the following operations found in the bst.c file in the bst/ folder in the starter code:

void bst_build(bst_t *tree, int array[], size_t length);

The input parameters consist of an empty binary search tree and an unsorted integer array. This function should create nodes in tree in order to build a balanced binary search tree containing the elements from array .

IMPORTANT : This function should run in O ( n  log  n ) time.

HINT : You will need a sort operation. You may use one of your sorting routines from PA4 or the built-in qsort method .

bool bst_find(bst_t *tree, int x);

Returns true if the value x is contained in the tree, false otherwise.

IMPORTANT : This function should run in O (log  n ) time.

void bst_range_search(bst_t *tree, int range_min, int range_max, dynarray_t *output_array);

This function takes as input a binary search tree, a range [ r a n g e _ m i n ,  r a n g e _ m a x ] , and a dynamic array output_array . It should perform the range search described above and add all items in tree that fall within the range to the output_array .

IMPORTANT : This function should run in O (log  n  +  k ) time, where k is the number of points returned.

And of course, the usual memory management functions:

void bst_init(bst_t *tree);

Initializes an empty binary search tree. (We have provided this for you.)

void bst_free(bst_t *tree);

Frees any memory (i.e. the nodes) for this binary search tree.

NOTE: You will need a Makefile for the bst directory that builds the main program from the files found in the bst/ folder.

The ./main test program we have included should output the following to the console when run:

Furthermore, it should produce a turtle graphics image named output.bmp that looks like this:

Output of bst main

Output of bst main

Part 2: kD-Trees

A BST equipped with a range search operation as above is useful for many different applications. For instance, if we had a list of employees and wanted to answer queries like, "how many employees do we have between the ages of 20 and 25?" a BST would be a perfect data structure for supporting this type of query. However, what if we wanted instead to find all employees between the ages of 20 and 25 who make between 30 and 50 thousand dollars a year? A BST is not equipped to deal with this operation, because a BST only stores 1-dimensional data. Our query, however, is 2-dimensional, we want to query both an age range and a salary range. In order to handle such queries we need to generalize the binary search tree to higher dimensional data. We call this the kD-tree (where k is the dimension of the data).

For this PA you will not implement a full-blown kD-tree, but instead will implement a 2D-tree for handling 2-dimensional data.

Now our data, instead of being a list of numbers, like 8, 4, 12, 2, 6, 10, 14, is a list of 2D points. In the age/salary example, we could have the x-coordinates represent age and the y-coordinates represent salaries. So we might have, (23, 25000), (20, 55000), (50, 55000), (27, 95000), (24, 32000), (40, 28000), (35, 50000) as a list of employee age-salary "points". How can we store this using a tree? We could simply choose one of the dimensions, say the x dimension and simply build a normal BST on this dimension. Something like:

The problem with this approach is that we can still only support range search queries on the x -dimension.

Recall that we build a BST by recursively selecting the median, and then recursively building the left-subtree from all points less than the median, and the right-subtree from all points greater than the median. The key idea behind a kD-tree is to alternate which dimension we choose the median from each time we recurse.

In other words, if we are building a kD-tree age-salary points above, at the top level we would use the median of the age, we would then recurse and for the next level use the median of the salaries. At the following level we would again use the median of the age, and so on. The resulting tree looks like this:

For higher dimensional data, you continue with the same idea, every time you recurse in your build tree, just cycle through the available dimensions.

Range searching a kD-tree

Now that we have seen how to generalize a BST to 2-dimensional data, let's discuss the actual range searching. In 1 dimensional data, a range is just a range between two numbers, as we have seen. Something like [3, 11] which is all (1D) points lying between 3 and 11. There is no one right way to generalize this idea to 2-dimensional data. In fact, here are two possibilities that are equally natural:

Box Range

Circular range

Circular range

Motivation: Range Fighter

Your second task is to implement a kD-tree for 2-dimensional points. To make things a little more interesting, you are going to be implementing this kD-tree to support a computer game based on the old Atari Asteroids game. The game is called "range fighter".

The player pilots a space ship through a debris field using the keys W, A, and D to move the ship forwards, left and right, respectively. The ship is equipped with two weapons. The first is a box range weapon that is activated by holding the space bar down. When the player hits the space bar, the box range weapon activates and a box is drawn from the initial position of the player when the space bar was pressed to the position of the player when the space bar was released. All of the debris within the box is destroyed when the space bar is released. The second weapon is a panic weapon that takes a long time to recharge but immediately destroys all debris within a certain distance of the player. It is activated by hitting the 'e' key.

Here is a screenshot of the game:

Game Screenshot

Game Screenshot

Your assignment is to implement a kD-tree for storing the debris field that supports both circular and box range searching operations. Rather than have you get started on a full-blown game (which might make debugging difficult!) we have provided a main program that uses turtle graphics to draw an image that looks like a screenshot from the game. You should first get your kd-tree working with this program, since it will be easier to debug. Once your kD-tree is working, you will be able to copy and paste your kdtree.c file into the game program code folder which we will give you the week after Thanksgiving.

Task 2: kD-Tree Implementation

All of the functions you need to implement are declared in the kdtree.h file and should be defined in kdtree.c . You may write helper functions in kdtree.c and declare them at the top of the file (but NOT in kdtree.h or your code won't compile when you submit it). For this assignment we have provided working iterator code, but you will not need to use iterators for any of this assignment. The iterators are there so that the main game application can use them. You need to implement the remaining functions in the kdtree library:

void kdtree_init(kdtree_t *tree)

Initializes an empty kD-tree. (We have provided this for you.)

void kdtree_free(kdtree_t *tree)

Frees a kD-tree.

void kdtree_build(kdtree_t *tree debris_t array[], size_t length)

Builds a kD-tree with the debris_t objects in array . This is essentially the same as the bst_build operation for BSTs except that it needs to alternate between using the x -coordinates adn the y -coordinates at each level of the tree for finding the median.

void kdtree_box_range_search(kdtree_t *tree, bbox_t range, dynarray_t *debris_in_range)

Adds all active debris that fall within the range to the dynamic array debris_in_range .

HINT: How can you tell, at a given node, whether to recurse on the left or right subtree? Remember that at a given node we either split by x-coordinate, or we split by y-coordinate. Suppose the node represents an x-coordinate split. Then you are essentially doing the same check as in a 1D tree, where the range you are searching on is [range.minx, range.maxx] . If the node represents a y-coordinate split, then you are checking the y-coordinates against the range [range.miny, range.maxy] .

HINT: In your recursive helper function you will probably need to track a depth in order to tell if you're looking at x- or y-coordinates.

void kdtree_circ_range_search(kdtree_t *tree, point_t p, float radius, dynarray_t *debris_in_range);

Adds all active debris that fall within a distance of radius from the point p to the dynamic array debris_in_range .

HINT: How can you tell, at a given node, whether to recurse on the left or right subtree? Remember that at a given node we either split by x-coordinate, or we split by y-coordinate. In the first case, we should first check whether the center of the circle lies to the left or the right of the x-coordinate of node's data. If it is to the left, then we definitely have to search the left subtree. If it is to the right, then we definitely have to search the right subtree. The question then is when do we have to search the other subtree as well. We must do this only if the absolute value of the difference in the x-coordinates of point p and node->debris.pos is less than the radius , because otherwise, the circle centered at p with radius radius is completely contained on one side of the split. (If this node represents a y-coordinate split, then this the discussion above is the same except for using y-coordinates instead of x-coordinates.)

HINT: In your recursive helper function, you will probably need to track a depth in order to tell if you're looking at x- or y-coordinates.

Each piece of debris is represented by a debris_t object, which is defined in helpers.h in the kdtree/ folder. A debris_t object stores a 2D point named .pos for its coordinates and also stores a boolean value .active for whether the debris is active in the debris field or has been destroyed. Your kD-tree will store debris_t objects.

The helpers.h file also contains a few helpful types, such as bbox_t , which is used to define an axis-aligned bounding box for all points with x-coordinate between .minx and maxx and all points with y-coordinate between .miny and .maxy . It also defines point_t , which is used to specify a single 2D point.

Suggested Progression

First create a Makefile in kdtree/ , get the code compiling, and run ./main . You should get the following output rangefighter.bmp :

Initial Output

Initial Output

Next, implement the memory management and build-tree functions. Once these are working your output rangefighter.bmp should look like:

Tree is built!

Tree is built!

Next, implement the box range searching. Once this is working you should see this:

Box search is built!

Box search is built!

Notice how all the debris within the green box is now highlighted green. The pieces of debris are randomly-generated, so the field will look slightly different every time you run the program, but all debris inside the box should be green regardless of their exact location.

Finally, get the circular range search working and the debris within the blue circle should appear blue:

Everything is working, yay!

Everything is working, yay!

If you are having trouble debugging your code, you may wish to enable the call to draw_kdtree in main . This should give you a graphical representation of your kD-tree:

kD-trees are cool!

kD-trees are cool!

HINT : For debugging purposes, it can also be useful to temporarily lower the number of pieces of debris. You can do this by modifying debris_count in the create_random_debris_tree function. Here is what it looks like with only 16 pieces of debris:

Fewer pieces of debris

Fewer pieces of debris

Final Submission

DUE DATE: Friday, December 11, 2015 at 23:59 EST (11:59pm)

Submit ONLY your completed bst.c and kdtree.c files on Canvas using the appropriate assignment submission. Make sure the files contain a comment field at the top with your name and a statement of compliance with the JMU honor code .

IMPORTANT: There should be NO memory leaks in any of the code you submit for this assignment (both parts). You should test this using valgrind on both main programs.

Please check your code carefully and make sure that it complies with the project specification. In addition, make sure that your code adheres to general good coding style guidelines. Fix any spelling mistakes, incorrect code formatting, and inconsistent whitespace before submitting. Make sure you include any appropriate documentation.

IMAGES

  1. Programming Assignment Help

    programming assignment kd trees

  2. Dynamic Programming on Trees

    programming assignment kd trees

  3. Programming Assignment Help

    programming assignment kd trees

  4. A kd-tree example: subdivided data (left) and the corresponding binary...

    programming assignment kd trees

  5. KD tree algorithm: how it works

    programming assignment kd trees

  6. Visualization of the k-d tree algorithm.

    programming assignment kd trees

VIDEO

  1. Guster

  2. Bikini Atoll SpongeBob island plam trees 1080P 1440P 4K 16:9 30FPS Widescreen remastered extended

  3. Biological process design for wastewater treatment

  4. 2022 Australia Day Awards winner

  5. DAN TAN!

  6. Garden Stroll Through the Seasons at Garden Canadensis 2022 (Part IV: August

COMMENTS

  1. maximkir/coursera-algorithms-part1-kdtree

    My solution to week 5 programming assignment: Kd-Trees - GitHub - maximkir/coursera-algorithms-part1-kdtree: My solution to week 5 programming assignment:

  2. algorithms-princeton-coursera/KdTree.java at master

    algorithms-princeton-coursera/Codes of Programming Assignments/part1/pa5-kdtree/KdTree.java · Footer.

  3. Programming Assignment 5: Kd-Trees

    Write a data type to represent a set of points in the unit square (all points have x- and y-coordinates between 0 and 1) using a 2d-tree to support

  4. Programming Assignment 5 Checklist: Kd-Trees

    Programming Assignment 5 Checklist: Kd-Trees · Write isEmpty() and size(). · Write a simplified version of put() which does everything except set up the RectHV

  5. Programming Assignment 5: Kd-Trees (Part II)

    Programming Assignment 5: Kd-Trees (Part II). Write a data type KdTree with the same public methods as PointSET to represent a set of points in the unit

  6. Spring 2021 Programming Assignment 2: Wrapped k-d Trees

    Programming Assignment 2: Wrapped k-d Trees. Handed out: Tue, Apr 6. Due: Wed, Apr 21, 11pm. (Submission via Gradescope.) Overview: In this assignment you

  7. k-d Tree

    k-d Tree · Learning Goals · Table of contents · Overview · Getting the Assignment · Point · PointSet API · Submission.

  8. Assignment 3: KDTree

    in the same kd-tree as points in four-dimensional space. ... to complete the assignment, I've broken the program down into a series of five.

  9. Solved Programming Assignment 5: Kd-Trees Create a symbol

    Question: Programming Assignment 5: Kd-Trees Create a symbol table data type whose keys are two-dimensional points. Use a 2d-tree to support efficient range

  10. CS 240

    We are giving you some information on the kD-tree data structure (which you will be coding), but not everything you will need to know to

  11. Assignment 2: KDTree

    In this assignment, you will implement a special data structure called a kd-tree (short for “k-dimensional tree”) that efficiently supports