Gary Chien journal

Activity Log

  • 1/26 (1 hr) - Familiarized myself with Slack group, read through course website and syllabus, looked up examples of AR smartphone applications (particularly Google's Project Tango, which preceded the current ARCore).

  • 1/27 (3 hr) - Installed Android Studio (first on Windows machine, then Linux) and went through "Hello World" Android development tutorial. Started studying basics of Android development, in preparation for ARCore development.

  • 1/28 (1 hr) - Created Journal page and added bio. Installed ParaView. Contributed to "AR Development Software" wiki page. Spent some time pondering whether my past HoloLens projects could be adapted to an Android phone. Decided I don't know enough yet about ARCore's spatial mapping capabilities.

  • 2/2 (2 hr) - Browsed the VR Software Wiki. The section on academic labs specializing in VR/AR caught my interest - I want to learn more about where cutting-edge research in this area is currently focused. In particular, I want to know what the current open problems are for AR, and whether I can design a project around one of these problems. Further digging on Google Scholar revealed that there is a lot of work in the robotics field on using AR/VR to facilitate human-robot interaction. One paper I found interesting was "Facilitating HRI by Mixed Reality Techniques" (Renner et al.), which outlined a proposal for designing a HoloLens interface to help users interact with mobile robots.

  • 2/3 (3 hr) - Completed more Android development tutorials. Potential project titles: 1) "ARCore visual navigation system within known environments"; 2) "ARCore robot teleoperation"; 3) "An evaluation of key strengths/weaknesses between ARCore and ARKit".

  • 2/4 (2 hr) - Thought about project requirements.

    • "ARCore visual navigation system within known environments" would involve three steps:

      • Find some way to represent 3D visual space using spatial mapping, perhaps within Unity3D

      • Extract visual features to use as spatial anchors to localize phone within internal representation of the environment

      • Use the knowledge of the phone's location to give visual directions to guide the user to some destination (like an arrow in a video game)

      • CLASS ACTIVITY: create a map of the classroom, enable app to help users navigate around chairs and out the door.

    • "ARCore robot teleoperation"

      • Create ROS scripts to send movement commands to a robot. They should be functions that an external API can access in some way (can adapt code from robotics research for this)

      • Calibrate the ROS and ARCore coordinate systems so that the robot and ARCore share the same virtual 3D space.

      • Create an ARCore interface to translate touchscreen gestures to robot motion (e.g. placing waypoints on the ground for the robot to move to)

      • CLASS ACTIVITY: make the robot do cool things

    • "An evaluation of key strengths/weaknesses between ARCore and ARKit"

      • Initialize ARCore on an Android device and ARKit on an iOS device

      • Go through tutorials for both, to learn how to access key features

      • Benchmark various features, such as pose estimation accuracy, hologram drift, etc.

      • CLASS ACTIVITY: deploy ARCore and ARKit apps with equivalent features to two devices, let people use both and decide which one they like better.

  • 2/6 (3 hr) - I learned that ARCore doesn't support object recognition capabilities beyond 2D visual markers, which seems like a big limitation for potential AR apps. I wonder if I can focus my project on finding a workaround to this somehow. It would be nice to integrate OpenCV methods or pre-trained TensorFlow models in an ARCore app - perhaps by offloading the computing to an external computer somehow, and sending the outputs back to the phone? ARCore does have functions that let you access the current camera frame. I think I'll make this the focus of my project. Because this is a rather big goal, I'll narrow the focus down a little, to simply integrating external computer vision processing into an ARCore app (it could be something as simple as OpenCV color thresholding). If this works out, I'll extend the project to implement a full-fledged 3D object recognition feature.

Pre-project plan

    • Title: Integrating custom image processing techniques in ARCore

    • Goal: Extend ARCore features with custom OpenCV vision algorithms.

    • Method: Stream camera feed from phone to laptop, and perform vision computations on the laptop before sending the results back to the phone to visualize in the ARCore app. Focus will be on phone-laptop communication, so vision algorithm will be simple.

    • Tools:

      • ARCore compatible phone

      • Laptop, for external vision processing

      • ARCore and OpenCV

    • Milestones

      • 2/12 - Become familiar with Android development, begin going through basic ARCore tutorials.

      • 2/14 - Try to extract camera frames from a simple ARCore app, to see if the project is even possible.

      • 2/21 - Successfully stream data between phone and laptop while running ARCore.

  • 2/10 (6 hr) - Went through three weeks of Coursera's "Introduction to Augmented Reality and ARCore" course. Began going through intro ARCore tutorials. Followed the ARCore quickstart tutorial to get the "Hello Sceneform" sample project running on my phone. With this sample project, I am able to place AR droids on my apartment floor.


Project evaluation:

  • The proposed project clearly identifies deliverable additions to our VR Software Wiki (5)

  • The proposed project will inform future research, i.e., advancing human knowledge (5)

  • The proposed project involves large data visualization along the lines of the "Data Types" wiki page and identifies the specific data and software that it will use (1)

  • The proposed project has a realistic schedule with explicit and measurable milestones at least each week and mostly every class (4)

  • The proposed project includes an in-class activity (2)

  • The proposed project has resources available with sufficient documentation (3) -- ARCore documentation is scarce

Journal evaluation:

  • Journal activities are explicitly and clearly related to course deliverables (5)

  • Deliverables are described and attributed in wiki (2)

  • Report states total amount of time (5)

  • Total time is appropriate (1) -- a rough couple of weeks caused me to fall behind on my journal entries; I will start picking up the pace


  • 2/20 - (1 hr) Wrote the final proposal for my project

Final proposal:

    • Title: Creating a comprehensive beginner's tutorial to ARCore

    • Goal: Make an easy-to-follow tutorial in the course wiki that anyone with basic Android development experience can use to learn ARCore.

    • Method: Learn ARCore using whatever resources are available online. Constantly document findings and new techniques learned, to eventually be integrated into the wiki. Continue adding to the tutorial until it has enough content to teach anyone to build AR apps.

    • Tools:

      • ARCore compatible phone

    • Milestones

      • 2/12 - Became familiar with Android development, began going through basic ARCore tutorials.

      • 2/14 - Completed basic ARCore tutorial, deployed the "Hello World" AR app to my own phone to test.

      • 2/21 - Begin reverse engineering the "Hello World" app and understand it enough to describe it within the wiki

      • 2/26 - Begin writing the tutorial, covering basic Android development requirements (including a list of ARCore compatible phones)

      • 2/28 - Write the "Hello World" tutorial, describing how the code provided in the sample project works. Use this sample project to describe how to make an AR app from scratch.

  • 2/24 - (4 hr) Created an outline for my tutorial series on the wiki (in "6.3. AR Software Tutorials"). I created a hierarchy of subpages with some initial titles and headers, to be filled in later. I plan on updating the tutorial series in parallel with learning ARCore myself, so that it will be complete whenever I become sufficiently skilled at ARCore development. Completed some more basic Android tutorials (creating to-do apps etc.) because I still have difficulty grasping fundamental Android development concepts. There's a lot of new terminology to take in, but I'm sure that it's just something that takes time and practice.

  • 2/27 - (1 hr) Did some light reading of Android tutorials. Have decided to begin going through an online Android development course offered by Stanford and to get inspiration from it for how to structure my wiki.

  • 3/5 - (5 hr) Worked on a simple ARCore app to place holograms on flat surfaces, essentially replicating the "Hello World" sample Sceneform app. I can currently visualize surface planes, but need to figure out how to generate holograms before I begin transcribing the process in the tutorial wiki.

  • 3/7 - (5 hr) Implemented the ARCore app in its most basic form from scratch: allows for tapping to place objects, and dragging/rotating/resizing existing objects with finger gestures. I wanted to go a bit beyond this, by allowing users to select from multiple objects in a menu. This turned out to be more complex than anticipated, so I decided to just focus my Tuesday tutorial on the basic tap-to-place app. I will be unable to work this weekend, so the tutorial unfortunately won't be uploaded until Monday.

  • 3/9 (3 hr) I found a way to load 3D models into ARCore via the internet! This will add a new element of fun to Tuesday's tutorial, since it's now a lot easier to experiment with various models found online without having to tediously download and import .obj files into the project. It took a while to get working, though. It turns out that version 1.7.0 of com.google.ar.sceneform:assets is buggy; remote model loading only works with version 1.6.0.

  • 3/11 (2 hr) Wrote up the first ARCore tutorial, for Tuesday's class.

  • 3/18 (3 hr) Spent time looking into existing research and projects involving offloading processing from AR apps on smartphones. Focused especially on the CloudAR paper, which used an external computer to provide deep learning image processing on a smartphone AR app.

  • 3/20 (3 hr) Tried to figure out how to stream camera images from ARCore to a laptop. This turned out to be more difficult than expected, because only the Unity version of ARCore has an out-of-the-box way to access the camera image. I stubbornly tried to hack together a solution for Sceneform, but in the end it didn't go far. I was faced with two choices: learn how to access the Android camera the traditional way (furthering my Android development experience), or delve into ARCore Unity development. I ultimately decided to begin going down the latter path. I intend to dive into it deeply over spring break.

CONTRIBUTIONS TO WIKI (up to now)

  • ARCore Intro tutorial (under the AR software section) (100%)

  • To come: Tutorial for video streaming from ARCore. (0%)


  • 3/26 (4 hr) - Gave myself a crash course in ARCore Unity development, using any documentation I could find online. Got the HelloAR starter project up and running on my phone. Began investigating how I can extract camera frames within the app to stream to a laptop. My best bet probably lies in the built-in "AcquireCameraImageBytes" function.

  • 3/27 (6 hr) - AcquireCameraImageBytes() is giving me a lot of trouble. From what I understand, it stores the current frame in the YUV-420-888 format. I'm having a lot of difficulty figuring out how to convert this data to a typical image byte array. After a bit of googling, I found that this is a problem that lots of people have struggled with. I attempted to replicate the results of some top StackOverflow solutions, but none of them worked. In addition, because I am deploying the app to a phone, it is incredibly difficult to debug without being able to print messages to some sort of console. I also spent a few hours trying to reverse engineer relevant parts of the "computer vision" sample ARCore app, but so far I haven't achieved any working results. All attempts to save the current camera frame result in various crashes and errors. Feeling frustrated but will try again tomorrow.
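
For reference, here is a rough NumPy/OpenCV sketch of the conversion the app needs to perform, written in Python rather than C# just to illustrate the plane layout. It assumes tightly packed I420-style planes (pixel stride 1); real YUV_420_888 frames often interleave the U and V planes with a pixel stride of 2, which is exactly the detail that makes the StackOverflow solutions so fiddly.

```python
import cv2
import numpy as np

def i420_planes_to_rgb(y_plane, u_plane, v_plane, width, height):
    """Illustrative only: convert tightly packed YUV planes to an RGB image.

    Assumes pixel stride 1 everywhere (I420 layout). The Y plane is full
    resolution; U and V are each quarter resolution (width/2 x height/2)."""
    yuv = np.concatenate([
        np.frombuffer(y_plane, dtype=np.uint8).reshape(height, width),
        np.frombuffer(u_plane, dtype=np.uint8).reshape(height // 4, width),
        np.frombuffer(v_plane, dtype=np.uint8).reshape(height // 4, width),
    ])
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB_I420)
```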

  • 3/28 (1 hr) - I installed the most up-to-date version of Unity and cleaned/rebuilt the sample Unity ARCore project. As a few StackOverflow posts suggested, I looked more deeply into the computer vision sample scene to see how it extracts images from the camera. Specifically, I was directed towards the TextureReaderApi class, which handles the extraction of the camera image data. I didn't have much time to work on the project today, so I didn't get far. I know where to pick up next time though.

  • 3/31 (5 hr) - I spent most of my time deconstructing the TextureReaderApi, TextureReader, and ComputerVisionController classes of the sample project, with no useful results. I understand how the code works, but it is simply not cooperating with me for some reason. At first, I tried to adapt the code to my own ARCore Unity project, but the app would simply crash whenever I tried to run it. I eventually transitioned to directly modifying the Computer Vision sample project, where my goal was simply to intercept the already-generated camera image texture and write the image to a file. Even this did not work, although I did get an interesting result - an incredibly static-y image that looks like colorful white noise. Now I'm faced with two choices: continue debugging this issue using Unity, or attempt to tackle this problem using Android Studio. I found that the Frame class has an "acquireCameraImage()" function, which looks promising. Given how much time I've spent trying to solve this problem using Unity, I think it makes sense to try Android Studio instead tomorrow.

  • 4/1 (4 hr) - Transitioned from Unity to Sceneform. Found some StackOverflow solutions to extract camera images from ARCore using Frame.acquireCameraImage(). It took a few hours to understand the code well enough to apply it. I was eventually able to upgrade the app from my Intro to ARCore tutorial to acquire a camera frame every time the ARCore scene updates. Before sending images to a laptop, I wanted to try writing to internal memory first, to verify that the images are indeed valid. After doing so, I inspected the images and found that they were all greyscale (and inverted). I need to figure out how to extract color images.

  • 4/2 (2 hr) - Extracting color images is posing all sorts of issues. Each attempt to do so crashes the app - I need to learn more essentials of how the Android camera works.

  • 4/3 (5 hr) - I focused on implementing ARCore/laptop communication via networking today. I'm very inexperienced with sockets, so I looked through Wikipedia articles and a handful of tutorials. I want to write computer vision code in Python, so the difficulty of this networking problem lay in establishing communication between two programming languages (Java and Python). I was eventually able to send simple strings from ARCore to my laptop, although I have yet to figure out laptop-to-ARCore communication. For some reason, I am unable to send repeated messages without crashing the app after a few seconds. This is an issue I need to fix before I eventually transition to passing images instead of text.
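
To make the string-passing concrete, here is a minimal sketch of what the laptop side could look like, assuming a plain TCP stream socket; the port number is a placeholder, and the real phone-side code would be the corresponding Java client.

```python
import socket

HOST = ""      # listen on all interfaces
PORT = 5000    # placeholder; must match the port the ARCore app connects to

# Accept one connection from the phone and print whatever strings it sends.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((HOST, PORT))
    server.listen(1)
    conn, addr = server.accept()
    print("Phone connected from", addr)
    with conn:
        while True:
            data = conn.recv(1024)   # read up to 1 KB from the stream
            if not data:             # empty read: the phone closed the connection
                break
            print(data.decode("utf-8"))
```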

  • 4/20 (3 hr) - I've been doing a bad job of updating this journal, sorry! I am still stuck on the problem of saving images from ARCore applications. I've been going through solution after solution from StackOverflow, but all of them result in either garbled images or broken files. Hopefully I'll find a solution soon so I can catch up to my milestones. I'm shifting my timeline to focus most of my efforts on image recording/streaming, with a quick and simple computer vision demonstration added on at the end as a sample application.

  • 4/21 (6 hr) - Even the Intro to Sceneform tutorial - the one literally written by Google specifically for taking pictures using ARCore - does not work. I get broken image files that contain no information. I'm wondering if there is something wrong with my phone or the software I'm using. I have decided to put this milestone aside for now and focus on the rest of my project: writing the computer vision script I will be running on the laptop side. I decided to keep it simple. Using online documentation, I spent the day putting together an OpenCV script to detect objects via HSV color thresholding and return the object's pixel coordinates. This script will eventually run on video streamed from ARCore and send the coordinates back to ARCore. With this implemented, video streaming is now the final piece of the puzzle before I complete my project (and then I will write a tutorial on how to do all of this!).

  • 5/2 (5 hr) - Been caught up with things outside of class, but I was finally able to set aside a large chunk of time to tackle the image extraction problem. I have yet to figure out how to access the image data from ARCore via Sceneform. I went through several relevant Stack Overflow posts, but none of them yielded correctly formatted RGB images. Eventually, I decided to reach out to various professors and researchers who may have tips. One of them got back to me with helpful advice about how to tackle this problem! The downside is that his solution involves using Unity instead of Sceneform. This essentially nullifies all of the effort I've put into solving this problem thus far, but at least I have a fresh starting point tomorrow.

  • 5/3 (8 hr) - I was able to extract image data! This was accomplished by modifying the HelloAR sample Unity project to make the camera image render to a texture. Once that worked, I could get the image buffer data from the texture, encode it to a PNG using a built-in function, and save the image to my device. The result was a PNG file of the camera view that I could open and verify was a proper RGB image. Getting this to work took quite a while, because I am rather inexperienced with how RenderTextures and image buffers work in Unity. I had to go through lots of documentation and short tutorials to learn how all the pieces fit together. I came out of it feeling like I understand Unity a lot better than I did before today! My next goal is to figure out how to send the extracted image data to a laptop. My first attempt at solving this problem was to write simple UDP sockets (Unity client on the phone and Python server on the laptop) and send the images from the client to the server. I realized after multiple crashes that this would not work - the images I'm trying to send are much too large for a UDP socket to handle, because of packet size limitations. I will have to split each image into multiple chunks, pass them one by one over the network, and reassemble the chunks back into images on the other side. Because UDP is prone to dropping packets, I'll have to use TCP, which means going through more tutorials. Hopefully I'll make progress tomorrow.

  • 5/4 (15 hr) - I solved the networking problem! I did a lot of reading into how sockets work, and why TCP is preferred over UDP for large files (ARCore images could be on the scale of megabytes each). I briefly got intrigued by more high-level tools like HTTP and FTP, but it was difficult to find APIs that would allow for smooth C#/Python interaction. I ended up implementing my solution using TCP socket networking. The basic concept is as follows: the Unity app extracts the image data from ARCore, converts it to a PNG in the form of a byte array, and sends that byte array to the Linux laptop. The byte array is on the scale of >100,000 bytes for each image, which is much larger than the maximum allowable packet size. Thus, the only way to send these images through the network is to break them into chunks of 4096 bytes and send them one chunk at a time, reconstructing them on the remote computer via a Python script. Implementing this reconstruction was a very difficult process. Because packets are sent continuously one after another, it is difficult to tell when one image ends and another begins. I had to introduce the idea of "header" packets that precede the data packets for each image and contain information about how large that image is. This information is then used to know exactly how many bytes from future packets to reassemble into a new byte array in Python, in order to later pass the image to OpenCV for vision processing (this will come tomorrow). It took many, many hours to debug various issues and wrap my head around the math and logic, as well as giving myself a crash course in big- and little-endian byte ordering, to get this to finally work. I had to dive into Unity coroutines and multithreading in order to receive messages from the laptop in ARCore without blocking the progression of frames, and I had to fix my broken OpenCV install (which took another good couple of hours) in order to begin working on vision processing. Hopefully things will start going smoothly from here on out - all that's left is to perform vision processing, pass coordinates back to the phone, and place holograms!
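
As a rough illustration of the reassembly loop described above, here is a Python sketch under stated assumptions: a 4-byte big-endian length header before each image and the same 4096-byte chunk size as the Unity sender; the exact header format in the real project may differ.

```python
import socket
import struct

CHUNK_SIZE = 4096        # matches the chunk size used on the Unity side
HEADER_FORMAT = "!I"     # assumed: 4-byte big-endian unsigned int holding the PNG length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

def recv_exact(conn, num_bytes):
    """Call recv() until exactly num_bytes have arrived, since TCP has no message boundaries."""
    data = bytearray()
    while len(data) < num_bytes:
        chunk = conn.recv(min(CHUNK_SIZE, num_bytes - len(data)))
        if not chunk:
            raise ConnectionError("socket closed mid-image")
        data.extend(chunk)
    return bytes(data)

def receive_images(conn):
    """Yield one complete PNG byte string per header/payload pair sent by the phone."""
    while True:
        header = recv_exact(conn, HEADER_SIZE)
        (image_length,) = struct.unpack(HEADER_FORMAT, header)
        yield recv_exact(conn, image_length)

# Example wiring: accept the phone's connection, then hand each PNG to the vision code.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("", 5001))   # placeholder port
    server.listen(1)
    conn, _ = server.accept()
    for png_bytes in receive_images(conn):
        print("received image of", len(png_bytes), "bytes")
```

The key point is that TCP delivers a continuous byte stream with no message boundaries, so the length header is what lets the receiver know where one PNG ends and the next begins.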

  • 5/6 (8 hr) - Somehow I lost the object detection script I had written before, so I had to rewrite the entire thing. The object detector works by performing color thresholding on each image. The goal is to threshold out a single color from the image to track an object with a single, solid color. Each image that ARCore sends to the laptop is converted from RGB color format to HSV color format. The desired HSV values (selected beforehand using mouse clicks) are thresholded out to produce a mask, and the mask is processed to retrieve the image coordinates of the object. These image coordinates are then sent to ARCore as a string. ARCore uses these coordinates to place a hologram above the object. Working out the bugs took a while (mostly issues with TCP ports), but I was eventually able to get the whole system working well enough for a demo in class tomorrow!
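
A condensed sketch of that detector is below; the tolerance values, the morphological clean-up, and the use of image moments for the centroid are assumed reasonable details rather than the exact script, and it assumes OpenCV 4's two-value findContours return.

```python
import cv2
import numpy as np

def detect_object(bgr_image, target_hsv, tolerance=(10, 60, 60)):
    """Return the (x, y) pixel coordinates of the largest blob near target_hsv, or None."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    lower = np.clip(np.array(target_hsv) - tolerance, 0, 255).astype(np.uint8)
    upper = np.clip(np.array(target_hsv) + tolerance, 0, 255).astype(np.uint8)
    mask = cv2.inRange(hsv, lower, upper)

    # Remove speckle noise so small false positives don't win.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])

# The resulting coordinates are formatted as a simple string (e.g. "x,y") and written
# back over the same TCP connection so the ARCore app can place a hologram above the object.
```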

  • 5/8 (5 hr) - I spent today designing and filling in the poster I will be displaying at the demo. I also did some fine-tuning to my code to make it easier to demo.

  • 5/13 (0.08 hr) - Here is a link to my poster! Poster. My tutorial will be submitted to the wiki by Friday, with permission from David. So sorry about the delay!

  • 5/16 (8 hr) - Wrote the tutorial for my final project. A combination of working with a different computer setup than the one I developed the code on, and accidentally backing up the wrong version of the TCP server code, caused this tutorial to take a bit longer than anticipated. Luckily, I was able to fix the code up again. On top of that, I spent some time adding plenty of comments for documentation purposes. The tutorial was fun to write, and I hope someone may find it helpful someday!

TOTAL TIME - 132 hr


----

Peer Review Done by Giuse Nguyen

Journal activities are explicitly and clearly related to course deliverables --(5)

- Easy to follow along with the weekly progress

Deliverables are described and attributed in wiki -- (4)

- Explains what has been added to Wiki

Report states total amount of time -- (2?)

- Time spent seems sufficient week to week. Should add an indicator/total though.

Total time is appropriate -- (4)

- Self-explanatory

----