When a language forces simpler thinking
This year I decided to join the 2022 Advent of Code event. The exercise on the 7th day involved storage of a folder structure: each folder would have a name, files, and other folders in itself. For the task itself, you don’t actually need to store the files, just the sum of their (and of those in subfolders) size, so a folder can be represented as a name, a number and multiple other folders.
My solutions are written purely in Common Lisp: a programming language where everything is represented as a list of values. A value might be another list or a non-list value, like a number. So I thought I could represent each folder as a list, where the first item is it’s name, second is file size sum and the rest are subfolders, something like this:
Name File size sum Subfolder "e" in "a"
| | |
v v v
("/" 48381165 ("a" 94853 ("e" 584)) ("d" 24933642))
^ ^
| |
Subfolder "a" in "/" Subfolder "d" in "/"
Now, after parsing a file, you’ll need to update the file size sum in the current folder and in the parent folder. To do such a thing, you would either need to
- iterate from “/” to that folder, find/store all folders in between, and update them, or
- store some way to access the parent folder, and just travel upwards
I decided to go with the second option, since it seemed more elegant and potentially more efficient. It is important to note that, for the past year and a half, I’ve extensively studied C++ in university, so my first though was for each folder to store some sort of pointer or reference to the memory address of the parent folder.
Here is where Common Lisp slaps me on the wrist: there is no such thing as a pointer. You can kinda store a reference to the parent folder object, but that will include the subfolder itself, resulting in an endless self-referencing loop, making the list unusable.
After thinking for a long long time, trying to force this idea to work, I had a revelation: what if, instead of trying to have lists in lists in …, I just flatten everything and store indices to parents?
Index of parent folder name (doesn't exist)
|
Name | File size sum
| | |
v v v
("/" -1 48381165 "a" 0 94853 "e" 3 584 "d" 0 24933642)
^ ^ ^ ^ ^ ^
| | | | | |
Name | Name | Name |
| | |
At index 0 is parent "/" Parent is "a" Parent is "/"
Now this solution worked great! With this I was able to solve both tasks easily.1
Appendix
#In the exercise, you have commands for navigating between folders. Going up a folder is trivial, but folders can have duplicate names if they’re inside different parent folders, so going to a sub folder requires to search by it’s name and parent index.
Surprisingly enough, in my solution (lines 15-17), I just search from the end to the beginning of all parsed folders for a name. This works fine for this exact task and exact inputs, however in general this isn’t correct. In a structure like this:
/ (dir)
- a (dir)
- file1.txt (file, size=1)
- file3.txt (file, size=3)
- b (dir)
- a (dir)
- file2.txt (file, size=2)
with an input like this:
$ cd /
$ ls
dir a
dir b
$ cd a
$ ls
1 file1.txt
3 file3.txt
$ cd ..
$ cd b
$ ls
dir a
$ cd a
$ ls
2 file2.txt
$ cd ..
$ cd ..
$ cd a
The last $ cd a
actually puts you into /b/a/
, not /a/
.
The main thing that saves us (me) is that you can only change directory and list structure (in current directory), so you would only ever be visiting a folder once.
If a folder’s contents were ls
’d more than once, or they could change (for example having an mkfile
2 command), a proper search would be required.
-
A potential improvement would be to use CLOS to store each folder as it’s own object, but this worked fine and I really wanted to spare reading on CLOS at that time. ↩
-
mkfile
is liketouch
, but you could specify a size and the file will be filled with that many zeroes. It seems to exist solely in Solaris (as a system administration command) and MacOS. I mentioned it and nottouch
, since the latter only creates empty files, which wouldn’t contribute to the total file size sum. ↩