Monday, July 17, 2017

Swapping an HDD for an SSD in a Low-End 2017 HP Laptop

In less than two days' time, I was able to purchase my third computer from a perfect Craigslist stranger (without issue!) and replace the hard disk with a solid-state drive, and I wanted to take a minute to document the experience here. If you are interested only in a rough guide to SSD installation for a 2017 HP laptop, you are heartily encouraged to skip down several paragraphs.

The computer in question is an HP 15-ba113cl that the original owner bought in February 2017 from our local Costco for around $400. It has an AMD (a first, for me) A10-9600P processor, 12GB DDR4 RAM, a 1TB HDD, and a touchscreen (another first). It came with Windows 10 Home and, horror of horrors, a heaping pile of HP bloatware. My wife's undergrad laptop was an HP and it was riddled with problems from the outset (including a failed sound card) and, three years or so in--once it realized it was safely outside the warranty window--it proceeded forthwith to give up the ghost. Ever since this harrowing  experience I have been extremely leery of the brand, and my wife still refuses to have anything to do with it (we have wisely turned away several free heirloom printers).

Thus, I wouldn't have considered purchasing such a product unless the thing came at quite a deal, and this it did: I ended up getting a months-old computer (albeit very low-end) for $150. Now, our desktop computer used to get quite a bit of use, but it began its slow slide into senescence a couple years ago and now it operates at such an unbearable crawl that its main purpose has become that of an external storage tower for our laptops. Its components date back to 2005 or so: I acquired it in 2007 from a guy in Portland who had built it, but the sole hard disk is well over 10 years old and has thus been liable to give out any minute for the past seven years or so (don't worry, we are not complete idiots and thus use other more modern forms of backup).

My own personal laptop has been, and continues to be, a champion in spite of its age. It is a Lenovo ThinkPad T430s (2012) with a 3rd generation Intel i7 chip, 16GB DDR3 RAM, a 128GB SSD, and a 500GB HDD that I can swap out with an optical drive, an extra battery, etc. using a caddy (see end of post). I got it in October 2013 from a local Austinite businessman and I have used it virtually every day since. It came with Windows 7 on the SSD; I kept it, but shrank the partition and added a bigger one for Ubuntu LTS, which I use almost exclusively, only booting Windows to use very specific MS-only software.

Initially I was going to replace the old desktop with a brand new low-end desktop like so, but, given that my laptop is now five years old and has a few portability issues as it is (bad battery, no camera, dim screen) while still remaining super powerful withal, it occurred to me that a more sensible way forward perhaps lies in making the ThinkPad the new "home computer" and desktop successor: I could just hook it up to the big monitor and all would be well. Several considerations recommended this route: the ThinkPad has more than enough power to run the Adobe software that my wife uses (Lightroom, Illustrator), plus some of the 3D graphics stuff I've been tinkering with of late. Furthermore, it has that expansion bay I mentioned, which makes swapping HDDs easy. Thus, I could use them for extra back-up storage, etc. This idea solidified when I found the cheap HP-15 with the 1TB HDD; I could take that sucker out, swap it for an SSD, and use it as needed for relatively long-term storage.

So, back to the new HP. When I got the thing home and turned it on, it took over 60 seconds to fully boot up; then, all of the HP crapware and Windows 10 utilities proceeded to kick on, forcing me to wait still longer. Have you, dear reader, experienced the pain of booting an OS on an HDD after becoming well used to a snappy SSD? If you haven't, just know that it is unbearable. I imagine it's something akin to what business executives must feel when they arrive at the country club and trade their sports cars for golf carts. I suspected the perceived slow-down was the HDD, but just for fun I wanted to benchmark the computer so I could compare it to my ThinkPad. I created an Ubuntu LTS partition and installed the hardinfo benchmarking tool. See results below, and please pardon the lousy picture!

Hardinfo generates a full system report, which includes putting your CPU through the paces with 6 different benchmarking tests: things like how long it takes your computer to perform various (mostly recursive and therefore single-threaded) computational tasks on integer and floating-point numbers (e.g., calculating the 42nd Fibonacci number [267914296], doing a Fast Fourier Transform, or finding solutions to the N-Queens problem [placing n queens on an n*n chessboard such that no two threaten each other, which I think bruteforces (n*n Choose n) possible solutions]). I can't find very much out about each of these specific benchmark tasks, but I don't care that much: feel free to have a look around their github page for source, etc. The basic processor specs for the two contenders are as follows.
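For a sense of what this kind of benchmark exercises, here is a minimal Python sketch. It is not hardinfo's actual source, and the function names are my own. It assumes the sequence is indexed so that the 1st and 2nd Fibonacci numbers are both 1, and it takes my guess about the brute-force N-Queens search space at face value.

```python
import math

def fib_naive(n):
    """Doubly recursive Fibonacci: exponential time, which is what makes it a stress test."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_iter(n):
    """Linear-time Fibonacci, used here only to check large values quickly."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def naive_queens_placements(n):
    """Ways to drop n queens on an n*n board before checking any attacks."""
    return math.comb(n * n, n)

print(fib_naive(20))               # small enough for the slow version
print(fib_iter(42))                # the value quoted in the text
print(naive_queens_placements(8))  # size of the naive search space for 8 queens
```

The naive version is kept to a small argument on purpose: in Python, calling it with 42 would take minutes, which is rather the point of using it as a benchmark.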

The 2017 HP: 4x AMD A10-9600P (2.4 GHz, up to 3.3 GHz, 2 MB L2 cache)
The 2012 Lenovo: 2x Intel i7-3520M (2.9 GHz, up to 3.6 GHz, 1 MB L2 cache)

Turns out that the 5-year-old ThinkPad outperforms every step of the way! Still though, not too different on the whole. The new machine should be perfectly capable of handling most any task that I would reasonably need it for. But that HDD had to go immediately. Here's how I got it out: first, locate your laptop's maintenance guide, which will serve as a crude roadmap (chances are you are voiding your warranty the minute you lay hands on a screwdriver, so be forewarned). The maintenance guide for this computer, for example, was found here, though the subsequent steps are probably general enough regardless of what kind of laptop you have.

First, power the laptop down, unplug it, and remove the battery. Give it a few discretionary minutes to cool down. Turn it upside-down and remove all screws from the back (including the two under the rubber feet, which you must first peel off). On this particular computer, there should be 12 screws total: be delicate and try your best not to strip them.

Next, remove the optical drive. Then, take a thin plastic object to serve as a shim (I used the lid of a ballpoint pen). Don't use a flathead screwdriver or anything metal, or else you will wind up with scratches all over your case. We are trying to remove the plastic bottom from the rest of the computer, so gently wedge the plastic shim between where the bottom connects to the keyboard surface. I found this easiest to do near where the optical drive had been, around the USB ports. This is delicate work, so take care you don't stress-fracture the piece of shit plastic case. Once you get your shim in there and get some purchase, work it all the way around the computer gently, doing the front first. The case will effectively unzip: note the four little rectangular snaps along the edge of the case in the left side of the picture below, where the optical drive used to be. These little snaps go all the way around, and it is these we are essaying to unzip.

Once you have freed it, remove the bottom and set it aside: you should see something like this

Needless to say, try your best to avoid getting dust, pet hair, etc. into the guts. Your HDD is (on this computer) in the bottom left-hand corner. It will be screwed into mounting brackets (or perhaps a rubber gasket), which will in turn be screwed onto the computer. Remove the screws that hold the drive to the computer and gently unplug the SATA connection. Set the drive gently aside, perhaps in an anti-static bag.

Then--unless you have an SSD lying around--get on Amazon Prime Now, order an SSD (I recommend SanDisk products as you will see below, but there are many viable alternatives; I used this particular drive), and wait for free 2-hour delivery! Then unscrew the mounting brackets from your HDD and, noting their arrangement, reattach them in the same way to your new SSD.

Now plug in your SATA cable and remount the drive to the computer's HDD housing.

And you're set! Put the case back on, snap all the little snaps back best as you can, and screw everything back into place (except for the optical drive screw: the left of the two screws in the middle row; clearly it must stay out until you reinsert your drive). Voila! The OS boots up in 20 seconds now, which is less than a third of the time it took the HDD.

The 1TB HDD that came stock with the laptop was actually a name brand (Western Digital blue) so it is probably not too bad. Here it is inside my little optical bay caddy (this is what I use to switch between computers for backups).

You can pick up a caddy like this for about $10, but make sure your computer is compatible with such a set-up (most modern laptops should be, but you never know).

Finally, the reason I'm plugging SanDisk is that I just had a 128 GB USB drive fart out on me after just over a year (and after very little use). It got irreparably locked into read-only, and try as I might to reset and reformat (and I spent over an hour in the attempt), no dice. But as it turns out, the thing has a 5-year warranty! I got on their website and submitted my issue and product information (including a screenshot of my online purchase confirmation); they promptly emailed me a paid USPS packing slip and told me to mail it in, and that I would receive a replacement. That's pretty good customer service for a bloody thumbdrive, what?

Wednesday, June 1, 2016

DIY e-books: Manipulating scanned PDFs with command-line tools

The Memory Dynamics Laboratory is having something of a book club this summer, but the crucial volume was proving difficult to find; it had been out of print for many years and the UT library system only has a single copy! I proudly volunteered to do the hunting-up and ferreting-out because I have at my disposal a skill-set unique to the hyper-stingy tech-savvy long-term student, i.e., time-honored and hard-won methods for coaxing any book I please from the depths of the internet (way too many hyphens going on here). But in this instance, my tricks were all of them unsuccessful!

Still, I had taken on the job and had to see it to completion. So I went this morning to the library to use one of those walk-up overhead scanners I blogged about briefly some 2 years ago. The book in question had 14 chapters, but my advisor already had PDF versions of chapters 7 and 9-14. This meant I ended up scanning about 160 pages: I set it to "black and white" and I angled the v-shaped bed up a bit to prevent cracking the spine. It took me around 25 minutes from the time I first walked up to scan everything (the B&W setting seemed slower than the color and grayscale) which was all then saved directly to a USB drive. I saved it three different ways: a "quick PDF", a "searchable PDF" and as a real-text file, which had sizes 14 MB, 9.7 MB, and 673 KB respectively. The OCR was OK but left out bits of words every few pages; you'd probably be better off with something aftermarket.

OK so at this point I have 4 PDFs: chapters 1-6, chapter 7, chapter 8, and chapters 9-14. However, the preexisting documents were in color/greyscale while the ones I had just scanned were in monochrome. Also, they were both two book-pages to a single PDF page, which was bad. Additionally, there were some scanning artifacts present in all of them (blotches, shadowy places, etc.), though I didn't care too much about that as long as it didn't obscure the text.

Let me now walk you through some free, mostly native command line tools for cleaning up images/pdfs after scanning and for stitching multiple documents of different sizes together into a coherent whole. I will discuss (1) cleaning up the color/white-balance through normalization, (2) splitting 2-page scans into two 1-page scans, (3) cropping/resizing a multi-page document, (4) splitting/concatenating multiple documents; deleting pages, and (5) compressing to reduce file-size. A suite of ImageMagick scripts written by a guy named Fred have been developed for this purpose, but I had difficulty getting these to work properly. Understand that none of these tools or techniques are original to me; I simply did the legwork to bring them all into one place, which consisted mostly of sifting through old stackexchange/superuser/askubuntu posts and replies until I found something that worked. I tried MANY such tools and these were the quickest, easiest, lightest-weight, and best overall in terms of output. Note that what follows applies specifically to scanned texts (books, articles, newsprint, documents, receipts, etc.) that one wants preserved as a greyscale/monochrome PDF (black text on a white ground). These techniques are also helpful for prepping a scan for faxing or OCR. They are also pretty basic (you can get really technical with this stuff), but they got the job done for me!

Code to Clean, Crop, Cut, Combine, & Compress scanned PDFs


We will go through the code together, one command at a time. At the risk of stating the obvious, I will use the convention "input_file.pdf" for the input file name and "output_file.pdf" for the output file name; you would do well to change these accordingly.


pdfinfo input_file.pdf

First, I would run pdfinfo on the input PDF file(s) to, well, get info about them (page dimensions, file size, etc). This "pdfinfo" will likely be useful to you later.

convert input_file.pdf -contrast-stretch 2%x20% output_file.pdf

Next I use ImageMagick's convert command with the -contrast-stretch operator in order to set the darkest 2% of pixels to black and the lightest 20% of pixels to white, thus eliminating some of the unpleasant shadowy scan artifacts. Play with these values to achieve your desired look (it may take some tinkering). The more familiar convert operator -normalize is actually just a special case of -contrast-stretch (-normalize = -contrast-stretch 2%x1%). Try that one first; it may be all you need!
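To make the idea concrete, here is a toy Python model of what a contrast stretch does to a handful of grey values. This is only a sketch under my own assumptions (a single 8-bit greyscale channel, a made-up list of pixel values, and a function name I invented); ImageMagick itself works from the image histogram, and its exact rounding may differ.

```python
def contrast_stretch(pixels, black_frac, white_frac, max_val=255):
    """Clip the darkest/lightest fractions of pixels, then rescale the rest linearly."""
    ordered = sorted(pixels)
    n = len(ordered)
    # Grey level at or below which roughly black_frac of the pixels fall.
    low = ordered[min(int(black_frac * n), n - 1)]
    # Grey level at or above which roughly white_frac of the pixels fall.
    high = ordered[max(n - 1 - int(white_frac * n), 0)]
    if high <= low:
        return list(pixels)  # degenerate histogram: nothing to stretch
    out = []
    for p in pixels:
        scaled = (p - low) * max_val / (high - low)
        out.append(int(round(min(max(scaled, 0), max_val))))
    return out

# Hypothetical scan: dark text pixels, grey shadow, and off-white paper.
page = [30, 40, 50, 120, 180, 200, 210, 220, 230, 240]
print(contrast_stretch(page, 0.10, 0.20))
```

The off-white "paper" values at the top end all land on pure white, which is why a generous white-point percentage cleans up shadowy scans.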


pdfcrop --margins "0 -20 0 -20" input_file.pdf  output_file.pdf
pdfcrop --margins "5 5 5 5" input_file.pdf  output_file.pdf

You can crop your file (both down and up) by running pdfcrop with the --margins option. The arguments to --margins (numbers in quotes) are the number of points to adjust on the left, top, right, and bottom of the PDF respectively. Negative values crop the document down; positive values un-crop the document by the specified number of points, effectively adding a white margin. Given that order, the first example above has the effect of shaving 20 points off of the top and bottom of your document; the second example adds 5 points of whitespace to all sides of your document. Nota bene: if your document contains pages of different sizes, each page will be individually cropped to the specified margins; thus, your pages will still be of different sizes after running this command. There is a way around this, however.

pdfcrop --bbox "0 0 595 842" input_file.pdf  output_file.pdf
pdfcrop --bbox "-97.5 -121 497.5 721" input_file.pdf output_file.pdf

To avoid cropping each page separately, simply specify the size of the "bounding box" for your document using the option --bbox. The quoted arguments to --bbox are coordinates (x1 y1 x2 y2) which specify the bottom-left (x1 y1) and top-right (x2 y2) corners of the box, thus specifying a global size for the document to which all pages will be cropped. You may have to do a bit of arithmetic to get this right. Your earlier call to pdfinfo should result in a report of each page's dimensions; if all pages are the same size, then you will only see one dimension given, something like "Page size:    400 x 600 pts". As it is, your document has its bottom-left corner at (0,0) and its top-right corner at (400,600).
Running the first example above on such an input_file.pdf would add 195 points of whitespace to the right (595 - 400) and 242 points of whitespace to the top (842 - 600) of your output_file.pdf (595 x 842 is standard A4 paper size: for others see here.) If you wanted to go to A4 with equal margins on all sides, you would do something like the second example above. Bounding boxes smaller than your file will crop down; boxes larger will add margins.

Cutting & Combining

pdftk first.pdf second.pdf third.pdf fourth.pdf cat output output_file.pdf

If you have a bunch of separate PDFs that you would like to stitch together into a single document, use pdftk. The example above concatenates first.pdf, second.pdf, third.pdf, and fourth.pdf into a single file (output_file.pdf) in that order. You can do this with pdfs of different sizes and qualities, but you may need to go back through some of the previously mentioned commands to get it looking uniform throughout and to crop all pages to the same size.

pdftk input_file.pdf cat 1-99 101-end output no_hundred.pdf

If you want to delete a certain page or pages from your PDF, pdftk can do that too! For example, the command above removes page 100 from input_file.pdf and writes the result to no_hundred.pdf ("end" is understood by pdftk to mean the last page of the document, but of course you are free to specify this as well).
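If you need to drop several scattered pages, writing the ranges out by hand gets tedious. Here is a rough Python sketch that builds the page-range arguments for you; the helper name and the 160-page total are hypothetical, and it only prints the pdftk command rather than running it.

```python
def keep_ranges(total_pages, pages_to_drop):
    """Build page-range strings for `pdftk ... cat` that skip the given pages."""
    drop = set(pages_to_drop)
    ranges = []
    start = None
    for page in range(1, total_pages + 1):
        if page in drop:
            if start is not None:
                ranges.append((start, page - 1))  # close the run before the dropped page
                start = None
        elif start is None:
            start = page  # open a new run of pages to keep
    if start is not None:
        ranges.append((start, total_pages))
    return [str(a) if a == b else f"{a}-{b}" for a, b in ranges]

args = keep_ranges(160, [100])
print("pdftk input_file.pdf cat " + " ".join(args) + " output no_hundred.pdf")
```

This spells out the last page number instead of using "end", which amounts to the same thing as long as the total page count you pass in is right (pdfinfo will tell you).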


pdf2ps input_file.pdf input_file.ps
ps2pdf input_file.ps compressed_file.pdf

pdftops input_file.pdf input_file.ps
ps2pdf input_file.ps compressed_file.pdf

There are lots of ways to compress PDF files but for me this was the most bang for the least buck. You simply convert the PDF to a temporary PostScript file and then convert it back to a PDF. It has worked like a charm for me several times; note though that there are two options given above (using either pdf2ps or pdftops). Start with the top one, but if that doesn't do it then try the bottom. In the present case, it was able to compress my homemade 90 MB PDF down to around 30 MB.

Hope this is useful to someone like me-a-few-days-ago!

Update/Edit (7/11/2016)

Here's a quick edit about how to make a non-searchable PDF searchable with Linux tools.

First, make sure you have Google's Tesseract OCR engine installed:
sudo apt-get install tesseract-ocr
Download pdfsandwich, cd to the directory containing it, and run
sudo dpkg -i pdfsandwich_0.1.4_amd64.deb
(replace pdfsandwich_... with your file name)
sudo apt-get -fy install

Running it is simple: if the example.pdf to make searchable is in your current directory, do
pdfsandwich example.pdf
You will get a file called example_ocr.pdf with a searchable text layer "sandwiched" on top.

Saturday, August 29, 2015

♫ Summer Running: Not Very Fast! / Summer Running: Pain in my Ass! ♫

Every couple of days I force myself to go outside and run about 2-4 miles. I do not enjoy it. It makes me feel like I am dying every time; I gasp and wheeze and, even after showering, I stay uncomfortably sweaty for a few hours. Worse still, I do not feel "energized" or whatever other vital sensations people claim to derive from exercise; if anything, I feel especially fatigued afterwards, and this only gets more pronounced as the day progresses. However, I have convinced myself that the benefits of cardiovascular exercise outweigh its many miseries; I will go into my whys and wherefores later and try to convince you too (it could just be that I am an insane person); but to start, I want to discuss the 'how' and the 'what'.

I started keeping track of this gruelling ordeal using an Android app (RunKeeper); it uses GPS and links up with Google Fit and is a terrible invasion of my privacy that has probably somehow already sent my info to every extant insurance company and devastated my future premiums. Indeed, it's probably also incremented with each new symptom-related Google search ("knees hurt a lot", "gasp and wheeze", "how much sweating is normal" etc). Fact is, information pertaining to my health now exists in the ether, and with supply, demand, and end-user licensing agreements being what they are, someone savvy can get it if they want it badly enough; still though, like Blogger, the app is awfully convenient. I tried a couple others (that didn't look as eager to sell your soul) and found them to be complete shit, functionally. RunKeeper is good at what it does; it currently operates with a freemium model, and evidently if you pay a little you get better stats. I used the basic free version and just manually entered everything into a spreadsheet; it took less than a half-hour.

I started running in late March, but I didn't really seriously commit until June (see histogram). Since then, across 44 different running events, I have travelled 103.72 miles and wasted 13 hours and 22 minutes doing so. There are two basic routes I would run: a short route (~1.7 miles) and a long route (~3.4 miles).

So far, my average pace on the short route is 7:20/mile (440 seconds) with a standard deviation of 26 seconds, while my average pace on the long route is 7:45/mile (465 seconds) with a standard deviation of 20 seconds. On my fastest, I averaged 6:55/mile for 1.7 miles (update 8/30: new best time of 6:48/mile for 1.7 miles). On my slowest, I averaged 8:25/mile for 3.4 miles. Here's a graph showing my improvement over time.
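Since I keep flipping between seconds and minutes:seconds, here is a tiny Python helper for the conversion. The function name is my own; the two inputs are the mean paces (in seconds per mile) reported in the R output at the end of this post.

```python
def format_pace(seconds_per_mile):
    """Turn a pace in seconds per mile into an m:ss string, rounded to the nearest second."""
    total = int(round(seconds_per_mile))
    minutes, seconds = divmod(total, 60)
    return f"{minutes}:{seconds:02d}"

print(format_pace(440.1333))  # mean pace on the short route
print(format_pace(464.6625))  # mean pace on the long route
```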

Significant improvement over time, which was expected. A more interesting question is whether my improvement was greater for short runs or long runs.

Separate regression equations were fit for both long and short runs:

AveragePace(Short)= 471.647 - 1.69*(RunOrder)
AveragePace(Long)= 506.28 - 1.47*(RunOrder)

A quick test of differences between slopes would be this:
Z = (b1-b2)/Sqrt((SEb1)^2 + (SEb2)^2)

This gives:
> (-1.6882- -1.4735)/sqrt(.2348^2+.2595^2)
[1] -0.6135005
> pnorm(.Last.value)
[1] 0.2697727

So nope, slopes don't differ.
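For anyone without R handy, the same quick test can be sketched in plain Python. The function names are mine; the slopes and standard errors are the ones from the regression output at the end of this post, and the normal CDF is computed from the error function rather than a stats library. It reports the lower-tail probability (what pnorm gives) alongside the two-sided p-value.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def slope_difference_test(b1, se1, b2, se2):
    """Return (z, lower-tail p, two-sided p) for the difference between two slopes."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    lower_tail = normal_cdf(z)
    two_sided = 2.0 * min(lower_tail, 1.0 - lower_tail)
    return z, lower_tail, two_sided

# Short-run slope and SE first, then long-run slope and SE.
z, p_lower, p_two = slope_difference_test(-1.6882, 0.2595, -1.4735, 0.2348)
print(round(z, 4), round(p_lower, 4), round(p_two, 4))
```

Either way you slice it, the p-value is nowhere near conventional significance thresholds, so the conclusion stands.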

R-squares were large (.63 and .74, respectively), indicating that a substantial share of the variance in pace measurements is attributable to practice or the passage of time. Looking at the graph, there appear to have been three pretty precipitous drops in average pace: initially for the short runs, and then again for the short runs after about 30 running events, while the drop in average pace for my long runs occurred just after my 20th running event and didn't seem to affect the short runs. I am happy enough with this; without getting into time series or forecasting (though here's a great tutorial), I checked my residual autocorrelations (ACF) and everything looked OK.

I'm sure I look ridiculous when I run--I wear cut-off jeans, tattered old t-shirts, and my $15 Costco-brand running sneakers. But this is intentional! First, I like the feeling of getting extra use out of my holey old clothes by using them as a running costume. Second, they are positively indecent and wholly unwearable, even to sleep in--out on the block this is another incentive NOT to stop running, indeed not even to slow down!

Why do I do this? Because I am by nature quite sedentary, and evidently this means I am going to get several diseases, my brain is going to atrophy, and I will die quite prematurely. Because I am scared of these things happening, I have been following this self-imposed routine of aerobic hell rather sedulously for the past few months. During the school-year I can tell myself convincing stories about how my daily 5-minute bike-rides to and from the bus-stop really add up: "surely this is a sufficient amount of exercise". But during the summer, when I can easily remain seated in the same place for the entire day, even these weak rationalizations break down. Running is the easiest means of cardio-ing; you can do it anywhere there's a sidewalk.

Running appears to enhance cognitive performance in healthy individuals.
This Wikipedia article provides an excellent summary, but I'll talk about a few specific studies below. Smith et al. (2010) analyzed 29 studies that tested the association between neurocognitive performance and aerobic exercise; they found that individuals who had been randomly assigned to aerobic exercise conditions improved in attention, processing speed, executive function, and memory. Here's a PsychologyToday page about another study on the relationship between cardiovascular fitness and intelligence in young adulthood (spoiler: it's very positive).

Not only that, but cardio also appears to be optimal for longevity. VO2 max, the gold-standard measure for cardiovascular fitness, is a good predictor of life expectancy; the higher it is, the lower your risk of "all cause mortality" and cardiovascular disease. The good news is, VO2 max is trainable, especially if interval training is used!

This isn't just me showcasing studies that confirm my beliefs--here's an excerpt about the relationship between exercise and cognitive function from the recent textbook "Memory" by three leaders in the field (Baddeley, Eysenck, and Anderson, 2014):
"The evidence is much stronger for a positive effect of exercise on maintaining cognitive function. In a typical study, Kramer, Hahn, Cohen, Banich, McAuley, Harrison, et al. (1999) studied 124 sedentary but healthy older adults, randomizing them into two groups. One group received aerobic walking-based exercise, while the control group received toning and stretching exercises. The groups trained for about an hour a day for 3 days a week over a 6-month period. Cognition was measured by a number of tests including task switching, attentional selection, and capacity to inhibit irrelevant information. They found a modest increase in aerobic fitness, together with a clear improvement in cognitive performance. A subsequent meta-analysis of a range of available studies by Colcombe and Kramer (2003) found convincing evidence for a positive impact of aerobic exercise on a range of cognitive tasks, most notably those involving executive processing."

Honestly though, I feel like the amount of car exhaust I have to breathe on my runs probably greatly offsets any potential gains of cardiovascular exercise. Especially when I read horrifying things about how even sitting in traffic can cause brain damage and how living near a busy road increases the risk of birth defects. I sure hope I'm not running right into the very outcomes I intended to run away from!

Here's some R-code I used for this post:
> sd(data1$AvgPace)
[1] 26.28296
> sd(data2$AvgPace)
[1] 20.17638
> mean(data1$AvgPace)
[1] 440.1333
> mean(data2$AvgPace)
[1] 464.6625
> summary(fit1)

lm(formula = data1$AvgPace ~ data1$Order)

Min 1Q Median 3Q Max
-25.918 -10.718 -2.435 9.947 31.418

Estimate Std. Error t value Pr(>|t|)
(Intercept) 471.6471 5.7739 81.687 < 2e-16 ***
data1$Order -1.6882 0.2595 -6.507 8.16e-07 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.33 on 25 degrees of freedom
Multiple R-squared: 0.6287, Adjusted R-squared: 0.6139
F-statistic: 42.34 on 1 and 25 DF, p-value: 8.157e-07


lm(formula = data2$AvgPace ~ data2$Order)

Min 1Q Median 3Q Max
-15.9043 -7.8003 -0.2513 6.0742 19.6548

Estimate Std. Error t value Pr(>|t|)
(Intercept) 506.2880 7.1511 70.799 < 2e-16 ***
data2$Order -1.4735 0.2348 -6.276 2.04e-05 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.69 on 14 degrees of freedom
Multiple R-squared: 0.7378, Adjusted R-squared: 0.719
F-statistic: 39.39 on 1 and 14 DF, p-value: 2.036e-05

plot(data$Order,data$AvgPace,col=data$LongShort, main="Average Pace over Time (ordinal)", ylab="Average Pace (seconds)", xlab="Order (1 = first run, 2 = second run, ... , 44 = most recent run)")
legend('bottomleft', legend = levels(data$LongShort), col = 1:3, pch = 1)

plot(Date, AvgPace, t="l", xaxt="n", xlab="")
axis(1, at=Date, labels=FALSE)
text(x=seq(1,44,by=1), par("usr")[3]-6.5, labels=labs, adj=1, srt=45, xpd=TRUE)

hist(Month1, xlab="Month (March=3, April=4,...)",main="Logged Running Events per Month",breaks=seq(3,9,1),right=F,labels=T)

plot.ts(res3,ylab="res (AvgPace - SHORT)",main="residual autocorrelation (short runs)")
plot(res4,ylab="res (AvgPace - LONG)",main="residual autocorrelation (long runs)")

Friday, August 28, 2015

Summary/Review of "How Can The Mind Occur in The Physical Universe?"

"...There is this collection of ultimate scientific questions, and if you are lucky to get grabbed by one of these, that will just do you for the rest of your life. Why does the universe exist? When did it start? What’s the nature of life?...
The question for me is how can the human mind occur in the physical universe. We now know that the world is governed by physics. We now understand the way biology nestles comfortably within that. The issue is how will the mind do that as well." 
-Allen Newell, December 4, 1991
I found out about John R. Anderson almost immediately upon discovering intelligent tutoring systems a few years ago; he and his research group at Carnegie Mellon have blazed the way forward with these technologies. Their Cognitive Tutor, for example, is currently #5 out of 39 interventions in mathematics education, as evaluated by the US Department of Education's "What Works Clearinghouse". I learned that, notwithstanding these educational pursuits, his life's work had been more about developing a "cognitive architecture" -- a model of how the structure of the mind and its components work together to achieve human cognition. I learned that he called it ACT-R (for "adaptive control of thought - rational") and that it has been steadily undergoing refinements since it debuted in the early 70s. Anyway, given how amazed I was with his tutoring-systems research, I was naturally drawn to Anderson's 2007 book that surveys his life's work in attempting to answer the titular question via ACT-R.

I'm moved to blog this because I was extremely impressed by (1) the synthesis of seemingly disparate phenomena (ACT-R is very consistent with a wide range of findings in cognitive psychology), and (2) how well his theories map onto findings from neuroscience. This book contains the most convincing model of human cognition I know of, but it is spread out across several chapters and compartmentalized in such a way that I feel I can unbox everything and tie it all together here in a more readily intelligible, coarse-grained fashion. It really is amazing, but I understand if you don't want to sit here and read a whole long synopsis. For this reason, I will now post verbatim a summary given by Anderson at the end of the book (though before he talks about consciousness), so that you can make an informed decision about whether to read further.
1. The answer [to the title question] takes the form of a cognitive architecture—that is, the specification of the structure of the brain at a level of abstraction that explains how it achieves the function of the mind.
2. For reasons of efficiency of neural computation, the human cognitive architecture takes the form of a set of largely independent modules (e.g., figure 2.2) associated with different brain regions.
3. Human identity is achieved through a declarative memory module that, moment by moment, attempts to give each person the most appropriate possible window into his or her past.
4. The various modules are coordinated by a central production system that strives to develop a set of productions that will give the most adaptive response to any state of the modules.
5. The human mind evolved out of the primate mind by achieving the ability to exercise abstract control over cognition and the ability to process complex relational patterns.

The Modular Nature of Mind and Brain

The function of a cognitive architecture, according to Anderson, is "to find a specification of the structure of the brain that explains how it achieves the function of the mind." He argues that connectionist models of cognition will never be able to completely account for human cognition as a whole:
"This is because the human mind is not just the sum of core competences such as memory, or categorization, or reasoning. It is about how all these pieces and other pieces work together to produce cognition. All the pieces might be adapted to the regularities in the world, but understanding their individual adaptations does not address how they are put together."
Though many cognitive phenomena are certainly connectionist in nature, there is also no question that the brain is more than a uniform network of individual neurons. Much in the way that a cell is functionally partitioned into organelles, or that an organism comprises interconnected organ systems that each carry out characteristic tasks, the brain too has modularized certain functions, as evidenced by unique regions of neural anatomy associated with the performance of different tasks. The brain isn't just one huge undifferentiated mass! Neurons that perform related computations occur close together by reason of parsimony: the further apart they are, the longer it would take for them to communicate. Thus, computation in the brain is local and parallel; different regions perform different functions in the service of cognition, though at a lower level the functionality of any given brain region is connectionist in nature. Indeed, almost all systems whose design is meant to achieve a function show this kind of hierarchical organization (Simon, 1962).

If the brain devotes local regions to certain functions, this implies that we should be able to use brain-scanning procedures to find regions that reflect specific activities. The ACT-R cognitive architecture proposes 8 basic modules, and has mapped them onto specific brain regions through a series of fMRI experiments.
The eight modules (four peripheral and four central), plus their associated brain regions, are as follows: (1) Visual - processing of attended information in the fusiform gyrus; (2) Aural - secondary auditory cortex; (3) Manual - hand motor/sensory region of central sulcus; (4) Vocal - face/tongue motor/sensory region of central sulcus; (5) Imaginal - mental/spatial representation area in posterior parietal cortex; (6) Declarative - memory storage/retrieval operations in prefrontal cortical areas; (7) Goal - cognition directed by anterior cingulate cortex; and (8) Procedural - integration and selection of cognitive actions through the basal ganglia. A single fMRI study (Anderson et al., 2007) demonstrated the exercise of all of these modules and their associated brain regions. For our purposes, two of these modules are worth considering in more detail.

While the many regions of the brain do their own separate processing, they must act in a coordinated manner to achieve cognition. Thus, many regions of localized functionality are interconnected by tracts of neural fibers; particularly important are the connections between the cortex (the outermost region of the brain) and subcortical structures. One subcortical area in particular, the basal ganglia, is innervated by most of the cortex and plays a major role in controlling behavior through its actions on the thalamus. It marks a point of convergence across brain regions, compressing widely distributed information into what is effectively a single decision point. Thus, the basal ganglia is believed to be the main brain structure involved in action selection, or choosing which of many possible behaviors to perform in a given instance. Like their associated brain regions, the ACT-R modules must be able to communicate among each other, and they do so by placing information in small-capacity buffers associated with each of them. The procedural module plays the role of the basal ganglia by responding to patterns of information in these buffers and producing action. Though all modules are capable of independent parallel processing, they have to communicate via the procedural module, which can only execute a single rule/action at a time, thus forming a serial "central bottleneck" in overall processing.
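
To make the bottleneck idea concrete, here is a toy sketch in Python. This is my own illustration, not ACT-R code: the buffer names, the `Production` class, and the two example rules are all made up, and picking the first matching rule is a simplification. It only shows that several rules can match the buffers at once while just one fires per cycle.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Buffers = Dict[str, Optional[str]]


@dataclass
class Production:
    """A hypothetical condition-action rule over buffer contents."""
    name: str
    condition: Callable[[Buffers], bool]
    action: Callable[[Buffers], None]


def step(buffers: Buffers, productions: List[Production]) -> Optional[str]:
    """Run one cycle: fire at most one matching production."""
    for production in productions:
        if production.condition(buffers):
            production.action(buffers)
            return production.name  # serial bottleneck: stop after one rule
    return None


# Made-up buffer contents for a driver approaching a red light.
buffers: Buffers = {"visual": "red light", "goal": "drive", "manual": None, "vocal": None}
productions = [
    Production(
        name="brake",
        condition=lambda b: b["visual"] == "red light" and b["goal"] == "drive",
        action=lambda b: b.update({"manual": "press brake"}),
    ),
    Production(
        name="announce",
        condition=lambda b: b["visual"] == "red light",
        action=lambda b: b.update({"vocal": "say stop"}),
    ),
]

fired = step(buffers, productions)
print(fired, buffers["manual"], buffers["vocal"])
```

Both rules match the buffers here, but only "brake" fires in this cycle; "announce" would have to wait for a later one. That is the serial part of an otherwise parallel system.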

So the basal ganglia plays the role of a "coordinating module". Appropriately, this region is evolutionarily older than the cortex and it occurs to some extent in all vertebrates. The other module I wanted to consider is the Goal module, which enables means-ends analysis: the ability to disengage from what one wants (the goal, or "end") in order to focus on something else (the "means"). Some researchers (Papineau, 2001) assert that this is a uniquely human capability.

So, where are we at? The human mind is thought to be partitioned into specific information-processing functions, and thankfully neuroanatomy appears to be cut along similar joints, with specific brain regions devoted to different functions and interconnections that provide for coordination among these functions. Having posited a cognitive architecture based on interacting modules, Anderson turns next to the nitty-gritty of learning and memory.

Learning and Memory in ACT-R

Above, I mentioned a "Declarative" module as being among the central modules posited by ACT-R. Anderson's fundamental claim is that "declarative memory tries to give us, moment by moment, the most appropriate possible window into our past," and "this window into our past gives us our identities."

He assumes the well-documented distinction between declarative learning, or learning of "facts," and procedural learning (skill acquisition). He doesn't, however, make Tulving's (1972) episodic/semantic distinction; instead he considers both to be explicitly learned in a given context, with the difference being that the "semantic" memory (such as "Lincoln was a U.S. president") has been encountered in so many subsequent contexts that we no longer have access to the context in which it was originally learned. Declarative memories can be strengthened, or made more available, by mere exposure.

In addition to the formation and strengthening of declarative memories, there is also procedural learning and subsequent conditioning of these actions. An example he gives is typing: we all know how to type, but we would have a difficult time if asked to give the location of a certain key on a keyboard (without using our fingers as an aid or relying on a common mnemonic like "the home row" or "qwerty"). Conditioning is how all animals learn that certain actions are more effective in certain situations through experience; these can be procedural actions or innate tendencies. Procedural knowledge is associated with the basal ganglia and will be discussed in greater detail below; for now, we will stay with declarative learning.

Interestingly, there are two ways of acquiring declarative memories. This can be illustrated by anterograde amnesiacs like H.M., who, despite the loss of the hippocampus (and the ability thereby to form new memories), was able to learn about famous people such as John F. Kennedy and others who became famous after his surgery. Recent researchers have postulated two different learning systems: while the hippocampus is known to subserve most declarative learning, other brain structures can slowly acquire such memories through repetition (presumably how H.M. came to know about famous people). Furthermore, through rehearsal, memories can be slowly transferred from the hippocampus to neocortical regions, explaining why those with a damaged or missing hippocampus can still access older memories (which are presumed to have undergone such transfer). So, while the hippocampus limits the capacity of declarative memory, it does not limit all learning.

I've long been confused about the relative finitude of memory, but Anderson makes a strong case for there being definite limits on the size of declarative memory. Beyond physical limits of sheer size and metabolic costs, he makes the interesting claim that the very flexibility of our memory-search ability derives from it being strategically limited, "throwing out" memories that are unlikely to be needed: "declarative memory, faced with limited capacity, is in effect constantly discarding memories that have outlived their usefulness".

Alongside Lael Schooler, Anderson (1991) researched the fundamental mechanisms of declarative memory. They found that if a memory has not been retrieved in a while, it becomes increasingly unlikely that it will be needed in the future. Indeed, there is a simple relationship between how likely a memory would be needed on a given day and how long it had been (t) since the memory was last used:
Odds needed = A t^{-d}

Where A is just a constant and d is the decay rate. Each time a memory was accessed, it added an increment to the odds that it would be needed again, with these increments all decaying according to a power function. Thus, if an item occurred n times, the odds of it appearing again are

Odds = ∑_{k=1}^{n} A t_k^{-d}

Where t_k is the time since the kth practice of an item. Thus, the past history of memory use predicts the odds that the memory will be needed. But the context of the current situation is involved as well. It turns out that memory availability is adjusted as a function of context; e.g., you will have an easier time remembering, say, your locker combination in the locker room than you would if someone were to randomly ask you for it elsewhere (Schooler and Anderson, 1997). Thus, human memory reflects the statistics of the environment and performs a triage on memories, devoting its limited resources to those that are most likely to be needed. How is this fact realized in ACT-R?
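
For anyone who likes to see the arithmetic, here is a minimal Python sketch of the summed power-law odds. The function name and the values A = 1 and d = 0.5 are my own picks for illustration, not figures from the book.

```python
def odds_needed(times_since_use, A=1.0, d=0.5):
    """Sum one power-law-decaying increment, A * t ** -d, per past use."""
    if any(t <= 0 for t in times_since_use):
        raise ValueError("times since use must be positive")
    return sum(A * t ** -d for t in times_since_use)


# A memory used once, 4 days ago, versus the same memory used again 1 day ago.
single = odds_needed([4.0])
repeated = odds_needed([4.0, 1.0])
print(single, repeated)  # 0.5 1.5
```

The second, more recent use adds a larger increment than the older one, so both frequency and recency push the odds up.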

In ACT-R, the "past" that is available in the form of memories consists of the information that existed in the buffers of various modules. At any given moment, countless things are impinging on the human sensorium, of which we only remember a very small fraction. For instance, ambient sounds or things in the visual periphery certainly undergo processing in various brain regions, but they are seldom attended to and thus often never make it into buffers. The system is "aware" only of the chunks of information in the various buffers, and these chunks get stored in declarative memory. These chunks have activation values that govern the speed and success of their retrieval. Specifically, a given memory has an inherent, base-level activation, plus its strength of association to elements in the present context.

Since the odds of needing a memory can be considered the sum of a quantity that reflects the past history of that memory and the present context, we can represent this in Bayesian terms as

 log[prior(i)] + ∑(j∈C)log[likelihood(j|i)] = log[posterior(i|C)]

Where prior(i) is the base-level activation, or the prior odds that memory i would be needed based on factors such as recency/frequency of use, likelihood(j|i) is the likelihood ratio that element j would be part of the context given that memory i is needed (reflecting strength of association to the current context), and posterior(i|C) is the updated odds that memory i will be needed in context C.
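
Here is a rough Python sketch of how the two terms add up in log-odds space. The function names, the usage history, and the log-likelihood values are all invented for illustration; I also take A = 1 so the constant drops out of the log.

```python
import math


def base_level(times_since_use, d=0.5):
    """Log-prior term: log of the summed power-law odds (taking A = 1)."""
    return math.log(sum(t ** -d for t in times_since_use))


def activation(times_since_use, context_log_likelihoods, d=0.5):
    """Log-posterior odds: the log-prior plus one log-likelihood term per context element."""
    return base_level(times_since_use, d) + sum(context_log_likelihoods)


history = [1.0, 7.0, 30.0]                        # days since each past use (made up)
in_locker_room = activation(history, [1.5, 0.7])  # two strongly associated cues
elsewhere = activation(history, [])               # no helpful cues in the context
print(in_locker_room > elsewhere)  # True
```

Same memory, same history of use, but the locker-room context contributes extra log-likelihood terms, so the combination is more available there.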

I'll give the basic ACT-R memory equations without going into them much further. The main point is that memory is responding to two statistical effects in the environment: (1) the more often a memory is retrieved, the more likely it is to be retrieved in the future. This produces a practice effect and is reflected in ACT-R's base-level activation. Secondly, (2) the more memories associated with a particular element, the worse a predictor the element is of any particular memory. This is reflected in the strengths of association in ACT-R, and produces the "fan" effect. The "fan" refers to the number of connections to a given element; increasing the sheer number of connections will decrease the strength of association between the element and any one of its connections. This is because when an element is associated with more memories, its appearance becomes a poorer predictor of any specific fact.
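
A quick Python sketch of the fan effect, under some loud assumptions: I use the form usually given for ACT-R's strength of association (a maximum strength minus the log of the fan) and the usual exponential link from activation to retrieval time. The maximum strength of 2.0 and the latency scale of 1.0 are arbitrary, and I treat the association strength as if it were the whole activation.

```python
import math


def association_strength(fan, max_strength=2.0):
    """Strength from a cue to one fact, reduced by the log of the cue's fan."""
    return max_strength - math.log(fan)


def retrieval_time(activation, latency_scale=1.0):
    """Retrieval time shrinks exponentially as activation grows."""
    return latency_scale * math.exp(-activation)


for fan in (1, 4):
    strength = association_strength(fan)
    print(fan, round(strength, 3), round(retrieval_time(strength), 3))
```

With these made-up numbers, a cue tied to four facts supports each fact less strongly than a cue tied to one, and retrieval slows accordingly.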

These results have been shown to affect all of our memories. In an experimental illustration of this, Peterson and Potts (1982) had participants study 1 or 4 true facts about famous historical figures that they did not previously know, such as that Beethoven never married. Two weeks later, participants were tested on memory for three kinds of facts: (1) new facts they had learned about historical figures as part of the experiment, (2) known facts that they knew about the historical figures before the experiment (e.g., Beethoven was a musician), and (3) false facts that they had not learned for the experiment and that should be recognizable as very unlikely (Beethoven was a famous athlete). Participants were shown these types of statements and had to rate them as true or false, and their speed in doing so was recorded. First, it was found that the facts they knew before the experiment were recognized much more quickly than those they learned for the experiment, reflecting the greater practice and base-level activation of the prior facts. More importantly, the number of facts they had learned for the experiment (1 vs. 4) affected BOTH new and prior facts: participants who learned 4 new facts made slower judgements for both well-known and newly-learned facts, while those who learned just 1 new fact were faster on both new and prior facts. Anderson writes:
From the perspective of the task facing declarative memory—making most available those facts that are most likely to be useful—these results make perfect sense. The already known facts have been used many times in the past, and at delay of two weeks they are likely the ones needed, so the base-level activation works to make them most active. On the other hand, the more things one knows about an individual, the less likely any one fact will be, so they cannot be all made as active. The activation equations in table 3.2 capture these relationships.
This relationship is also borne out in fMRI research. The greater the activation of a memory, the less time/effort it will take to retrieve it; thus, higher activation should map onto weaker fMRI response. Using a fan-effect paradigm, it was found that greater fan (more connections to a single memory) resulted in decreased activation and therefore stronger fMRI responses (Sohn 2003, 2005).

Anderson goes on in this chapter to discuss how we often choose actions and make decisions based on our memories of similar past actions/decisions and the outcomes that they produced. Here, we rely on memories rather than reasoning on the basis of general principles. Sometimes we have general principles to reason from, while other times it's far easier to recall and act. This kind of instance-based reasoning may be far more common than has been traditionally thought.

The Adaptive Control of Thought

Given all of the above, we know how important a flexible declarative memory is to our ability to adapt to a changing environment; but once the relevant information has been retrieved, we have to act on it, using it to make inferences or predictions. This often requires intensive, deliberative processing which is not appropriate when we have to act rapidly in stressful situations. Indeed, to the extent that one can anticipate how knowledge will be used, it makes sense to prepackage the application of that knowledge in a way that can be executed without planning. It turns out that there is a process by which frequently useful computations are identified and cached as cognitive reactions that can be elicited directly by the situation, bypassing laborious deliberation. Thus, a balance must be struck between immediate reaction and deliberative reflection, a sort of dual processing reminiscent of Kahneman's "Thinking, Fast and Slow." This is the way Anderson conceptualizes learning: a process of moving from intentional thinking and remembering (hippocampal/cortical) to more automatic reactions (basal ganglia).

But such an equal embrace of thought and action has not always characterized cognitive science; in fact, this very distinction marked the transition in psychology from the "behaviorist" to the "cognitive" era. This shift is very visible in the debate between Tolman and Hull about the relative roles of mental reflection and mechanistic action in producing behavior. To illustrate the struggle between thought and action in the mind, Anderson has us consider the Stroop task, where you are instructed to quickly report the font color while reading a list like red yellow orange green blue black etc. This task always takes slightly longer than simply reporting the color of non-words; Anderson points out that "this conflict basically involves the battle between Hull’s stimulus-response associations (the urge to say the word) and Tolman’s goal-directed processing (the requirement to comply with instructions)."

Anderson argues that 3 brain systems are especially relevant in achieving a balance between thought and action: the basal ganglia are responsible for the acquisition and application of "procedures", or Hull's automatic reactions; the hippocampal and prefrontal regions are responsible for storage and retrieval of declarative information, or Tolman's expectancies; and the anterior cingulate cortex (ACC) for exercising control in the selection of context-appropriate behavior. Note that these respectively correspond to the procedural module, the declarative module, and the goal module.

Declarative retrieval of information during decision-making is very time and resource intensive; it would be sensible if our brains had a way of "hard-coding" frequently-used behaviors/actions so that we could respond more automatically to familiar situations. Fortunately, it appears they do just that! For example, Hikosaka et al. (1999) showed monkeys a sequence of 4x4 grids in which two cells were lit up, and the monkeys had to select them in the correct order. The monkeys practiced such sets over the course of several months, and telling differences emerged between performance during the early months and later months. Early on, the monkeys performed the same regardless of what order the grids were shown in, or of which hand they used; however, after months of practice, they had become much faster at completing the task but could not go out of order and could only use their favored hand to input the answer. Thus, it seemed that the monkeys had switched from a flexible declarative representation of the task to a classic stimulus-response representation. Hikosaka et al. examined the brains of monkeys performing the task in order to compare activity in the early vs. later months. As expected, the task activated prefrontal regions early on, but after much practice the task primarily produced activity in basal ganglia structures, which are thought to display a variant of reinforcement learning. Furthermore, temporarily inactivating basal ganglia structures disrupted only the highly practiced sequences (not newly learned sequences).

The basal ganglia, then, is involved in producing automatic responses to stimuli. Indeed, it seems to display a variant of reinforcement learning, where a behavior followed by a "satisfying state of affairs" will increase in frequency (Thorndike's law of effect). The hippocampus is associated with Hebbian learning, where repeated occurrences of stimuli and response together serve to strengthen the connection (Thorndike's law of exercise); this is merely a function of temporal contiguity and does not depend on the consequences of the behavior. The basal ganglia is involved in a dopamine-mediated process that learns to recognize favorable patterns of activity in the cortex (Houk and Wise, 1995). That is, dopamine neurons provide information to the basal ganglia about how rewarding a behavior was, if it was more rewarding than expected, etc. Importantly, an element of time-travel is involved, because the rewards strengthen the salience of reward-producing contextual patterns. In humans, the basal ganglia (specifically the striatum) has been found to respond differentially to reward and punishment, the magnitude of the reward/punishment, and the difference between expected and received reward/punishment (Delgado et al. 2003). This was all very refreshing to me. Classical and operant conditioning are often presented in psychology classrooms as museum curiosities or animal training procedures, when in fact they apply equally well to human learning.

I wanted to share one final experimental demonstration of the difference between learning in the hippocampus versus the basal ganglia. This one involves a rat maze-learning paradigm; imagine a maze shaped like a plus sign (+); rats always enter on the same side, say the west side. Rats are trained to go to food housed in the south arm. What will rats do if they are put in the maze on the east side? Have they learned the spatial location of the food, or have they merely learned a right-turning behavior? If the former is true, they should turn down the correct arm of the maze to find the food; if the latter is true, their response will lead them down the wrong arm. Early results yielded no clear choice pattern (Restle, 1957). However, Packard and McGaugh (1996) trained all rats on the maze and then gave them injections that temporarily impaired either their hippocampus or their basal ganglia (specifically, the caudate). As you might expect, the rats with selective hippocampal impairment performed the right-turning response and ended up in the wrong arm of the maze, while rats with impediments to the basal ganglia chose the correct arm, presumably because their intact hippocampus contained the correct spatial "place-learning" representation. A convincing follow-up study by Packard (1999) produced the same pattern of results, but this time by using memory-enhancing agents applied selectively to the hippocampus or the caudate. This time, rats with hippocampal enhancements displayed behavior consistent with place-learning (they chose the correct arm), while rats with enhanced caudates relied on a right-turn response and chose the incorrect arm.
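
If the geometry of the plus maze is hard to picture, this little Python sketch spells out where each strategy sends the rat. The function names are mine, and the maze is reduced to four compass arms; it is only meant to show why the two strategies agree from the west start and disagree from the east.

```python
# Compass headings in clockwise order, so a right turn is "next in the list".
HEADINGS = ["north", "east", "south", "west"]
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


def response_choice(start_arm):
    """Response learning: always turn right at the junction."""
    heading = OPPOSITE[start_arm]  # a rat starting in the west arm runs east
    return HEADINGS[(HEADINGS.index(heading) + 1) % 4]


def place_choice(start_arm, food_arm="south"):
    """Place learning: head for where the food is, whatever the start arm."""
    return food_arm


for start in ("west", "east"):
    print(start, response_choice(start), place_choice(start))
# west south south
# east north south
```

From the trained west start both strategies end in the south arm, which is why training alone cannot tell them apart. Only the east start separates the right-turners (north arm) from the place-learners (south arm).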

But where do these stimulus-response associations come from? In ACT-R, they are called "productions" or "production rules" -- when a situation arises for which the system does not already have rules, information must be retrieved from declarative memory and must be processed using more basic production rules. This could entail retrieving a similar prior experience upon which to base present actions or retrieving general principles and reasoning from them. In such a situation,
"the first production makes a retrieval request for some declarative information, that information is retrieved, and the next production harvests that retrieval and acts upon it. The compiled production eliminates that retrieval step and builds a production specific to the information retrieved. This is the process by which the system moves from deliberation to action. Each time a new production of this kind is created, another little piece of deliberation is dropped out in the interest of efficient execution."
However, this newly formed production requires multiple repetitions for it to acquire enough strength to be applicable in new situations. Such rules are learned slowly, consistent with the view that procedural memories are acquired gradually. This measure of strength is often called a rule's "utility" since it is a measure of the value of the rule; when a situation arises where multiple rules apply, the rule with the highest utility is chosen; further, rewarding consequences following the use of a rule serve to increase that rule's utility. When a new rule is first created, its utility is zero and thus it is extremely unlikely that it will "fire". However, each time this rule is recreated its utility is increased. Anderson gives an excellent example using children's learning of subtraction rules. In the interest of time I won't go into it here, other than to say that it accounts for the most common bug in learning to subtract two multi-digit numbers: instead of always subtracting the bottom number from the top number, the buggy rule children often use is to subtract the smaller digit from the bigger one, regardless of which is on top. This rule is so persistent because half of the time, it produces the correct outcome and thus the same reward as the more limiting bottom-from-top rule. ACT-R is used to model the acquisition of the correct rule, and I found it very compelling.
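
Here is a small Python sketch of how a new rule could slowly overtake an established one. It is a simplification of my own: I assume a simple difference-learning update (utility moves a fraction of the way toward each reward), leave out any noise in rule selection, and invent the rule names, the learning rate of 0.2, and the reward of 10.

```python
def update_utility(utility, reward, learning_rate=0.2):
    """Move the utility a fraction of the way toward the reward just received."""
    return utility + learning_rate * (reward - utility)


def choose_rule(utilities):
    """Conflict resolution: pick the applicable rule with the highest utility."""
    return max(utilities, key=utilities.get)


# An established rule versus a newly created one that starts at zero.
utilities = {"retrieve-then-act": 5.0, "compiled": 0.0}
winners = []
for _ in range(4):
    utilities["compiled"] = update_utility(utilities["compiled"], reward=10.0)
    winners.append(choose_rule(utilities))

print(winners)
# ['retrieve-then-act', 'retrieve-then-act', 'retrieve-then-act', 'compiled']
```

With these numbers the new rule's utility climbs 2.0, 3.6, 4.88, 5.904, so it only wins on the fourth repetition. That is the "learned slowly" part in miniature.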

This general learning process is seen clearly in skill learning: as one becomes more skillful (say, in riding a bike), there will be a decrease in the involvement of the more "cognitive" cortical regions and an increase in the involvement of the more "stimulus-response" posterior regions. Here's Anderson's summary:
"Learning can be conceptualized as a process of moving from thoughtful reflection (hippocampus, prefrontal cortex) to automatic reaction (basal ganglia). The module responsible for learning of this kind is the procedural module (or production system). I offer the procedural module as an explanation for behavior that embraces both Hull’s reactions and Tolman’s reflections and provides a mechanism for the postulated learning link between them. Through production compilation, thoughtful behaviors become automatized; through utility learning, behavior is modified to become adaptive. When combined with the declarative memory module discussed in chapter 3, the production system provides a mechanism by which knowledge is used to make behavior more flexible and efficient."
Thus, an important part of cognition is the accumulation of production rules in long-term memory. These rules are activated by the contents of working memory and can be composed into more complex production-rule chains when a particular problem is solved; the result can be cached and, if used above some frequency threshold, becomes a production rule in its own right.
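
As a very loose Python analogy for production compilation (my own toy, not how ACT-R implements it): the slow path asks declarative memory for a fact and then acts on it, while the compiled rule has the retrieved fact baked in and skips the retrieval step. The `FACTS` table and both function names are hypothetical.

```python
# A stand-in for declarative memory (hypothetical contents).
FACTS = {("3", "+", "4"): "7"}


def retrieve_then_answer(problem, memory):
    """Deliberate path: request a retrieval, then harvest it and respond."""
    steps = ["request retrieval", "harvest retrieval"]
    return memory[problem], steps


def compile_production(problem, memory):
    """Build a specialized rule with the retrieved fact baked in."""
    answer = memory[problem]  # retrieved once, when the rule is built

    def compiled(candidate):
        # The compiled rule only applies to the exact situation it was built for.
        if candidate == problem:
            return answer, ["fire compiled rule"]
        return None

    return compiled


problem = ("3", "+", "4")
slow = retrieve_then_answer(problem, FACTS)
fast = compile_production(problem, FACTS)(problem)
print(slow, fast)
```

Both paths give the same answer, but the compiled rule gets there in one step instead of two, at the cost of applying only to the specific case it was built from.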

Uniquely Human Learning

Anderson points out that his (and my) discussion up to this point has actually concerned primate learning; nothing so far has been unique to humans. In chapter 5, he discusses learning from verbal directions and worked-out examples. He also recognizes the role of individual discovery in the learning process, but criticizes the recent trend towards pure "discovery" learning in education:
" ...a third way to learn is by discovery and invention. Cultural artifacts such as algebra came into being because of such a process. Some constructivist mathematics educators advocate having children learn in the same manner (e.g., Cobb et al., 1992). In the extreme, it is a very inefficient way to learn algebra or any other cultural artifact... However, when one looks in detail at what happens in the process of learning from instruction and example, one frequently finds many minidiscoveries being made as students try to make sense of the instruction they are receiving and their experience in applying that instruction. Learning by discovery probably plays a more important role as a normal part of learning through social transmission (i.e., directions and examples) than it does as a solo means of learning."
Anderson goes on to discuss how human cognition can support a uniquely human skill: learning algebra from verbal directions and examples. He uses ACT-R to model algebra learning and to help point the way toward what is special about human cognition. He ends up describing three such features in detail: the potential for abstract control of cognition, the capacity for advanced pattern matching, and the metacognitive ability to reason about cognitive states.

The first is likely mediated by the anterior cingulate cortex (ACC), a structure involved in controlling behavior, which is especially active when people have to direct their behavior in ways that violate typical response tendencies. Interestingly, the ACC has undergone recent evolutionary changes found only in humans. Recall that this structure was the one associated with the goal buffer, which holds control elements. The idea is that the ACC allows us to maintain abstract control states which let us choose different actions when all the other buffers are in identical states. The second feature requires dynamic pattern matching, which allows for processing complex relational structures, as seen in analogical processing. It all gets pretty detailed and I won't go into it here. Instead I'll just quote the end of the chapter:
Dynamic pattern matching and recursive representations are connected. Dynamic pattern matching is only useful in a system that has powerful, interlinked representations. Processing recursive representations can be much easier with dynamic pattern matching. The human brain is expanded over that of other primates, and it is not just a matter of more brain. There are new prefrontal and parietal regions, and in the case of some regions such as the ACC, there are new kinds of cells. While brain lateralization is also a common feature of many species, its connection with language seems unique (Halpern et al., 2005), and Marcus’s second feature is strongly motivated by considerations of language processing. So, it seems pretty clear that there have been some changes to the structure of the human brain that enable the unique functions of human cognition.

The Question of Consciousness

It isn't really fair to talk about this here, because I have only given you a flavor of the main arguments presented in the book, and it is upon this foundation that his discussion of consciousness is built. It requires an intimate understanding of ACT-R, and I don't think I've done a good enough job conveying that understanding in the present post. Still, I'll leave you with his thoughts on the subject, which he gives only grudgingly (preferring to "leave the philosopher's domain to the philosopher"):

In 2003, we noted that in ACT-R consciousness has an obvious mapping to the buffers that are associated with the modules. The contents of consciousness are the contents of these buffers, and conscious activity corresponds to the manipulation of the contents of these buffers by production rules. The information in the buffers is the information that is made available for general processing and is stored in declarative memory. ACT-R models can generate introspective reports by describing the contents of these buffers. In 2003 we did not think this was much of an answer and gave ACT-R low marks on this  dimension. I have subsequently come to the conclusion that this is indeed what consciousness is and that running ACT-R models are conscious. They may not be conscious in the same sense as humans, but this is probably because ACT-R gives a rather incomplete picture of the buffers that are available in the human system.

He immediately notes that this is "not a particularly novel interpretation of consciousness" and that it is essentially "the ACT-R realization of the global workspace theory of consciousness (Baars, 1988; Dehaene & Naccache, 2001)."
These authors, Dehaene and Changeux (2004), summarize the view as follows:
We postulate the existence of a distinct set of cortical “workspace” neurons characterized by their ability to send and receive projections to many distant areas through long-range excitatory axons. These neurons therefore no longer obey a principle of local, encapsulated connectivity, but rather break the modularity of the cortex by allowing many different processors to exchange information in a global and flexible manner. Information, which is encoded in workspace neurons, can be quickly made available to many brain systems, in particular the motor and speech-production processors for overt behavioral report. We hypothesize that the entry of inputs into this global workspace constitutes the neural basis of access to consciousness. (p. 1147)
He is totally on-board with rejecting all "Cartesian theater" interpretations--the idea that there has to be something more to consciousness, some inner homunculus that watches our thoughts flit by--and he seems to agree pretty completely with Dennett (1993). He finishes with the following:
 If we resist the temptation to believe in a hard problem of consciousness, we can appreciate how consciousness is the solution to the fundamental problem of achieving the mind in the brain. As noted in chapter 2, efficiency considerations drive the brain to try to achieve as much of its computation as possible locally in nearly encapsulated modules. However, the functionality of the mind demands communication among these modules, and to do this, some information must be made globally available. The purpose of the buffers in ACT-R is to create this global access. The contents of these buffers will create an information trail that can be reported and reflected upon. As in the last example in chapter 5, adaptive cognition sometimes requires reflection on this information trail. Thus, consciousness is the manifestation of the solution to the need for global coordination among modules. It is a trademark consequence of the architecture in figure 2.2. That being said, chapters 1–5 develop this architecture with only oblique references to consciousness. This is because the information processing associated with consciousness is already described by other terms of the theory. It still is not clear to me how invoking the concept of consciousness adds to the understanding of the human mind, but taking a coherent reading of the term consciousness, I am willing to declare ACT-R conscious.

Wednesday, July 1, 2015

I Finally Read "A New Kind of Science"

This book made me think new thoughts; this is rare, so I am posting about it.
If you read nothing else in the post, read the end.

"A New Kind of Science" is a 13-year-old book preceded, and regrettably often prejudged, by its reputation. Many of the criticisms that have come to define the work are valid, so let's get that part out in the open. The book can be read in its entirety here; it is enormous, both physically (~1,200 pages) and in scope, which has led to a limited and specialized readership lodging many legitimate, though mostly technical, complaints. To make matters worse, Wolfram comes across as rather smug and boastful, taking for granted the revolutionary impact of his work, staking claims to originality that are often incorrect, and failing to adequately cite his ideological predecessors in the main body of the text. These are legitimate concerns and egregious omissions to be sure.

However, the book itself was written to be accessible to anyone with basic knowledge of math/science, and I feel it gains so much for being simply and frankly written in this way. Furthermore, the biggest ideas in the book, even if not completely original or 100% convincing as-is, are still as beautiful and important as they are currently underappreciated. Even if he cannot claim unique ownership of them all, Wolfram has done a heroic job explicating these ideas for the lay reader and his book has vastly increased their popular visibility. But I fear that many may be missing out on a thoroughly enjoyable, philosophically insightful book simply on the basis of some overreaching claims and some rather technical flaws; I get the sinking feeling it's going the way of Atlas Shrugged: a big book that's cool to dismiss without ever having read.

I'm not going to get into the specific criticisms; suffice it to say that there are issues with the book, though Wolfram has gone to some trouble to defend his positions. But this notorious reception, coupled with the fact that the book weighs in at almost 6 pounds, had kept me and surely many others from ever giving it a proper chance. This post is not meant to be an apology, and neither is it intended to be a formal book review. Rather, I am going to show you several things that I took away from my reading of it that I feel very grateful for, regardless of the extent to which they represent any sort of paradigm shift, or even anything new to human inquiry. Lots of these ideas were very new to me, and they were presented so well that I feel I have ultimately gained a new perspective on many issues, including life itself, which I have been trying sedulously for years to better understand.

To attempt to write a general review of this book would be quite difficult, for its arguments depend so much upon pictures (of which there are more than 1000!), careful explanations, and repeated examples to build up intuition for how very simple rules can produce complex behavior, and how this fact plausibly accounts for many phenomena in the natural and physical world (indeed, perhaps the universe itself). Wolfram uses this intuition to convey compelling explanations of space and time, experience and causation, thinking, randomness, free will, evolution, the Second Law of thermodynamics, incompleteness and inconsistency in axiom systems, and much more. I will be talking mostly about the non-math/physicsy stuff in this post, because most of the math and physics is beyond my ken.

This will make a little more sense later on

To begin with, the first 5 to 8 chapters are nothing if not eye-opening; they could and should be read by all high-schoolers who are interested in science, and unless you are a scientist yourself I guarantee you will gain many new insights into some fundamental issues. Some of the physics (ch. 9) got a little heavy for me, but this may well be the most interesting part of the book for many. The final chapter (12) was what did it for me personally.

In this long last chapter, the main thesis of the book is driven home. It is as follows: the best (and indeed, perhaps the only) way to understand many systems in nature (and indeed, perhaps nature itself) is to think in terms of simple programs instead of mathematical equations; that is, to view processes in nature as performing rule-based computations. Simple computer programs can explain, or at least mimic, natural phenomena that have so far eluded mathematical models such as differential equations; Wolfram argues that nature is ultimately inexplicable by these traditional methods.

He demonstrates how very simple programs can produce complexity and randomness; he argues that because simple programs must be ubiquitous in the natural world, they are responsible for the complexity and randomness we observe in natural systems. He shows how idealized model programs like cellular automata can mimic in shockingly exact detail the behavior of phenomena which science has only tenuously been able to describe: crystal growth (e.g., snowflakes), fluid turbulence, the path of a fracture when materials break, biological development, plant structures, pigmentation patterns...thus indicating that such simple processes likely underlie much of what we observe.

Indeed, he makes a case (originally postulated by Konrad Zuse) that the universe itself is fundamentally rule-based, and essentially one big ongoing computation of which everything is a part. It gets a little hairy, but in chapter 9 Wolfram discusses how the concept of causal networks can be used to explain how space, time, elementary particles, motion, gravity, relativity, and quantum phenomena all arise. Indeed, he argues that causal networks can represent everything that can be observed, and that all is defined in terms of their connections. This is predicated on the belief that there are no continuous values in nature; that is to say, that nature is fundamentally discrete. There were a lot of intriguing ideas here, but I cannot go into them all right now. There does seem to be a reasonable case to be made for some kind of digital physics. I am way out of my league here though, so I'll stop. Check out that Wikipedia article!

Cellular automata and other related easy-to-follow rule-based systems are used to demonstrate, or at least to hint at, most of these claims. If you haven't seen these before, check out that link: it takes you to Wolfram's own one-page summary of how these things work. In fact, I'm going to cut-and-paste most of it below. But here's a brief description: imagine a row of cells that can be either black or white. You start with some initial combination of black and/or white cells in this row; to get the next row, you apply a set of rules to the original cells which tells you what color the cells in the next row should be, based on the colors of the original cells above. The rules that determine the color of a new cell are based on the colors of three cells: the cell immediately above it and the cells to the immediate right and left of the one above it. Thus, the color of any given cell is affected only by the cell directly above it and that cell's immediate neighbors. Simple enough, but those neighbors are in turn governed by their neighbors, and those neighbors by their neighbors, etc., so that the whole thing ends up being highly interconnected. When you repeatedly apply a given rule and step back to observe the collective behavior of all the cells, large-scale patterns can emerge. You often get simple repetitive behavior (like the top picture below) or nested patterns (second picture below). However, sometimes you find behavior that is random, or some mixture of random noise with moving structures (last picture below).
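For anyone who would rather see the three-cell update as code, here is a minimal Python sketch of it. It is my own illustration, not anything taken from the book: the function names `step` and `run` are made up, cells past the edge of the row are assumed to be white, and the rule is passed in using the standard 0-255 numbering for these automata (rule 30 is just the one I picked to draw).

```python
def step(cells, rule):
    """Apply an elementary cellular automaton rule once.

    cells: list of 0/1 values (1 = black). Cells beyond either edge are
           assumed to be white (0); that boundary choice is an assumption.
    rule:  integer 0-255. Bit number (4*left + 2*center + right) of the
           rule gives the color of the new cell.
    """
    padded = [0] + cells + [0]
    return [
        (rule >> (4 * padded[i - 1] + 2 * padded[i] + padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]


def run(rule, steps):
    """Start from a single black cell and return every row, oldest first."""
    width = 2 * steps + 1          # wide enough that the edges never matter
    row = [0] * width
    row[width // 2] = 1            # one black cell in the middle
    rows = [row]
    for _ in range(steps):
        row = step(row, rule)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Draw 15 steps of rule 30 as text: '#' is black, '.' is white.
    for row in run(30, 15):
        print("".join("#" if cell else "." for cell in row))
```

Changing the `30` to another number between 0 and 255 is enough to try a different rule; everything else stays the same.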

Look at the picture just above; the left side shows certain regularities, but the right side exhibits random behavior (and has indeed been used for practical random number generators and encryption purposes). How might one predict the state of this system after it has evolved for a given number of time-steps? (This is an important "exercise left to the reader" so think about it before reading on).

The 'take-home' here is that sometimes simple rules lead to behavior that is complex and random, and the lack of regularities in these systems defies any short description using mathematical formulas. The only way to know how that sucker right there is going to behave in 1,000,000,000 steps is to run it and find out.

If you like looking at pictures like this one, you should definitely check out the book. I read it digitally but I ordered a physical copy as soon as I finished because man, what a terrific coffee-table book this thing makes!

Computational universality

Now here's where things got really interesting for me. Unless you have studied computer science (which I honestly really haven't), you might be surprised to find out that certain combinations of rules, like those shown above, can result in systems that are capable of performing any possible computation and emulating any other system or computer program (which I honestly kind of was). Indeed, the computer you are reading this on right now has this capability. Hell, your microwave probably has this capability; given enough memory, it could run any program or calculate any function provided the function is able to be computed at all.

This idea, called universal computation, was developed by Alan Turing in the 1930s: a system is said to be "universal" or "Turing complete" if it is able to perform any computation. If a system is universal, it must be able to emulate any other system, and it must be able to produce behavior that is as complex as that of any other system; knowing that a system is universal implies that the system can produce behavior that is arbitrarily complex.

When studying the 256 rule-sets that generate the elementary cellular automata, Wolfram and his assistant Matthew Cook showed that a couple of them (rule 110 and relatives) could be made to perform any computation; that is, a couple of these extremely simple systems, among the most basic conceivable types of programs, were shown to be universal.

Rule 110 from a single black cell (16 steps; see 250 steps below)
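In case the rule numbers seem arbitrary: there are 8 possible three-cell neighborhoods, each of which can map to black or white, which gives 2^8 = 256 rule-sets, and the rule number is just those 8 outputs read as a binary number. The Python sketch below unpacks a number into its lookup table; it is my own illustration, and the function name `rule_table` is made up.

```python
def rule_table(rule):
    """Return {(left, center, right): new_color} for a rule number 0-255."""
    table = {}
    for index in range(8):
        # Read the three bits of index as the colors of the three cells above.
        neighborhood = ((index >> 2) & 1, (index >> 1) & 1, index & 1)
        # Bit number `index` of the rule is the new cell's color.
        table[neighborhood] = (rule >> index) & 1
    return table


if __name__ == "__main__":
    # Print rule 110's table, from neighborhood (1, 1, 1) down to (0, 0, 0).
    for neighborhood, new_color in sorted(rule_table(110).items(), reverse=True):
        print(neighborhood, "->", new_color)
```

Read top to bottom, the right-hand column spells out 01101110, which is 110 in binary.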

In general, this itself is not new knowledge; von Neumann was the first to show that a cellular automaton could be a universal computer, and it was known that other simple devices could support universal computation. However, this was the simplest instantiation of universality yet discovered, and Wolfram uses this to argue that the phenomenon must be far more widespread than originally thought, and indeed very common in nature. While most basic sets of rules generate very simple behavior (like the first and second rules pictured above), past a certain threshold you get universality, where a system can emulate any other system by setting up the appropriate initial conditions, like rule 110:

How universality can actually be achieved with cellular automata in practice is described with great clarity in the book, but it would be too complicated to get into here. Pretty neat though! Wolfram goes on to show how universality is instantiated in Turing machines, cellular automata, register machines, and substitution systems, by showing how each one can be made to emulate the others by setting up appropriate initial conditions, despite great differences in their underlying structure.
"It implies that from a computational point of view a very wide variety of systems, with very different underlying structures, are at some level fundamentally equivalent...every single one of these systems is ultimately capable of exactly the same kinds of computations."
Any kind of system that is universal can perform the same computations; as soon as one gets past the threshold for universality, that's all. Things can't get more complex. It doesn't matter how complex the underlying rules are; one universal system is equivalent to any other, and adding more to its rules cannot have any fundamental effect. He goes on to say,
"[My] general expectation is that more or less any system whose behavior is not somehow fundamentally repetitive or nested will in the end turn out to be universal."
This and related research led Wolfram to postulate his "new law of nature", the Principle of Computational Equivalence: that since universal computation means that one system can emulate any other system, all computing processes are equivalent in sophistication, and this universality is the upper limit on computational sophistication.
"No system could be constructed in our universe that is capable of more complex computations than any other universal system; no system can carry out computations that are more sophisticated than those carried out by a Turing machine or cellular automaton."
Another way of stating this is that there is a fundamental equivalence between many different kinds of processes, and that all processes which are not obviously simple can be viewed as computations of equivalent sophistication, whether they are man-made or spontaneously occurring. When we think of computations, we typically think of carrying out a series of rule-based steps to achieve a purpose, but computation is in fact much broader, and as Wolfram would argue, all-encompassing. Thus, as in cellular automata, the process of any system evolving is itself a computation, even if its only function is to generate the behavior of the system. Thus, all processes in nature can be thought of as computations; the only difference is that "the rules such processes follow are defined not by some computer program that we as humans construct but rather by the basic laws of nature."

Wolfram goes on to suggest that any instance of complex behavior we observe is produced by a universal system.
"I suspect that in almost any case where we have seen complex behavior... it will eventually be possible to show that there is universality. And indeed... I believe that in general there is a close connection between universality and the appearance of complex behavior."
"Essentially any piece of complex behavior that we see corresponds to a kind of lump of computation that is at some level equivalent."
He argues that this is why some things appear complex to us, while other things yield patterns or regularities that we can perceive, or which can be described by some formal mathematical analysis:
"If one studies systems in nature it is inevitable that both the evolution of the systems themselves and the methods of perception and analysis used to study them must be processes based on natural laws. But at least in the recent history of science it has normally been assumed that the evolution of typical systems in nature is somehow much less sophisticated a process than perception and analysis.

Yet what the Principle of Computational Equivalence now asserts is that this is not the case, and that once a rather low threshold has been reached, any real system must exhibit essentially the same level of computational sophistication. So this means that observers will tend to be computationally equivalent to the systems they observe— with the inevitable consequence that they will consider the behavior of such systems complex."
Thus, the reason things like turbulence in fluids or any other random-seeming phenomena appear complex to us is that we are computationally equivalent to these things. To really understand the implications of this idea, we need to bring in the closely related idea of irreducibility.

Computational irreducibility

Wolfram claims that the main concern of science has been to find ways of predicting natural phenomena, so as to have some control/understanding of them. Instead of having to specify at each step how, say, a planet orbits a star, it is far better to derive a mathematical formula or model that allows you to determine the outcome of such systems with a minimum of computational effort. Sometimes, you can even find definite underlying rules for such systems which make prediction just a matter of applying these rules.

However, there are many common systems for which no traditional mathematical formulas have been found which can easily describe their behavior. And even when you know the underlying rules, there is often no way to know for sure how the system will ultimately behave without running it, and it can take an irreducible amount of computation to actually do this. Imagine how you would try to predict the row of black and white cells after the rule-110 cellular automaton had run for, say, a trillion steps. There is simply no way to do this besides carrying out the full computation; no way to reduce the amount of computational effort that this would require. Thus,
"Whenever computational irreducibility exists in a system it means that in effect there can be no way to predict how the system will behave except by going through almost as many steps of computation as the evolution of the system itself.

...what leads to the phenomenon of computational irreducibility is that there is in fact always a fundamental competition between systems used to make predictions and systems whose behavior one tries to predict.

For if meaningful general predictions are to be possible, it must at some level be the case that the system making the predictions be able to outrun the system it is trying to predict. But for this to happen the system making the predictions must be able to perform more sophisticated computations than the system it is trying to predict."
This is because the system you are trying to predict and the methods you are using to make predictions are computationally equivalent; thus for many systems there is no general way to shortcut their process of evolution, and their behavior is therefore computationally irreducible. Unfortunately, there are many common systems whose behavior cannot ultimately be determined at all except through direct simulation, and which thus don't appear to yield to any short mathematical description. Wolfram argues that almost any universal system is irreducible, because nothing can systematically outrun a universal system. He gives the following thought experiment:
"For consider trying to outrun the evolution of a universal system. Since such a system can emulate any system, it can in particular emulate any system that is trying to outrun it. And from this it follows that nothing can systematically outrun the universal system. For any system that could would in effect also have to be able to outrun itself."
Since universality should be relatively common in natural systems, so too will computational irreducibility, making it impossible to predict the behavior of these systems by any shortcut. He argues that traditional science has always relied on computational reducibility, and that "its whole idea of using mathematical formulas to describe behavior makes sense only when the behavior is computationally reducible." This seems to impose stark limits on traditional scientific inquiry, for it implies that it is impossible to find theories that will perfectly describe a complex system's behavior without arbitrarily much computational effort.
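To make the reducible/irreducible distinction concrete, here is a small Python sketch using a rule that is reducible. The example is my own choice, not one worked through in this post: rule 90 (each new cell is the XOR of its left and right neighbors) produces a nested pattern from a single black cell, and it is a standard result that this pattern is Pascal's triangle taken mod 2. That means any cell at step t can be computed directly from a formula, skipping all the earlier rows. The function names are made up for illustration.

```python
def rule90_by_simulation(t):
    """Row t of rule 90 from a single black cell, computed step by step."""
    width = 2 * t + 1
    row = [0] * width
    row[t] = 1                      # single black cell in the middle
    for _ in range(t):
        padded = [0] + row + [0]    # cells past the edges count as white
        # Rule 90: the new cell is the XOR of its left and right neighbors.
        row = [padded[i - 1] ^ padded[i + 1] for i in range(1, width + 1)]
    return row


def rule90_by_formula(t):
    """The same row, with each cell computed directly (no earlier rows)."""
    row = []
    for offset in range(-t, t + 1):
        if (t + offset) % 2:
            row.append(0)           # cells of the wrong parity are always white
        else:
            j = (t + offset) // 2
            # The binomial coefficient C(t, j) is odd exactly when every
            # set bit of j is also set in t (Lucas's theorem mod 2).
            row.append(1 if (j & t) == j else 0)
    return row


if __name__ == "__main__":
    # The shortcut agrees with the step-by-step simulation.
    print(all(rule90_by_simulation(t) == rule90_by_formula(t) for t in range(64)))
```

The simulation has to do t rounds of work to reach row t, while the formula does a fixed amount of work per cell no matter how large t is. No comparable shortcut is known for rule 110, which is the situation the passage above is describing.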

Free Will and Determinism

This section of the book uses the idea of computational irreducibility to demystify the age-old problem of free will in a way I find quite satisfying, even beautiful. Humans, and indeed most other animals, seem to behave in ways that are free from obvious laws. We make minute-to-minute decisions about how to act that do not seem fundamentally predictable.

Wolfram argues that this is because our behavior is computationally irreducible; the only way to work out how such a system will behave, or to predict its behavior, is to perform the computation. This lets us have our materialist/mechanistic cake and eat it: we can admit that our behavior essentially follows a set of underlying rules with our autonomy intact, because our rules produce complexities that are irreducible and hence unpredictable.

We know that animals as living systems follow many basic underlying rules (genes are expressed, enzymes catalyze biochemical pathways, cells divide), but we have also seen how even very basic rule-sets result in universality, complexity, and computational irreducibility of the system.
"This, I believe, is the ultimate origin of the apparent freedom of human will. For even though all the components of our brains presumably follow definite laws, I strongly suspect that their overall behavior corresponds to an irreducible computation whose outcome can never in effect be found by reasonable laws."
The main criterion for freedom in a system seems to be that we cannot predict its behavior. For if we could, then the behavior of the system would thus be predetermined. Wolfram muses,
"For as we have seen many times in this book even systems with quite simple and definite underlying rules can produce behavior so complex that it seems free of obvious rules. And the crucial point is that this happens just through the intrinsic evolution of the system—without the need for any additional input from outside or from any sort of explicit source of randomness.

And I believe that it is this kind of intrinsic process—that we now know occurs in a vast range of systems—that is primarily responsible for the apparent freedom in the operation of our brains.

But this is not to say that everything that goes on in our brains has an intrinsic origin. Indeed, as a practical matter what usually seems to happen is that we receive external input that leads to some train of thought which continues for a while, but then dies out until we get more input. And often the actual form of this train of thought is influenced by memory we have developed from inputs in the past—making it not necessarily repeatable even with exactly the same input.

But it seems likely that the individual steps in each train of thought follow quite definite underlying rules. And the crucial point is then that I suspect that the computation performed by applying these rules is often sophisticated enough to be computationally irreducible—with the result that it must intrinsically produce behavior that seems to us free of obvious laws."

Intelligence in the Universe

Wolfram has a wonderful section about intelligence in the universe, but this post is quickly becoming quite long so I will stick to my highlights. Definitely check it out if what I say here interests you.

Here he poignantly discusses how "intelligence" and "life" are difficult to define, and how many features of commonly given definitions of intelligence (learning and memory, communication, adaptation to complex situations, handling abstraction) and life (spontaneous movement/response to stimuli, self-organization from disorganized material, reproduction) are in fact present in much simpler systems that we would not describe as intelligent or alive.
"And in fact I expect that in the end the only way we would unquestionably view a system as being an example of life is if we found that it shared many specific details with life on Earth."
Discussing extraterrestrial intelligence, he introduced me to an idea that is probably a well-known science fiction trope, but one that genuinely surprised me. He talks about how Earth is bombarded with radio signals from around our galaxy and beyond, but that these signals seem to be completely random noise, and thus they are assumed to be just side effects of some physical process. But, he notes, this very lack of regularities in the signal could actually be a sign of some kind of extraterrestrial intelligence: "For any such regularity represents in a sense a redundancy or inefficiency that can be removed by the sender and receiver both using appropriate data compression." If this doesn't make sense to you, then you will probably also enjoy his section on data compression and reducibility. The whole book is really worth taking the time to read!
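The compression point can be checked with a quick Python sketch. This is my own toy illustration, not something from the book: it uses the standard-library `zlib` module as a stand-in for "appropriate data compression", and the two byte strings (a repeating pattern and seeded pseudo-random noise) are made-up examples.

```python
import random
import zlib


def compressed_fraction(data):
    """Size of the zlib-compressed data as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)


def regular_signal(n_bytes=10000):
    """A signal full of regularities: the same 4 bytes over and over."""
    return (b"0110" * (n_bytes // 4 + 1))[:n_bytes]


def noisy_signal(n_bytes=10000, seed=0):
    """A signal with no obvious regularities: seeded pseudo-random bytes."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))


if __name__ == "__main__":
    print("regular:", compressed_fraction(regular_signal()))
    print("noisy:  ", compressed_fraction(noisy_signal()))
```

The regular signal shrinks to a tiny fraction of its size, while the noisy one does not shrink at all. Put the other way around, a message that has already been compressed well has had its regularities squeezed out, so it looks like noise to anyone who doesn't know the scheme.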

An Incredible Ending

I'm going to quote the last few paragraphs of the book in full, because they are extremely beautiful to me and there is no way I could do them justice. If you read nothing else in this blog post, read this. Feeling the full intensity of its impact/import really depends on one having read the previous, like, 800 pages and having understood the main arguments in them, so if you are planning to read the book in its entirety you might save this part until then for greatest effect. Still, if this is the only thing you ever read by Stephen Wolfram, I think it should be this. It is a good stylistic representation of the book (the short sentences, the lucid writing, the hubris) and it is the ultimate statement of the work's conclusions. Fair warning: much of it is going to sound absolutely outrageous if you haven't read the book, and especially if you haven't read the parts of this post about universality and computational irreducibility. In fact, even still it sounds kind of preposterous!

But having been preoccupied with these questions about life for many years now, this passage resonated with me deeply and immediately and I am still reeling from it. Though I am not completely convinced (though are we ever, of anything?), the ideas summarized herein constitute, at least for me personally, a singularly compelling theory of existence, of nature, of life... of everything. Granted, I am taking a lot on faith for now, but I know I will have occasion to return to these thoughts time and time again as they percolate across my lifetime; indeed, it is largely for this reason that I took the time to write this post. Well, here it is; as elsewhere in the quoted material, any emphasis is mine:

"It would be most satisfying if science were to prove that we as humans are in some fundamental way special, and above everything else in the universe. But if one looks at the history of science many of its greatest advances have come precisely from identifying ways in which we are not special—for this is what allows science to make ever more general statements about the universe and the things in it.

Four centuries ago we learned for example that our planet does not lie at a special position in the universe. A century and a half ago we learned that there was nothing very special about the origin of our species. And over the past century we have learned that there is nothing special about our various physical, chemical and other constituents.

Yet in Western thought there is still a strong belief that there must be something fundamentally special about us. And nowadays the most common assumption is that it must have to do with the level of intelligence or complexity that we exhibit. But building on what I have discovered in this book, the Principle of Computational Equivalence now makes the fairly dramatic statement that even in these ways there is nothing fundamentally special about us.

For if one thinks in computational terms the issue is essentially whether we somehow show a specially high level of computational sophistication. Yet the Principle of Computational Equivalence asserts that almost any system whose behavior is not obviously simple will tend to be exactly equivalent in its computational sophistication.

So this means that there is in the end no difference between the level of computational sophistication that is achieved by humans and by all sorts of other systems in nature and elsewhere. For my discoveries imply that whether the underlying system is a human brain, a turbulent fluid, or a cellular automaton, the behavior it exhibits will correspond to a computation of equivalent sophistication.

And while from the point of view of modern intellectual thinking this may come as quite a shock, it is perhaps not so surprising at the level of everyday experience. For there are certainly many systems in nature whose behavior is complex enough that we often describe it in human terms. And indeed in early human thinking it is very common to encounter the idea of animism: that systems with complex behavior in nature must be driven by the same kind of essential spirit as humans.

But for thousands of years this has been seen as naive and counter to progress in science. Yet now essentially this idea—viewed in computational terms through the discoveries in this book—emerges as crucial. For as I discussed earlier in this chapter, it is the computational equivalence of us as observers to the systems in nature that we observe that makes these systems seem to us so complex and unpredictable.

And while in the past it was often assumed that such complexity must somehow be special to systems in nature, what my discoveries and the Principle of Computational Equivalence now show is that in fact it is vastly more general. For what we have seen in this book is that even when their underlying rules are almost as simple as possible, abstract systems like cellular automata can achieve exactly the same level of computational sophistication as anything else.

It is perhaps a little humbling to discover that we as humans are in effect computationally no more capable than cellular automata with very simple rules. But the Principle of Computational Equivalence also implies that the same is ultimately true of our whole universe.

So while science has often made it seem that we as humans are somehow insignificant compared to the universe, the Principle of Computational Equivalence now shows that in a certain sense we are at the same level as it is. For the principle implies that what goes on inside us can ultimately achieve just the same level of computational sophistication as our whole universe.

But while science has in the past shown that in many ways there is nothing special about us as humans, the very success of science has tended to give us the idea that with our intelligence we are in some way above the universe. Yet now the Principle of Computational Equivalence implies that the computational sophistication of our intelligence should in a sense be shared by many parts of our universe—an idea that perhaps seems more familiar from religion than science.

Particularly with all the successes of science, there has been a great desire to capture the essence of the human condition in abstract scientific terms. And this has become all the more relevant as its replication with technology begins to seem realistic. But what the Principle of Computational Equivalence suggests is that abstract descriptions will never ultimately distinguish us from all sorts of other systems in nature and elsewhere. And what this means is that in a sense there can be no abstract basic science of the human condition—only something that involves all sorts of specific details of humans and their history.

So while we might have imagined that science would eventually show us how to rise above all our human details what we now see is that in fact these details are in effect the only important thing about us.

And indeed at some level it is the Principle of Computational Equivalence that allows these details to be significant. For this is what leads to the phenomenon of computational irreducibility. And this in turn is in effect what allows history to be significant—and what implies that something irreducible can be achieved by the evolution of a system.

Looking at the progress of science over the course of history one might assume that it would only be a matter of time before everything would somehow be predicted by science. But the Principle of Computational Equivalence—and the phenomenon of computational irreducibility—now shows that this will never happen.

There will always be details that can be reduced further—and that will allow science to continue to show progress. But we now know that there are some fundamental boundaries to science and knowledge.

And indeed in the end the Principle of Computational Equivalence encapsulates both the ultimate power and the ultimate weakness of science. For it implies that all the wonders of our universe can in effect be captured by simple rules, yet it shows that there can be no way to know all the consequences of these rules, except in effect just to watch and see how they unfold."