This puzzle challenge brings joy to the world of code
By midnight on December 1, 2015, when Eric Wastl first launched his annual Santa-themed puzzle-a-day programming challenge Advent of Code, 81 people had signed up. That pretty much matched his capacity planning for 70 participants. Wastl figured this amusement might be of interest to a few friends, friends of friends, and maybe some of their friends as well.
But Wastl, a software engineer who works as a senior architect for TCGPlayer, an online marketplace for trading card games, had failed to anticipate how social media’s recursive contagion might overwhelm these modest expectations. He jokes that the technical term for what happened next is: “OH NO!” Within 12 hours there were about 4,000 participants. The server nearly crashed. At 48 hours, there were 15,000 people, and by the end of the event, on December 25, the grand total was 52,000. The following year, he moved the operation to Amazon Web Services, and numbers have since continued to grow.
Last year, perhaps due to the pandemic, the event saw a 50% spike in traffic, with more than 180,000 participants worldwide.
And now again this year, thousands of coders from San Francisco to Slovenia—students and software engineers and competitive programmers alike—are counting down to Christmas with Advent of Code (AoC). While traditional advent calendars deliver daily gifts of chocolate or toys (and some alternative versions deliver dog treats, Jack Daniel’s, Lego figures, or even digital delights via apps), Advent of Coders unwrap playfully mathy problems and then write computer mini-programs that do the solving.
The fun of it, partly, is simply in the time-honored magic of a holiday ritual. But it’s also in submitting to pleasurable puzzlement. Peter Norvig, a research director at Google, finds it fun because he trusts the creator, Wastl, “to make it worth my time”—in a similar way, Norvig says, to how New York Times crossword puzzlers trust Will Shortz to do right by them. “There will be some tricks that make it interesting,” says Norvig, “but there are bounds on how tricky.”
The joy of coding
At midnight US Eastern time (Wastl is based in Buffalo, New York), every night from December 1 to 25, a new puzzle lights up at adventofcode.com, embedded within a cleverly composed Christmas-caper narrative—one player described the story as “an Excuse Plot if there ever was such a thing.”
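Release times are easy to work out. This minimal Python sketch assumes only the schedule just described: day N unlocks at midnight US Eastern time on December N, for N from 1 to 25. In December, US Eastern time is Eastern Standard Time (UTC-5), so the sketch uses a fixed offset instead of a time-zone database. The helper names are hypothetical and not part of any Advent of Code tool.

```python
from datetime import datetime, timedelta, timezone

# Assumption: in December, US Eastern time is Eastern Standard Time, a fixed UTC-5 offset.
EST = timezone(timedelta(hours=-5), "EST")


def unlock_time(year: int, day: int) -> datetime:
    """Hypothetical helper: the moment puzzle `day` (1 to 25) unlocks, as an aware datetime."""
    if not 1 <= day <= 25:
        raise ValueError("puzzles unlock only from December 1 to 25")
    return datetime(year, 12, day, tzinfo=EST)  # midnight, since hour defaults to 0


def seconds_until_unlock(year: int, day: int, now: datetime) -> float:
    """Seconds from `now` (an aware datetime) to the unlock; negative once the puzzle is open."""
    return (unlock_time(year, day) - now).total_seconds()


if __name__ == "__main__":
    # Day 1 of the 2021 event, expressed in UTC.
    print(unlock_time(2021, 1).astimezone(timezone.utc))  # 2021-12-01 05:00:00+00:00
```

To count down in real time, one could pass `datetime.now(timezone.utc)` as `now`; subtracting two aware datetimes handles the offset conversion automatically.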
This year’s event got off to a fine start when Santa’s elves lost the keys to the sleigh. The first problem set the scene as follows: “You’re minding your own business on a ship at sea when the overboard alarm goes off! You rush to see if you can help. Apparently one of the Elves tripped and accidentally sent the sleigh keys flying into the ocean!”
Luckily, the Elves had a submarine handy for just such emergencies, and from there participants set off on a 25-day underwater quest. They try to solve two puzzles daily (the second adding a twist, or more difficulty), each worth a star and some praise: “That’s the right answer! You are one gold star closer to finding the sleigh keys.”
Every player earns a star for solving a problem, but if you’re the first to get a star, you receive 100 points; if you’re second, you receive 99 points; and so on, with 100th place earning one point.
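The scoring rule fits in a few lines of Python. This sketch assumes only what the rule above states: the Nth person to earn a given star receives 101 - N points, so first place gets 100 and 100th place gets 1, and anyone later still earns the star but no points. The function names are hypothetical.

```python
def points_for_rank(rank: int) -> int:
    """Points for the `rank`-th player (1-based) to earn a given star."""
    if rank < 1:
        raise ValueError("rank is 1-based")
    # 1st place -> 100, 2nd -> 99, ..., 100th -> 1; later finishers get the star but 0 points.
    return max(0, 101 - rank)


def total_score(ranks: list[int]) -> int:
    """Leaderboard score: the sum of points over every star a player earned."""
    return sum(points_for_rank(rank) for rank in ranks)


# Example: first on one star, 100th on another, 250th on a third.
print(total_score([1, 100, 250]))  # 101
```

Under this rule, a player who finished first on all fifty stars would score 50 × 100 = 5,000 points.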
“In order to save Christmas,” the puzzle master explains, “you’ll need to get all fifty stars by December 25th.”
The object of Advent of Code is to solve the puzzles using your programming language of choice (Python is the most popular). Some participants also take by-hook-or-by-crook approaches, such as “Excel madness” (as Wastl describes it) or reams of graph paper, and a surprising number solve the puzzles in Minecraft.
But the broader motivation varies from player to player. Some treat it as an annual tune-up for their programming skills; others see it as the perfect opportunity to learn to code or try a new language. José Valim, creator of the Elixir programming language, is live-streaming his AoC solutions on Twitch.
At the top of the global leaderboard, which ranks the 100 players with the highest total score, competitive programmers like Brian Chen (his handle is “betaveros”) and Andrew He (“ecnerwala”) are out for speed. A security software engineer working on end-to-end encryption at Zoom, Chen placed first last year (and the year before), while He came a close second.
“Going fast is fun,” Chen says, “just like optimizing anything where you can get fairly immediate feedback. There are lots of little knobs to tweak, and lots of little moments to be proud of where you made the right choice or prepared something that came in useful.”
Both MIT computer science alums who live in the Bay Area, Chen and He are friendly rivals who’ve competed together in programming challenges over the years—on the same team at the International Collegiate Programming Contest (ICPC) and as competitors at Codeforces and Google’s Code Jam. This year again, Chen is beating He. “To be honest, it’s ’cause he’s a little better than me”—better at various tricks and implementations that optimize speed—“but I don’t like admitting that,” says He, a founding engineer at the startup Modal, which builds infrastructure and tooling for data teams.
The leaderboard is out of reach for the majority of participants, especially as puzzles get harder by the day. Kathryn Tang, who runs an engineering operations team at Shopify, ranked 36th on day one and 81st on day three, but she knew her leaderboard status wouldn’t last long. “I’m doing this for fun using Google Sheets,” she says.
The element of contest, however, is replicated—at Shopify and Google and many companies big and small—with private leaderboards, as well as dedicated chat channels where players share solutions and kvetch about the problems in post-mortems.
“The competitiveness helps commitment,” said the engineer Alec Brickner, commenting in a Slack channel at Primer.ai, a natural-language-processing startup in San Francisco (Brickner has made the leaderboard on a couple of days so far).
“Meh,” replied his colleague Michael Leikam. “The payoff for me is the joy of coding.”
John Bohannon, Primer’s director of science, seconded that with an emoji: “SAME.”
Bohannon also loves the silly story that sets up the problems, but the plot has little to no utility. “The speed-demon solvers completely ignore the story, focusing on the variables of the problem to solve and just getting to it,” he says.
Nora Petrova, a data scientist and engineer at Primer’s office in London, UK, is there for the beauty, not the sport: “I love the drama that’s unfolding in every puzzle,” she says. For instance, on day four, a giant squid attached itself to the submarine—it wanted to play bingo, of course. The puzzle input was a random set of 100 bingo boards, and the challenge was to predict the winning board and give it to the squid.
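The squid’s game maps neatly onto a short simulation. The article does not give the puzzle’s exact rules or input format, so this minimal Python sketch assumes standard bingo conventions: numbers are called one at a time, each board is a square grid of numbers, and a board wins as soon as every number in one of its rows or columns has been called (diagonals are ignored here). The names `has_won` and `first_winner` are hypothetical.

```python
from typing import Optional

Board = list[list[int]]


def has_won(board: Board, called: set[int]) -> bool:
    """True if every number in some row or some column of `board` has been called."""
    columns = [list(col) for col in zip(*board)]
    return any(all(n in called for n in line) for line in board + columns)


def first_winner(draws: list[int], boards: list[Board]) -> Optional[tuple[int, int]]:
    """Call numbers in order; return (index of first winning board, number that won it).

    If several boards win on the same call, the lowest index is reported.
    Returns None if no board ever wins.
    """
    called: set[int] = set()
    for number in draws:
        called.add(number)
        for index, board in enumerate(boards):
            if has_won(board, called):
                return index, number
    return None


if __name__ == "__main__":
    boards = [
        [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
        [[10, 11, 12], [13, 14, 15], [16, 17, 18]],
    ]
    # Board 1's middle column (11, 14, 17) completes on the fourth call.
    print(first_winner([14, 1, 11, 17], boards))  # (1, 17)
```

Rechecking every board after each call keeps the code short and is plenty fast for 100 boards; a speed-minded solver might instead index which boards contain each number.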
Love it XOR Hate it
Wastl’s main motivation in creating Advent of Code was to help people become better programmers. “Beginners who are just getting into programming are the people I want to get the most out of this,” he says. “The success metric for most people should be ‘How many new things did I learn?’—not ‘Was I one of the very, very fastest people in the world to solve this puzzle?’”
Russell Helmstedter, a middle school teacher at the De Anza Academy of Technology and the Arts, in Ventura, California, is using Advent of Code to teach Python to his students in sixth, seventh, and eighth grades. They tackled the first two problems together as a class. From a teaching perspective, the problems are effective exercises because if you fail, you can simply try again—very much in the spirit of test-driven software development.
Helmstedter found that some of his students were a bit overwhelmed with the two-pronged challenge—deciphering the problem and coding a machine to solve it—but most embraced the struggle. “I like that it is hard to do,” one student said on a survey. And another said, “There is honestly no downside. I really like how you start working progressively toward a goal.” Although the survey’s multiple-choice question ranking “feels” elicited one “Hate it,” 41 respondents chose “Like it” (to varying degrees) and eight “Love it.”
At the University of Ljubljana, in Slovenia, the computer scientist Janez Demšar uses the AoC problems both as a professor and to hone his own skills (he’s on the core team of Orange, an open-source machine learning and data visualization toolbox). “I need to have some regular practice, like a violinist who plays in an orchestra and does some teaching but still needs some small pieces to practice,” he says. “So these are my etudes.” Demšar teaches Programming 101 to a heterogeneous group of more than 200 students. “My greatest concern,” he says, “is how to keep those who already know some (or a lot) of programming interested and occupied. AoC tasks are great because they require various skills,” ranging from pure coding to algorithms.
Gregor Kikelj, a third-year mathematics undergraduate at the university, first tried Advent of Code in 2019. He did well enough to land himself an internship at Comma.ai (working on Openpilot, its software for semi-automated driving systems), since the founder of the company was also competing. And Kikelj boosted his grade in the programming course (with another professor), since every problem solved was worth extra points on the final exam—plus bonus points for placing on the leaderboard.
Kikelj (“grekiki”) got up every morning for the puzzle drop (6 a.m. in Slovenia) and ranked 52nd overall on the leaderboard, accumulating a total of 23 extra exam points. “After that year, they put the cap on the amount of points you can receive to 5,” he recalls. But he’s still rising with the sun to pounce on the puzzle. This year his best ranking, on day five, was 25th; he’s aiming to stay in the top 100. “We’ll see how it goes as the problems get harder,” Kikelj says.
How to leaderboard
If the leaderboard is your game, competition is fierce and the daily countdown is key: players watch like hawks for the puzzle to drop, then click lickety-split to download it. Last year, this “giant burst of traffic synchronized to a single second” (as Wastl describes it) troubled even Amazon’s load balancers.
The AoC subreddit, one of many communities around the internet, is full of inside-baseball banter about how to prevail (with solutions and help threads, as well as self-satire and memes). But the best resource is perhaps Brian Chen’s blog post on “how to leaderboard.”
The Download: how we can limit global warming, and GPT-4’s early adopters
Time is running short to limit global warming to 1.5°C (2.7°F) above preindustrial levels, but there are feasible and effective solutions on the table, according to a new UN climate report.
Despite decades of warnings from scientists, global greenhouse-gas emissions are still climbing, hitting a record high in 2022. If humanity wants to limit the worst effects of climate change, annual greenhouse-gas emissions will need to be cut by nearly half between now and 2030, according to the report.
That will be complicated and expensive. But it is nonetheless doable, and the UN listed a number of specific ways we can achieve it. Read the full story.
How people are using GPT-4
Last week was intense for AI news, with a flood of major product releases from a number of leading companies. But one announcement outshone them all: OpenAI’s new multimodal large language model, GPT-4. William Douglas Heaven, our senior AI editor, got an exclusive preview. Read about his initial impressions.
Unlike OpenAI’s viral hit ChatGPT, which is freely accessible to the general public, GPT-4 is currently accessible only to developers. It’s still early days for the tech, and it’ll take a while for it to feed through into new products and services. Still, people are already testing its capabilities out in the open. Read about some of the most fun and interesting ways they’re doing that, from hustling up money to writing code to reducing doctors’ workloads.
Google just launched Bard, its answer to ChatGPT—and it wants you to make it better
Google has a lot riding on this launch. Microsoft partnered with OpenAI to make an aggressive play for Google’s top spot in search. Meanwhile, Google blundered straight out of the gate when it first tried to respond. In a teaser clip for Bard that the company put out in February, the chatbot was shown making a factual error. Google’s value fell by $100 billion overnight.
Google won’t share many details about how Bard works: large language models, the technology behind this wave of chatbots, have become valuable IP. But it will say that Bard is built on top of a new version of LaMDA, Google’s flagship large language model. Google says it will update Bard as the underlying tech improves. Like ChatGPT and GPT-4, Bard is fine-tuned using reinforcement learning from human feedback, a technique that trains a large language model to give more useful and less toxic responses.
Google has been working on Bard for a few months behind closed doors but says that it’s still an experiment. The company is now making the chatbot available for free to people in the US and the UK who sign up to a waitlist. These early users will help test and improve the technology. “We’ll get user feedback, and we will ramp it up over time based on that feedback,” says Google’s vice president of research, Zoubin Ghahramani. “We are mindful of all the things that can go wrong with large language models.”
But Margaret Mitchell, chief ethics scientist at AI startup Hugging Face and former co-lead of Google’s AI ethics team, is skeptical of this framing. Google has been working on LaMDA for years, she says, and she thinks pitching Bard as an experiment “is a PR trick that larger companies use to reach millions of customers while also removing themselves from accountability if anything goes wrong.”
Google wants users to think of Bard as a sidekick to Google Search, not a replacement. A button that sits below Bard’s chat widget says “Google It.” The idea is to nudge users to head to Google Search to check Bard’s answers or find out more. “It’s one of the things that help us offset limitations of the technology,” says Krawczyk.
“We really want to encourage people to actually explore other places, sort of confirm things if they’re not sure,” says Ghahramani.
This acknowledgement of Bard’s flaws has shaped the chatbot’s design in other ways, too. Users can interact with Bard only a handful of times in any given session. This is because the longer large language models engage in a single conversation, the more likely they are to go off the rails. Many of the weirder responses from Bing Chat that people have shared online emerged at the end of drawn-out exchanges, for example.
Google won’t confirm what the conversation limit will be for launch, but it will be set quite low for the initial release and adjusted depending on user feedback.
Google is also playing it safe in terms of content. Users will not be able to ask for sexually explicit, illegal, or harmful material (as judged by Google) or personal information. In my demo, Bard would not give me tips on how to make a Molotov cocktail. That’s standard for this generation of chatbot. But it would also not provide any medical information, such as how to spot signs of cancer. “Bard is not a doctor. It’s not going to give medical advice,” says Krawczyk.
Perhaps the biggest difference between Bard and ChatGPT is that Bard produces three versions of every response, which Google calls “drafts.” Users can click between them and pick the response they prefer, or mix and match between them. The aim is to remind people that Bard cannot generate perfect answers. “There’s the sense of authoritativeness when you only see one example,” says Krawczyk. “And we know there are limitations around factuality.”
How AI experts are using GPT-4
Hoffman got access to the system last summer and has since been writing up his thoughts on the different ways the AI model could be used in education, the arts, the justice system, journalism, and more. In the book, which includes copy-pasted extracts from his interactions with the system, he outlines his vision for the future of AI, uses GPT-4 as a writing assistant to get new ideas, and analyzes its answers.
A quick final word … GPT-4 is the cool new shiny toy of the moment for the AI community. There’s no denying it is a powerful assistive technology that can help us come up with ideas, condense text, explain concepts, and automate mundane tasks. That’s a welcome development, especially for white-collar knowledge workers.
However, it’s notable that OpenAI itself urges caution around use of the model and warns that it poses several safety risks, including infringing on privacy, fooling people into thinking it’s human, and generating harmful content. It also has the potential to be used for other risky behaviors we haven’t encountered yet. So by all means, get excited, but let’s not be blinded by the hype. At the moment, there is nothing stopping people from using these powerful new models to do harmful things, and nothing to hold them accountable if they do.
Chinese tech giant Baidu just released its answer to ChatGPT
So. Many. Chatbots. The latest player to enter the AI chatbot game is Chinese tech giant Baidu. Late last week, Baidu unveiled a new large language model called Ernie Bot, which can solve math questions, write marketing copy, answer questions about Chinese literature, and generate multimedia responses.
A Chinese alternative: Ernie Bot (the name stands for “Enhanced Representation from kNowledge IntEgration”; its Chinese name is 文心一言, or Wenxin Yiyan) performs particularly well on tasks specific to Chinese culture, like explaining a historical fact or writing a traditional poem. Read more from my colleague Zeyi Yang.
Even Deeper Learning
Language models may be able to “self-correct” biases—if you ask them to
Large language models are infamous for spewing toxic biases, thanks to the reams of awful human-produced content they get trained on. But if the models are large enough, they may be able to self-correct for some of these biases. Remarkably, all we might have to do is ask.