I'm currently working on a project that involves dataviz & mapping, and one of the things I've discovered along the way was WikiData's SPARQL query engine.
Mind = blown.
You can basically ask Wiki(pedia) almost anything you can think of (including the largest city in a bounding box) using the same query language.
I have worked with SPARQL professionally. The biggest problem: quality of the data. Yes, it is amazing, but if stuff is missing annotations, it is hard to get meaningful results.
Lots of things in the semantic web and RDF realm suffer from that. The technologies and ideas have been largely built under the (probably implicit) assumption that the data can be made perfect, but it never is.
I quibble with the description of the shapes bounded by 10 degrees of longitude and latitude as "rectangles" - they're not even planar shapes, and their adjacent sides certainly aren't at right angles. Some of them don't even have four sides.
This bears considering when looking at the map, because some of those regions are much, much smaller than others.
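To put a rough number on "much, much smaller", here is a minimal sketch that assumes a perfectly spherical Earth with a mean radius of 6371 km (the function name is made up for illustration). On a sphere, the area of a lat/lon cell is R² × Δlongitude × (sin(north latitude) − sin(south latitude)):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a perfect sphere

def cell_area_km2(lat_south, lat_north, lon_width_deg=10.0):
    """Area of a latitude/longitude cell on a sphere, in square kilometres."""
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * math.radians(lon_width_deg) * band

equatorial = cell_area_km2(0, 10)   # cell touching the equator
polar = cell_area_km2(80, 90)       # cell touching the pole (a three-sided shape)
print(f"0-10 deg:  {equatorial:,.0f} km^2")
print(f"80-90 deg: {polar:,.0f} km^2")
print(f"ratio:     {equatorial / polar:.1f}")
```

Under these assumptions, a cell touching the equator covers a bit more than 11 times the area of a cell touching the pole.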
Mostly an aside to your comment, but it's somewhat related. This post's divisions of 10° latitude/longitude are pretty close to the Maidenhead grid system, used heavily by amateur radio operators like myself.
The difference is that each top-level grid "square" (to your point, not actually square, but that's what they're called) is 20° longitude by 10° latitude, represented by two letters. Computing a grid locator of 4 or more characters is complex enough that most people can't quite do it in their head, but because the cells are treated as if they were coordinate "rectangles", it is simple enough to translate lat/lon coordinates to a grid code and vice versa with pen and paper in an emergency, if you know the algorithm or have a reference sheet. The 10° latitude size means that the parallels are the same in the Maidenhead system (for the first two letters of a code) as in this post. The lettering also makes it easy to tell that DN is north of DM, and that both are west of EM. I've gotten to where I can roughly place someone I hear on the radio based on a mental map of grid codes, and have memorized the grid codes of many large population centers.
The beauty of the grid code system is that you can further refine an area by adding on subsequent numbers and letters, much like degrees/minutes/seconds in coordinates, but requiring significantly fewer characters to read to others over the radio. And you can use phonetics for the letters, e.g. "delta mike seven niner" (DM79) is roughly the entire Denver metro area. Fort Collins, CO in the parent post falls under DN, above that 10°-sized parallel.
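For anyone curious what the pen-and-paper algorithm looks like, here is a minimal sketch of the 4-character version. The function name is made up, and the city coordinates are approximate values I'm supplying for illustration, not something from the post:

```python
def maidenhead(lat, lon):
    """Return the 4-character Maidenhead locator for a latitude/longitude in degrees.

    Valid for -90 <= lat < 90 and -180 <= lon < 180 (the exact upper edges
    would need clamping, which this sketch skips).
    """
    # Shift so both values are non-negative: longitude 0..360, latitude 0..180.
    lon_shifted = lon + 180.0
    lat_shifted = lat + 90.0
    # Fields: an 18 x 18 grid of 20-degree by 10-degree areas, lettered A-R.
    field_lon = int(lon_shifted // 20)
    field_lat = int(lat_shifted // 10)
    # Squares: each field is split 10 x 10 into 2-degree by 1-degree areas, numbered 0-9.
    square_lon = int((lon_shifted % 20) // 2)
    square_lat = int(lat_shifted % 10)
    return (chr(ord("A") + field_lon) + chr(ord("A") + field_lat)
            + str(square_lon) + str(square_lat))

print(maidenhead(39.74, -104.99))  # downtown Denver, approximate coordinates
print(maidenhead(40.59, -105.08))  # Fort Collins, approximate coordinates
```

Longitude always comes first in each pair of characters, which is why the first letter tells you east/west and the second north/south.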
He's not making a mathematical definition of the areas. 'area' or 'box' is a simple description that everyone can understand. And the map projection used does form squares.
I'm not 100% sure about that. Are angles between circles on a sphere defined by the angles between their tangent lines? If that were true, you could, for each point/meridian combination, construct an infinite number of different circles that would all meet that meridian at right angles (just vary the circle's radius). That doesn't feel quite right.
According to Wikipedia [1], "in spherical geometry, angles are defined between great circles". Meridians are great circles, but parallels are not. Possibly the angle is simply not defined for circles that are not great circles?
Right angles with regard to the latitude-longitude coordinate system, but not with regard to the surface of the earth. Recall that all meridians intersect at the poles.
@tantalor is technically correct; meridians and parallels intersect at right angles, even with respect to the (idealized) surface of the earth. Proof sketch: if the angles involved in a crossing were not all 90°, then either the crossing would not be symmetric about the meridian (it is), or one or both lines must have a sharp "kink" (i.e., have a non-smooth first derivative) at the intersection (neither does).
However, parallels, while smooth, are not straight; they "curl" toward the nearest pole (i.e., non-zero second derivative, relative to the earth's surface). This accounts for the non-"rectangular" shape of quadrangles and can be replicated on a 2D plane.
Take the lines tangent to the meridian & parallel at the point of intersection. Observe these lines are perpendicular. Hence the meridian & parallel meet at a right angle.
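The tangent-line argument can be checked numerically. This is a minimal sketch assuming a unit sphere parameterised as (cos lat · cos lon, cos lat · sin lon, sin lat); the helper names are made up. The tangent of the meridian is the derivative with respect to latitude, the tangent of the parallel is the derivative with respect to longitude, and their dot product should be zero:

```python
import math

def tangents(lat_deg, lon_deg):
    """Tangent vectors of the meridian and the parallel through a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Derivative of the position with respect to latitude: travel along the meridian.
    along_meridian = (-math.sin(lat) * math.cos(lon),
                      -math.sin(lat) * math.sin(lon),
                      math.cos(lat))
    # Derivative of the position with respect to longitude: travel along the parallel.
    along_parallel = (-math.cos(lat) * math.sin(lon),
                      math.cos(lat) * math.cos(lon),
                      0.0)
    return along_meridian, along_parallel

def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

for lat, lon in [(0, 0), (40, -105), (60, 25), (-89, 140)]:
    meridian, parallel = tangents(lat, lon)
    print(lat, lon, "perpendicular:", abs(dot(meridian, parallel)) < 1e-12)
```

Note that the parallel's tangent shrinks to zero length at the poles themselves, so the angle is only meaningful away from them.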
The dual of a geodesic polyhedron, e.g. an icosahedron whose faces have been subdivided, is in fact a common and excellent choice for geospatial applications, with https://github.com/uber/h3 being a good implementation.
You can also subdivide the faces of a cube into smaller squares with a quadtree, e.g. https://s2geometry.io/
The Disdyakis triacontahedron's faces are probably too long and skinny for most geospatial applications though.
Well, it leads to an aliasing effect that in Europe is even more comical. You completely lose the major cities of most countries like Ireland, Sweden, Denmark, and Poland. Central Europe is blotted out by Rome. The Romans finally conquer (most of) Germany. ;)
I'm not even sure how accurate the map is, though. St. Petersburg is at the eastern end of the Gulf of Finland, and it would seem that it should be in the same square as Helsinki. St. Petersburg is definitely bigger.
I guess it ends up in the same square as Moscow and thus doesn't show up, but either the map doesn't show the shoreline of the sea very accurately at all, or it's wrong about the location of St. Petersburg.
Helsinki has a population of 650,058 and is located at 60°10′15″N 24°56′15″E, while
Saint Petersburg has a population of 5,351,935 and is located at 59°56′15″N 30°18′31″E.
Saint Petersburg is one quadrant south and one quadrant east of Helsinki, or the same quadrant as Moscow as you said.
The bay gets a little thin before Saint Petersburg, and there is a parallel passing exactly through it. I'm guessing that's what's making Saint Petersburg appear more in line than it is, or, more accurately, makes the shoreline appear to be further west than it actually is.
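The box assignment is easy to check by flooring each coordinate to a multiple of 10. A minimal sketch, assuming the map's boxes are aligned to multiples of 10° (the `cell` helper is made up, and the Moscow coordinates are approximate values I'm supplying):

```python
import math

def cell(lat, lon, size=10):
    """South-west corner (lat, lon) of the size-degree cell containing a point."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)

# Degrees/minutes/seconds from the comment above, converted to decimal degrees.
helsinki = cell(60 + 10 / 60 + 15 / 3600, 24 + 56 / 60 + 15 / 3600)
st_petersburg = cell(59 + 56 / 60 + 15 / 3600, 30 + 18 / 60 + 31 / 3600)
moscow = cell(55.75, 37.62)  # approximate coordinates, not from the thread

print("Helsinki:        ", helsinki)
print("Saint Petersburg:", st_petersburg)
print("Moscow:          ", moscow)
```

Saint Petersburg sits a few arc-minutes south of 60°N and just east of 30°E, so it misses Helsinki's box on both axes and shares one with Moscow.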
However a striking feature of the linked article is how frequently there's a major city right near a cell edge.
A more useful list might be constructed by some process involving assigning a "city" an imprint size based on population, and a surrounding 'metro area' based on absorbing any weaker cities that overlap until the process either repeats or is matched by a neighboring city.
Yeah I was hoping there was a shape more interesting than a triangle (or a soccer ball)... Those rounded squares and rounded triangles were the best I could find.
I like that idea of building the regions bottom-up.
The South Pole is a particularly interesting case. Clearly the Pole itself is 90 South and the choice of longitude is arbitrary there. However, Amundsen-Scott station, which is a hundred odd metres away, happens to be at ~140 E (89.997553 S!) and due to the map projection it appears as if it's in a totally different place.
It would be interesting to see an interactive version with a free choice of meridians.
A fun thought experiment is to noodle through whether, given the method of choosing cities described, the choice of map projection would influence the resulting list of cities.
Latitude and longitude exist as boundaries outside the domain of map projections. A different projection may draw these lines or curves differently, but they still delineate the same geographic boundaries. E.g. a different projection may not show rectangles, but instead curved regions, but the cities existing in those regions would be the same.
If you are thinking of drawing arbitrary rectangles on arbitrary projections, then you'd have to come up with a rigorous way to define your rectangles. If you don't, then you're just drawing random shapes on other random shapes. Either way, yes, a different projection with the same rectangles drawn over it would yield different answers, and this should be immediately obvious.
I think something more interesting might be a Voronoi diagram of all cities such that each city is in, say, the top 5 within a 1000-mile radius (or just that, without the Voronoi part). This should preserve the fact that only the largest cities in some area are present, but eliminate the arbitrariness of the boundaries. I think you would want to compute this with a sweep line algorithm.
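A rough sketch of the "top N within a radius" filter (without the Voronoi part). To be clear about what this is: a brute-force O(n²) comparison using haversine distance on a spherical Earth, not the sweep line approach, and the city list is entirely made up for illustration:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; spherical Earth assumed

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def locally_large(cities, radius_miles=1000, top_n=5):
    """Keep cities with fewer than top_n more populous cities within radius_miles."""
    kept = []
    for name, pos, pop in cities:
        bigger_nearby = sum(
            1 for other, other_pos, other_pop in cities
            if other != name and other_pop > pop
            and haversine_miles(pos, other_pos) <= radius_miles
        )
        if bigger_nearby < top_n:
            kept.append(name)
    return kept

# Made-up cities for illustration only: (name, (lat, lon), population).
toy = [
    ("A", (0.0, 0.0), 900),
    ("B", (1.0, 1.0), 500),
    ("C", (2.0, 0.5), 100),
    ("D", (50.0, 100.0), 50),
]
print(locally_large(toy, radius_miles=1000, top_n=2))
```

In the toy data, C is dropped because two bigger cities sit close by, while the tiny but isolated D survives, which is the same effect the remote research stations show on the original map.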
A map projection that preserves area would be better, otherwise there's an excessive number of cities in the far northern hemisphere, and a lack of detail in the tropics.
Denmark has a funnier case. There are 6 cities/towns within the Kingdom of Denmark, with 0 of them in Denmark itself (5 in Greenland and 1 in the Faroe Islands; Iceland, which belonged to the Kingdom until 1944, has 2).
Mercator can be a very useful map. Angles are not distorted. It's also very good at low distortion at small scales. As you zoom in on a point, the area distortion diminishes. Once your area of interest is around 300km wide, the area distortion is only about one part in one thousand. It converges very quickly on a "true" representation of the earth.
Mercator's ubiquity isn't due to people who "aren't into maps" but rather because its strengths are really powerful in many use cases. Imagine the confusing situation if, in your favorite Maps app, North weren't always directly up, intersections met at wrong angles, and square buildings didn't always look square.
Those strengths being in navigation: straight lines in Mercator maps correspond to a constant bearing in the real world, which is really handy when your ship needs to cross thousands of kilometers of ocean.
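To make the constant-bearing property concrete: on a Mercator map, x is proportional to longitude and y is ln(tan(π/4 + lat/2)), so the compass bearing of the straight line between two points is just the angle of that line on the map. A minimal sketch, assuming a spherical Earth (function names are made up):

```python
import math

def mercator_y(lat_deg):
    """Mercator northing on a unit sphere for a latitude in degrees (not valid at the poles)."""
    return math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))

def rhumb_bearing(lat1, lon1, lat2, lon2):
    """Constant compass bearing, in degrees clockwise from north, from point 1 to point 2."""
    dx = math.radians(lon2 - lon1)                  # Mercator x is just longitude
    dx = (dx + math.pi) % (2 * math.pi) - math.pi   # go the shorter way around the globe
    dy = mercator_y(lat2) - mercator_y(lat1)
    return math.degrees(math.atan2(dx, dy)) % 360

print(rhumb_bearing(0, 0, 0, 10))    # along the equator, heading east
print(rhumb_bearing(0, 0, 10, 0))    # along a meridian, heading north
print(rhumb_bearing(50, -5, 40, -70))
```

Holding that one bearing for the whole trip gets you there, although the route is generally longer than the great-circle path.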
I wonder if London could be the largest city in two boxes. The zero meridian line goes through Greenwich, which leaves most of London in the western box, but a still sizeable chunk of East London in the eastern box.
Is East London bigger than Hamburg?
Are there any other large cities that straddle the bounding boxes?
Many reasons, but probably the biggest one is social shifts causing an overall degradation of city life, resulting in people leaving. When the Chicago Board of Trade ended floor trading, a significant number of large finance companies moved elsewhere; some relocated to Dallas, and some reduced staffing/office space and already had principal officers in NYC.
There are a lot of conversations about cities in the Midwest simply not happening, even though critical decay is occurring. St. Louis is another example of a city that was doing very well for a while and had a strong aerospace/telecom/manufacturing/tech sector, and the city has now decayed dramatically to the point that many businesses have pulled out and crime rates have skyrocketed. St. Louis and Chicago are both more dangerous cities than Detroit, and all the available evidence points to violence being primarily a socioeconomic issue rather than an issue of any other factors.
If you’re looking for somewhere to invest, anywhere in the bottom of these lists will make it much harder to earn a decent ROI due to a higher taxes to government services ratio compared to places higher on the lists. Chicago unfortunately is at the bottom of both lists.
Chicago used to have extremely high industrial employment, so the general downward trend across the United States was amplified in Chicago. The population peaked in the late 1950s and has been dropping since, although the rate of decrease has been dropping as Chicago has transitioned to a very diversified economy. The north side of the city is now very stable population-wise and the downtown area has seen incredible growth (in the last 3 censuses, Chicago has seen higher downtown population growth than any other city in the US), while the south and west sides continue to struggle. These are the areas that were the most industrial.
Houston has seen tremendous growth primarily through annexation. It is now more than 3 times the geographical size of Chicago. But annexation of residential land in Texas is now far more difficult, so the geographic size growth of Houston has dramatically slowed [1]. Once all the undeveloped land is used, Houston will have to rely on densifying for population growth. Current restrictions make it unlikely to ever be as dense as NYC, Chicago or LA, but with its massive size it wouldn't need to be to attain 5 or 6 million residents. On the other hand, it is already facing growth challenges that may reduce the carrying capacity of the city [2].
I'd guess that Houston surpasses Chicago in population within 15 or 20 years, but it remains to be seen if that is going to be permanent or not.
There are a lot of comments about metro areas and the like, such as how Jacksonville shows up because its city limits are larger than Atlanta's, but if you start getting into trying to define what a city's metro area is, you run into a lot of issues with giant, powerful cities like NY. Can you really count the population of Newark, New Jersey as part of NYC, NY? Even though Newark is firmly within the NYC metro area, it's a separate city in a different state with its own government etc. And what do you do about the Bay? Is San Francisco part of San Jose? What about twin cities, like Dallas and Fort Worth? Using city limits probably makes the most sense as a way of doing this map, since at least it's clear-cut.
Yes, that's exactly what I'm referring to as the problem: the city limits are arbitrary administrative lines. What would be counted as merely a suburb of Atlanta, and so left out of its population, would be counted towards the population of Jacksonville because its boundaries just happen to be huge. That means Jacksonville ends up larger than Atlanta, even though more people live in the Atlanta metro area, and would describe themselves as being from Atlanta, than live in the Jacksonville metro area.
As you get near the outskirts of a city, where you live starts to depend on who you're talking to. I live in Gig Harbor, Washington, which is about 20 minutes from Tacoma, which is itself about half an hour from Seattle. If I was talking to someone else from the area, I'd never claim to live in Seattle. But if I'm talking to someone from out of state, then yeah, sure, I'm from Seattle.
One way to work around the arbitrariness of city limits would be to only count the population living within a fixed distance of the city center. I would suggest 5 or 6 km, which is approximately the distance that you can comfortably walk in 1 hour, and also (perhaps not coincidentally) the approximate radius of Paris. This would get you a list of dense cities, which is probably what people are imagining when they think of "large" cities.
Another way would be to count everyone as part of the population of whatever city they are physically closest to, without regard to political boundaries. But this would probably just get you a list of sprawling metropolitan areas.
A good way to define metropolitan areas is by commuter patterns. If a certain percentage of residents of a county or town all commute to the same adjoining larger town, then that gets counted.
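As a rough sketch of the commuter-pattern idea: a town joins the metro area if a large enough share of its workers commute into the core. The 25% threshold, the function name, and the town data below are all assumptions for illustration; real statistical agencies use more involved rules.

```python
def metro_members(core, workers, commuters_to_core, threshold=0.25):
    """Return the core plus every town whose share of workers commuting to the core
    is at least `threshold`.

    workers:           town -> total number of employed residents
    commuters_to_core: town -> number of those residents who work in the core
    """
    members = [core]
    for town, total in workers.items():
        if town == core or total == 0:
            continue
        share = commuters_to_core.get(town, 0) / total
        if share >= threshold:
            members.append(town)
    return members

# Made-up figures for illustration only.
workers = {"Core": 1000, "Suburb": 200, "Farmtown": 100}
commuters_to_core = {"Suburb": 120, "Farmtown": 5}
print(metro_members("Core", workers, commuters_to_core))
```

Here Suburb (60% commuting in) is counted and Farmtown (5%) is not, regardless of where anyone drew the municipal boundaries.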
Not all cities have centers and some cities have multiple different centers that are quite far away from each other depending on what criterion you use.
The choice of geographic unit depends on what questions you're interested in answering. If you're trying to answer questions about government, tax base, city services, etc., then you definitely want to use legal/administrative divisions as your unit of analysis. If you're interested in answering demographic questions, like population growth or labor pool, then statistical areas like CBSAs are more useful, because differences in administrative boundaries would introduce inconsistencies in your data.
You can also define a region as an area of (relatively) continuous density - the Census does this and labels them "urbanized areas". Any one of the 3 approaches (administrative boundaries, commuting zones, density) can be reasonable, depending on what sort of questions you'd like to answer.
> Can you really count the population of Newark, New Jersey as part of NYC, NY?
If you're looking at metro areas (which are defined by commuting regions), then you definitely should, because a sizable fraction of the residents of Newark and the surrounding communities commute into NYC for work. The Census does define a sub-unit called metropolitan division, and Newark, NJ is one of these.
> Is San Francisco part of san Jose?
If your unit of analysis is the MSA, then no: "San Francisco-Oakland-Hayward, CA" and "San Jose-Sunnyvale-Santa Clara, CA" are separate MSAs. This division recognizes that the two have separate (but overlapping) commuting zones - very few people commute from Richmond to Sunnyvale, or from Milpitas to San Francisco.
However, they are both within the "San Jose-San Francisco-Oakland, CA" Combined Statistical Area, which is a broader unit which recognizes that there are commuting relationships between the two areas - they are just weaker than the commuting relationships within the MSAs.
> What about twin cities, like dallas and fort worth?
Dallas-Fort Worth-Arlington, TX is an MSA. For the purposes of maps like this, you can generally just take the name of the city that comes first in the MSA name and people will understand what you're referring to.
Guangzhou is usually reported as larger than Shenzhen.
A specific issue with the population of Chinese cities, though, is that the administrative 'city' division in China can be much larger than what might be expected in the West.
A lot of these are pretty bad representations of real population, even in the US.
ex: Jacksonville is listed as the largest city in the southeast, and strictly speaking, this is true because it's a fairly dense population area. But it's a pretty misleading statement to say that it's the largest city.
Jacksonville's metro population is only about 1.5 million.
Atlanta is in the same box, and Atlanta's metro population is nearly 4 times larger (5.9 million). Nashville also beats Jacksonville (1.9 million).
I came to make the same comment. I was surprised to see that "technically" Jacksonville is nearly 2x larger than Atlanta, but digging deeper it just seems to be about city borders. Jacksonville is pretty large and not that dense. Atlanta is pretty small and fairly dense. When you look at the metropolitan areas, Atlanta is nearly 4x larger than the Jacksonville metro area, and that really just includes the area covered by MARTA, so it is all pretty much "Atlanta".
And they both go to show that statements/calculations about things like urban density are heavily influenced by historically and culturally determined political boundaries and definitions. Even look at the US Census Bureau 80% urban metric that a lot of people like to quote. Lots of small towns with farmlands and orchards (like where I live) are "urban" by this particular definition. And they are "urban" relative to truly rural Wyoming. But they're not urban in the sense of having any of the attributes of a dense city center.
There must be a way to come up with an (almost) universally applicable metric of city-ness, regardless of the administrative boundaries, that can be applied to situations like this map. Something like the area in which the population density is above a certain threshold, and the population that lives in that area.
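One possible reading of that metric, as a minimal sketch: start from a cell at the city center and flood-fill outward over neighbouring cells whose density stays above the threshold. Everything here is an assumption for illustration: the grid is made up, each cell is taken to be 1 km² (so people per cell equals density), and only the four edge neighbours count as adjacent.

```python
from collections import deque

def urban_footprint(density, start, threshold):
    """Flood-fill from `start` over grid cells with density >= threshold.

    `density` is a 2D list of people per cell, each cell assumed to be 1 km^2.
    Returns (area in cells, total population) of the contiguous dense region.
    """
    rows, cols = len(density), len(density[0])
    if density[start[0]][start[1]] < threshold:
        return 0, 0
    seen = {start}
    queue = deque([start])
    total = 0
    while queue:
        r, c = queue.popleft()
        total += density[r][c]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen
                    and density[nr][nc] >= threshold):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen), total

# Made-up density grid (people per km^2) for illustration only.
grid = [
    [10,  20,  900,  800],
    [15,  950, 1200, 30],
    [5,   40,  1000, 20],
    [900, 10,  25,   15],
]
print(urban_footprint(grid, start=(1, 2), threshold=500))
```

The dense cell in the bottom-left corner is left out because it isn't connected to the core, which is where the hard judgment calls (satellite towns, rivers, parks) would show up with real data.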
FiveThirtyEight has done some work in that area for purposes of splitting urban/rural from a political perspective. The article also mentions a couple other examples of something similar.
Although it's often even messier than that. Look at the density of Manhattan vs. NYC as a whole. Or core Paris vs. Paris as a whole. Much less the many American cities with fairly dense but small downtowns but that sprawl far into the distance.
Used to live on Baseline Rd. It originally split Nebraska Territory to the north and Kansas Territory to the south, originally surveyed in 1859. The Colorado territory wasn't formed until two years later in 1861. (Utah territory originally began somewhere in the Rockies beyond Boulder).
This is the problem with how we define cities. Jacksonville is technically the largest city in all of Florida too, but that is because it annexed almost all of its suburbs, so the whole "metro" area is just one city, which is quite different than most other American cities.
Right. Havana is more populous than Miami because it includes 15 municipalities, covering 728 km2. Miami is just Miami proper, which is 145 km2. But the Miami urban area is 2900 km2, and has 5.5M people which would be larger than Havana (2.1M people).
Miami is much denser than Havana (4,299.7/km2 vs 2,892.0/km2).
But that's beside the point, I guess. Every map maker makes arbitrary choices. And it made me curious why some cities appear and not others.
This is also why Phoenix appears as one of the US's largest cities, while its metro area is significantly smaller than those of other cities that have a smaller city population.
What would be awesome is if you can take a grid like this and then drag it to create an offset and change in subtle ways what cities appear in the boxes.
This could be the beginning of something like a gerrymandering tool.
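A minimal sketch of what dragging the grid would do, using the Helsinki and Saint Petersburg figures quoted elsewhere in this thread (coordinates rounded to two decimals). The function and its offset parameters are made up for illustration:

```python
import math

def largest_per_cell(cities, size=10, lat_offset=0.0, lon_offset=0.0):
    """Return the sorted names of the most populous city in each grid cell.

    Grid lines sit at offset, offset + size, offset + 2 * size, ... degrees.
    """
    best = {}
    for name, lat, lon, pop in cities:
        key = (math.floor((lat - lat_offset) / size),
               math.floor((lon - lon_offset) / size))
        if key not in best or pop > best[key][1]:
            best[key] = (name, pop)
    return sorted(name for name, _ in best.values())

cities = [
    ("Helsinki", 60.17, 24.94, 650_058),
    ("Saint Petersburg", 59.94, 30.31, 5_351_935),
]
print(largest_per_cell(cities))                               # grid lines on multiples of 10
print(largest_per_cell(cities, lat_offset=-1, lon_offset=1))  # grid nudged by one degree
```

With the default grid both cities win their own box; nudge the grid by a single degree in each direction and they share a box, so Helsinki disappears from the list.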
> There is a Chinese version of this. The Chinese cities on the map include the big Chinese cities: Shanghai, Beijing, Chongqing, Shenzhen, Chengdu. Shenyang. A couple of cities in Mongolia, analogous to the appearance of Winnipeg on the U.S. map.
Baotou and Hohhot are both Inner Mongolian cities, i.e. Chinese cities.
It's like saying Pluto is a big planet. It may be true, but there is a bigger one nearby that makes it invisible if your map is about the biggest in the neighborhood.
I suspect the point is that decisions are made whenever a map is constructed, and those decisions can produce wildly different views of the world.
While some of those decisions are in some sense objective (e.g. the equator being 0 degrees latitude), others are purely arbitrary or historical (e.g. the definition of 0 degrees longitude). If that meridian had a different definition, the results would be quite different. Then there are decisions made by a particular map maker. As others have noted, small towns and research stations at far northern and southern latitudes have disproportionate representation simply because the parameters (10 degrees of arc in latitude and longitude) divide a smaller area of the globe into the same number of bins as the much larger area of the globe at equatorial latitudes.
This map is far more interesting in terms of what it tells us about intrinsic biases in mapping than in terms of what it tells us about the world.
This could serve as a good example of how politicians manipulate voting areas to ensure majorities. The results are probably correct, but the result is misleading.
Gerrymandering was the very first thing that popped into my head when I noticed the most populous city in my country lose out to what's basically a town because it ended up in a grid cell with another country's city. I nearly started a "that's clearly a mistake" comment before realizing what had happened.
I had a friend who worked up in Alert while he was in the air force. It may be one of the most remote places on the map. It's essentially an outgrown air force base/research station; there would be nothing there if not for the Cold War.
Sounds plausible; I was just looking up a few of the eastern Siberian towns: Ust-Nera (143E, 64N or so) at 6,500 inhabitants, Bilibino (167E, 68N), also home to the world's northernmost nuclear power plant, at 6,000 inhabitants...
Closer to (my) home - Vardø (2,000) and Hammerfest (8,000) at the extreme north of Norway. Positively crowded!
What I found particularly interesting is that Fort Collins (in Colorado) was the largest city in its block with a population of only 167k people. Just goes to show how sparsely populated that region is around Wyoming/Northern CO/Montana.
And a good example of why the 10 degree grid is arbitrary and meaningless - Denver is an hour south with a metro population in the millions. But it didn't make the arbitrary cutoff at 40N.
Pretty interesting, but it suffers from a common flaw for this type of analysis - it counts the population that lives within the municipal boundaries of the city itself, not the population that lives within the "urban area" or commuting zone of that city. Different regions have different norms and laws for how they define municipal boundaries, so in some places the main municipality (e.g. Boston) is just a fraction of the urban core, while in other places the municipal boundaries include huge amounts of suburban and even rural land. Unless you're specifically interested in analyzing something related to municipal government, this isn't normally what you want to use to compare across regions.
Jacksonville, FL is an extreme example of this. Jacksonville's municipal boundaries are huge - it covers 875 sq. miles[1]. This is ~1.85x the area of New York City[2], and 3.77x the area of San Francisco[3].
Within the US, it's preferable to use core-based statistical areas (CBSAs)[4] for this type of analysis. They're defined by commuting zones, and they're not sensitive to political boundaries except to the extent that political boundaries actually affect commuting patterns. If you want to limit yourself to the "dense" part of the metro area then you can use the Census-defined Urbanized Areas[5], but the definitions of those might not match your intuitions about what should be included.
Assuming we use CBSA population, then the box that contains Jacksonville should instead have Atlanta, which has a CBSA population of ~5.8 million vs. only ~1.5 million in Jacksonville[5].
The statistical agencies for other countries have similar concepts. For Europe, Eurostat defines metropolitan regions[6], which have a similar definition. The OECD also maintains a data set of population by metropolitan area for OECD countries, although the methodology isn't consistent across all countries in that data set[7].
TL;DR - political and municipal boundaries aren't comparable across regions, use commuting zones or density-based region definitions instead. If you see Jacksonville, FL in a list of "biggest cities", it's always a red flag that they're using municipal boundaries.
Yes, definitely. This sort of issue is very hard to resolve adequately. There's no definition that always does everything we want.
A while back I wrote an article comparing each U.S. state's largest and second-largest cities. (https://blog.plover.com/misc/second-largest-cities.html) I made a perfectly reasonable choice of which definition to adopt, but it still had major peculiarities: the largest city in New Jersey was Trenton, not Newark, because in the data set I used, Newark was considered part of New York City.
Unless I'm missing something, this map might be pretty, and it might be impressive that it can be made, but it is entirely meaningless. Meaningless in that I don't know of any significant meaning in breaking down the globe into 10° chunks whatsoever. This doesn't tell you anything about the cities shown, nor the area in which those cities exist. There is zero meaning in this map because no meaning can be derived from the filter process. The constraints have no relation to the data.
The article is cool that it takes a dive into the "odder" boxes and finds a flaw or two. The author also recognizes the arbitrary nature of the map. I imagine that almost any other filtering would be more interesting to geek out over though.
I am somewhat surprised by the popularity of this post here and on reddit, but then again that is sometimes how popularity works. The piece is presented well, just meaningless.
That's a fair point I respect. It does seem that it would be easy to have this fun in a way which is also useful, with literally any filtering system which had a relationship to the data. Equal work, equal fun, more than zero use.
It's an interesting problem. For the longest time, OpenStreetMap hid Philadelphia pretty much until you got down to the rooftop level, and it annoyed the crap out of me. It's still not very good at choosing which cities to show. It's a hard problem, and I appreciate the article shining light on this issue.
Where is Saint Petersburg, Russia? It seems to be on the border of the boxes and might be in the same box with Helsinki (metro 1,495,271) or Minsk (metro 2,645,500). Saint Petersburg's population is 5,351,935.
> What the heck happened to St. Petersburg? (at 59.938N, 30.309E, it is just barely inside the same box as Moscow. The map is quite distorted in this region.)
Examples of queries you can run against WikiData's SPARQL engine:
- Largest cities per country (https://w.wiki/UC4)
- Cities connected by the European route E40 (https://w.wiki/74E)
- Streets in France named after a woman (https://w.wiki/34K)
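For the "largest city in a bounding box" question that this thread is about, a query might look roughly like the one built below. This is only a sketch: the Python helper is made up, and the SPARQL assumes the Wikidata Query Service's `wikibase:box` service with `cornerSouthWest`/`cornerNorthEast` parameters, plus the Wikidata identifiers Q515 (city), P31 (instance of), P279 (subclass of), P1082 (population), and P625 (coordinate location). Check those against the query service documentation before relying on it; the subclass traversal may also be slow on large boxes.

```python
def largest_city_query(south, west, north, east):
    """Build a SPARQL query string for the most populous city inside a lat/lon box."""
    return f"""
SELECT ?city ?cityLabel ?population WHERE {{
  ?city wdt:P31/wdt:P279* wd:Q515 ;   # instance of (a subclass of) city
        wdt:P1082 ?population .       # population
  SERVICE wikibase:box {{
    ?city wdt:P625 ?location .        # coordinate location
    bd:serviceParam wikibase:cornerSouthWest "Point({west} {south})"^^geo:wktLiteral .
    bd:serviceParam wikibase:cornerNorthEast "Point({east} {north})"^^geo:wktLiteral .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY DESC(?population)
LIMIT 1
""".strip()

# The 10-degree box with its south-west corner at 50N, 0E.
print(largest_city_query(south=50, west=0, north=60, east=10))
```

Note that WKT points are written longitude first, which is the opposite of how most people say coordinates out loud.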