> In Linguistics the notion that a language is more difficult/complex than another is not accepted.
Hmm. In computer science, can we say that one language is more complex than another based on the size of the BNF needed to describe the language?
Even in linguistics, can't you say that one language has 8 cases, and another only has 4? Wouldn't that make the one with 8 more complex, in at least some sense?
Linguist here. In one sense, yes, but there are many other senses. For example, maybe in one language the cases are regular but in the other they aren't. Complexity exists at every level, from phonetics/phonology to pragmatics, and every language distributes it differently. Basque cases, for example, are very regular, like those of Turkish or Hungarian, but Russian or Icelandic cases are less regular. So it's actually easier to pick up Basque (at least the morphology) than it is to pick up Russian or Icelandic.
In terms of ease of learning, extralinguistic factors are generally more indicative: for example, the availability and quality of intensive courses, the availability of materials and cultural output, the willingness of speakers to interact with learners (either because they can't communicate with them otherwise or for ideological reasons), and how similar the learner's culture and cultural concepts are to those of the language they are learning. Each of these has as much impact on ease of learning as any index of grammatical complexity, or more. (Edit: Basque has excellent courses run through AEK, and Basque speakers are usually very motivated to speak Basque, so if you want to learn a language I would highly recommend it. Oh, and there is a mass of culture like music and films.)
No. Anything describable in BNF is at most context-free on the Chomsky-Schützenberger hierarchy, regardless of the grammar's size. BNF (Backus-Naur Form) is also not a normal form: the same language can be written with many different BNF grammars of very different sizes, so you can't really compare languages by comparing their BNF rules. There is Chomsky Normal Form, which you could use to compare sizes. But at least some natural languages (Swiss German is the classic case) have been shown to be non-context-free, so they aren't fully describable in CNF.
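To make the "size isn't intrinsic" point concrete, here is a minimal sketch with two hypothetical toy grammars for the language {aⁿbⁿ : n ≥ 1}: a 2-production BNF-style grammar and a 5-production equivalent in Chomsky Normal Form. The names `G_SMALL`, `G_CNF`, `size` and `strings_up_to` are made up for illustration, and counting productions is just one crude notion of grammar size. Enumerating strings up to a fixed length only spot-checks that the grammars agree; it is not a proof (equivalence of context-free grammars is undecidable in general).

```python
from collections import deque

# A grammar maps each nonterminal (dict key) to its alternative right-hand sides.
# Any symbol that is not a key is treated as a terminal.
Grammar = dict[str, list[tuple[str, ...]]]

# Hypothetical toy grammar for { a^n b^n : n >= 1 }, BNF-style: S ::= a S b | a b
G_SMALL: Grammar = {
    "S": [("a", "S", "b"), ("a", "b")],
}

# The same language in Chomsky Normal Form (every rule is A -> B C or A -> terminal).
G_CNF: Grammar = {
    "S": [("A", "T"), ("A", "B")],
    "T": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}


def size(g: Grammar) -> int:
    """Count productions: one crude notion of 'grammar size'."""
    return sum(len(rhss) for rhss in g.values())


def strings_up_to(g: Grammar, max_len: int, start: str = "S") -> set[str]:
    """All terminal strings of length <= max_len derivable from `start`.

    Assumes no empty (epsilon) productions, so every symbol yields at least
    one terminal and any sentential form longer than max_len can be pruned.
    """
    results: set[str] = set()
    seen: set[tuple[str, ...]] = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if len(form) > max_len or form in seen:
            continue
        seen.add(form)
        # Find the leftmost nonterminal; if there is none, the form is a finished string.
        idx = next((i for i, sym in enumerate(form) if sym in g), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in g[form[idx]]:
            queue.append(form[:idx] + rhs + form[idx + 1:])
    return results


if __name__ == "__main__":
    print(size(G_SMALL), size(G_CNF))  # 2 5
    print(strings_up_to(G_SMALL, 8) == strings_up_to(G_CNF, 8))  # True
```

Same language, different sizes: the number you get depends on how the grammar happens to be written, which is why a normal form (or some other fixed convention) is needed before sizes mean anything.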
You might be thinking of MDL (minimum description length), an information-theoretic notion of complexity that boils down to the question "how long is the shortest program that generates this data?" To use minimum description length, however, you need to fix a programming language. So if you had Python programs that generate all of English and all of Japanese, you could compare the complexity of those two languages by comparing the lengths of the programs.
However, this measure of complexity can't be stated absent the assumption that these languages are generated using Python -- that is, the fact that language A is more complex than language B in Python does not entail that language A is more complex than language B in the mind (using the "mind's programming language").
And herein lies the big question for a lot of linguists: what's the programming language that the mind uses to generate language?
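A toy illustration of why the choice of "programming language" matters: below, two made-up description languages (run-length encoding, and "repeat this unit N times") rank the same two strings in opposite order. Both encodings and the function names `rle_length` and `repeat_length` are hypothetical, chosen for simplicity; they are not standard MDL codes, and real MDL uses far richer model classes.

```python
def rle_length(s: str) -> int:
    """Description length in toy language 1: run-length encoding, e.g. 'a50b50'."""
    if not s:
        return 0
    pieces = []
    run_char, run_len = s[0], 1
    for ch in s[1:]:
        if ch == run_char:
            run_len += 1
        else:
            pieces.append(f"{run_char}{run_len}")
            run_char, run_len = ch, 1
    pieces.append(f"{run_char}{run_len}")
    return len("".join(pieces))


def repeat_length(s: str) -> int:
    """Description length in toy language 2: s == unit * count, using the
    shortest such unit; cost = len(unit) + number of digits in count."""
    n = len(s)
    for k in range(1, n + 1):
        if n % k == 0 and s[:k] * (n // k) == s:
            return k + len(str(n // k))
    return 0  # only reached for the empty string


if __name__ == "__main__":
    periodic = "ab" * 50            # abab...ab
    two_runs = "a" * 50 + "b" * 50  # aaa...abbb...b
    print(rle_length(periodic), rle_length(two_runs))        # 200 6
    print(repeat_length(periodic), repeat_length(two_runs))  # 4 101
```

Under run-length encoding `two_runs` is far "simpler"; under the repetition encoding `periodic` is. Neither ranking is wrong; each is relative to its description language, which is exactly the problem when the relevant language is the mind's.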
Computer "languages" are a metaphor, they aren't languages in a linguistic sense, they aren't even writing systems per se. They are more closely related to things like musical notation or circuit diagrams. You can't communicate arbitrary information with them without at least partially embedding another language.
Speaking as an historian here and not as a linguist:
In my experience, a language that has only four cases usually has a method for expressing all eight; they just won't be formalized in the same way. For example, English has no formalized second-person plural, so you might use "y'all" or "everyone" as a kludge.
> In my experience, a language that has only four cases usually has a method for expressing all eight, they just won't be formalized in the same way.
You're getting close to something interesting. Cases are an example of expressing a linguistic distinction through the use of a particular kind of syntax.
It is also common for the same distinctions to be expressed by other means that people generally agree are also syntactic. An English finite passive verb has to be expressed periphrastically, using a dedicated auxiliary verb that exists for that purpose (be or get); the verb will be inflected into a form determined by the governing auxiliary verb, but no passive form exists that can stand as a finite verb.
But this is still essentially a use of syntax to deal with what are generally felt to be "grammatical" categories. What is discussed much less often is that languages may scrupulously observe a standard distinction between grammatical categories without having any syntactic apparatus to accommodate that distinction. The only term I know related to this is "lexical aspect", but the phenomenon exists beyond just aspect. ( https://en.wikipedia.org/wiki/Lexical_aspect )
Two examples between Mandarin Chinese and English:
1. In Mandarin, verbs may be given a "resultative complement" expressing a result of the action. (This has many syntactic consequences.) A complement can be almost anything indicating a state, but there are some standard ones, and the complement that regularly indicates "success of the action" is 到. Thus, by a regular and productive construction of Mandarin grammar, we can observe the following verb pairs:
看 (look) → 看到 (see)
找 (search) → 找到 (find)
听 (listen) → 听到 (hear)
Except of course these aren't pairs at all, if you're a Mandarin speaker; it'd be more natural to call them different verb forms. Seeing is just a particular way of looking, the successful way. By contrast, though English doesn't express this distinction in its grammar... you'd have a hard time claiming that English doesn't observe the distinction. The English verbs in that table are absolutely not interchangeable with their "partners". We have here a grammatical distinction that exists entirely within English's vocabulary.
2. If you study ancient European languages, you'll hear about the locative case, already long dead by the time of the ancient language that you're actually studying. It was used when a noun was conceived of as being a location rather than an object. (In classical Latin, it survives in fossilized, unproductive form in a few common locational words like domī "at home" and humī "on the ground".)
English does not have a locative case, and also doesn't really observe any distinction between locations and other types of nouns. There are some restrictions, but they are easy to explain as being required by the semantics of locations rather than the grammar of English per se.
Mandarin also does not have a locative case, or any noun cases at all. But it nevertheless maintains a robust distinction between ordinary noun phrases and locational phrases. It is not possible to (grammatically) express in Mandarin that something is "on the table"; you have to say that it is "on the top of the table" (在桌子上, roughly "at table-top").
It is surprisingly difficult to talk about this distinction without imagining that locative case is involved. Certainly the background phenomenon is the same. Certainly this is deeply embedded within the syntax of Mandarin. But you would need to posit a "phrasal case"[1] which is entirely unexpressed in order to actually call it a "case".
[1] CGEL (The Cambridge Grammar of the English Language) attempts to preserve the idea that the English clitic 's is a case marker by stating that it applies genitive case to an entire noun phrase. (This is a necessary analysis, since the clitic attaches to the last word of a noun phrase whether that word is a noun or not, but its meaning applies to the head of the noun phrase, which is not especially likely to also be the last word.) I don't like this; I think a better analysis would be to just give up on the idea of genitive case in English. But if you like the idea of phrasal case, locational phrases in Mandarin are certainly an area where you could look for some support.
more cases doesn't necessarily make a language more complex, cases are just one way to communicate a piece of information about a noun. it isn't necessarily more complex to have a separate locative case than to say "in/on/at X"!