Programming is hard by Stephan Schmidt

50k lines considered very large?

Ola Bini considers 50K lines of code very large:

“I know several people who are responsible for quite large code bases written in Ruby and Python (very large code bases is 50K-100K lines of code in these languages).”

This explains a lot.

And the blog post made me think. We wrote code bases of more than 50K lines in Python in the 90s in a small development shop (<10 developers). I don't consider this “very large”. For me, large or very large starts at the size where one developer cannot possibly know all the code (independently of the language) or cannot keep a good overview.

Since I now see that Ola Bini's blog scrambled my comment, I repeat it here for reference:

“The Maintenance myth”

[snip snip snip]

“Has there been any research done in this area?”

Nice blog post, if you cut out the middle. Interesting to call something a myth and then ask about research at the end.

“(very large code bases is 50K-100K lines of code in these languages).”

100K is very large? I wrote some projects in a two-person team and reached 50K lines. This is rather small. We did 50K-line Python programs in the 90s in a small development shop (<10 developers). Very large starts (for me) at 1M LOC.

(I don't like LOC as a metric though; function points (FP) or "thought points" are much better because they are more comparable between languages and make more sense: a developer has to think about every "thought point", so more thought points means more complexity and more effort.)

@Seo: “Codes written in dynamic languages tend to be shorter than codes written in static languages doing the same thing, and I think code size is the most important factor in maintenance.”

I don’t think Scala is much larger in LOC than Ruby.

And though Lisp & Haskell may have fewer LOC, they have a lot of thought points because their code is dense with them, whereas Java has a very low density with lots of noise in between.

Thanks for listening.

Update: Concerning my comment to Seo

A Ruby example

class Song
  def initialize(name, artist, duration)
    @name     = name
    @artist   = artist
    @duration = duration
  end

  def how_long
    "#{@duration} minutes"
  end
end

or with idiomatic Scala

class Song(val name:String, val artist:String, val duration:Int) {
    def howLong = duration + " minutes"
}

or more similar to Ruby:

class Song(aName:String, aArtist:String, aDuration:Int) {
    val name = aName
    val artist = aArtist
    val duration = aDuration

    def howLong = duration + " minutes"
}

Another Update: Marcos suggested

class Song
  def initialize(name, artist, duration)
      @name, @artist, @duration = name, artist, duration
  end
  ...

as more idiomatic Ruby. Thanks.

Filed under: Java, Python, Ruby

About the author: Stephan has worked as a head of development and CTO. He has 20 years of experience with different technologies, including Java, Rails and Python. Stephan's main field of interest is maintainability and productivity in software development. All views are only his own.

Comments

seiju

<1k very small project
1k-10k small project
10k-100k medium project
>100k large project
xk very large project
But this measure will change according to what language we are using, of course.

stephan

@seiju: Yes, as I’ve said, thought points are much better than LOC. My feeling is that Python code is at most 5x smaller than the equivalent Java code, though [*].

So 50k lines of Python correspond to at most 250k lines of Java, which is not a very large project.

[*] comparing e.g. list comprehensions in Python or closures in Ruby with Google Collections examples. But I would be very interested in a comparison of a LOC factor of Python, Ruby, Scala and Java.

hi stephan,

50K lines in Python in the 90s, so I’d assume that you didn’t use a lot of libraries? These days, if one writes Java code, given the frameworks and libraries, there is little Java code that you have to write yourself; at least, that is the case in 80% of the projects.

So, I’d say that 50K lines of your Python code would include a good amount of infrastructure code. That would mean NOT MORE than 150K lines of Java code today. Again, it depends on the shop, the frameworks and libraries used, and the maturity of the teams and developers.

I worked on a banking application with about 250K lines of JAVA CODE and about 70-100 libraries. Although 20% of the application code could have been removed by refactoring and reworking, I would still call that a LARGE project. Whether it is VERY LARGE, I am not sure, BUT I’m almost sure none of us could grasp the entire codebase at any point in time.

I’m interested, as you, to find out other’s opinions.

BR,
~A

stephan

@anjan: “50K lines in python in the 90s — I’d assume that you didn’t use a lot of libraries ?”

There were far fewer libraries than today.

“Thought points” … I like that.

Do you have a reference how to compute these? ^^

stephan

@Adrian: No, not sure, it is just something in my head since I’ve worked a lot with code metrics. But I haven’t had time to write a paper.

if (A && B) { ... }

would have 3 TP (use if (1), get A (1) and B (1) right)

persons.filter(_.age > 10)

would perhaps have 2 TP (use filter and get expression right)

while

for (Person person : persons) {
    if (person.getAge() > 10) {
        filtered.add(person);
    }
}

I don’t know, perhaps 1 for the loop, 2 for the if and its expression, and 1 for the adding, which makes 4 TP.

A little bit like McCabe
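
A rough sketch of how such a count could be automated, using Python and its standard ast module. The scoring rules are assumptions chosen only to reproduce the three hand counts above (1 per if, 1 per condition operand, 1 per loop, 1 per filter-style comprehension, 1 per call statement); the names thought_points and condition_points are made up for illustration, and none of this is an established metric.

```python
import ast


def condition_points(test):
    """One point per leaf condition: 'a and b' scores 2, 'x > 10' scores 1."""
    if isinstance(test, ast.BoolOp):
        return sum(condition_points(value) for value in test.values)
    return 1


def thought_points(source):
    """Tally assumed 'thought points' for a piece of Python source code."""
    total = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            # Using the if itself, plus getting each condition right.
            total += 1 + condition_points(node.test)
        elif isinstance(node, ast.For):
            # Using the loop.
            total += 1
        elif isinstance(node, ast.ListComp):
            # Using the filter-like construct.
            total += 1
        elif isinstance(node, ast.comprehension):
            # Getting each filter expression right.
            total += sum(condition_points(test) for test in node.ifs)
        elif isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            # A call used as a statement, e.g. adding to a list.
            total += 1
    return total


# Python versions of the three snippets discussed above.
IF_EXAMPLE = "if a and b:\n    pass\n"
FILTER_EXAMPLE = "[p for p in persons if p.age > 10]\n"
LOOP_EXAMPLE = (
    "for person in persons:\n"
    "    if person.age > 10:\n"
    "        filtered.append(person)\n"
)

if __name__ == "__main__":
    for name, code in [("if", IF_EXAMPLE), ("filter", FILTER_EXAMPLE), ("loop", LOOP_EXAMPLE)]:
        print(name, thought_points(code))
```

With these rules the three snippets score 3, 2 and 4, matching the hand counts. Different rules would give different numbers, which is exactly why a written definition would be needed before comparing languages this way.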

@Stephan: Python 10 years ago: Yes, given that there were far fewer libraries in both Python and Java, I assumed you wrote a good amount of infrastructure code. Today, if you were to rewrite the same project, I’d think that it would take you at least 20-30% number of lines of code?

Thank you,

I think Ola would be more likely to split out his infrastructure code into a separate library than most would be, so each individual codebase might be small but there might be more of them.

If you split projects up according to cliques you’ll probably find it difficult to pass 50kloc.

And more idiomatic Ruby:

class Song
  def initialize(name, artist, duration)
    @name, @artist, @duration = name, artist, duration
  end

  def how_long
    "#{@duration} minutes"
  end
end

Kind Regards

stephan

@Marcos: Thanks for the input.
