# Chapter 8 - Dealing with Legacy Code
*******************
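In this chapter, we will discuss the following topics:
- Reading a Django code base
- Discovering relevant documentation
- Incremental changes versus full rewrites
- Writing tests before changing code
- Legacy database integration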
It sounds exciting when you are asked to join a project. Powerful new tools and cutting-edge technologies might await you. However, quite often, you are asked to work with an existing, possibly ancient, codebase.
To be fair, Django has not been around for that long. However, projects written for older versions of Django are sufficiently different to cause concern. Sometimes, having the entire source code and documentation might not be enough.
If you are asked to recreate the environment, then you might need to fumble with the OS configuration, database settings, and running services locally or on the network. There are so many pieces to this puzzle that you might wonder how and where to start.
Knowing the Django version used in the code is a key piece of information. As Django evolved, everything from the default project structure to the recommended best practices has changed. Therefore, identifying which version of Django was used is vital to understanding the code.
>##### Change of Guards
>Sitting patiently on the ridiculously short beanbags in the training room, the SuperBook team waited for Hart. He had convened an emergency go-live meeting. Nobody understood the "emergency" part since go-live was at least 3 months away.
>Madam O rushed in holding a large designer coffee mug in one hand and a bunch of printouts of what looked like project timelines in the other. Without looking up she said, "We are late so I will get straight to the point. In the light of last week's attacks, the board has decided to summarily expedite the SuperBook project and has set the deadline to end of next month. Any questions?"
>"Yeah," said Brad, "Where is Hart?" Madam O hesitated and replied, "Well, he resigned. Being the head of IT security, he took moral responsibility of the perimeter breach." Steve, evidently shocked, was shaking his head. "I am sorry," she continued, "But I have been assigned to head SuperBook and ensure that we have no roadblocks to meet the new deadline."
>There was a collective groan. Undeterred, Madam O took one of the sheets and began, "It says here that the Remote Archive module is the most high-priority item in the incomplete status. I believe Evan is working on this."
>"That's correct," said Evan from the far end of the room. "Nearly there," he smiled at others, as they shifted focus to him. Madam O peered above the rim of her glasses and smiled almost too politely. "Considering that we already have an extremely well-tested and working Archiver in our Sentinel code base, I would recommend that you leverage that instead of creating another redundant system."
>"But," Steve interrupted, "it is hardly redundant. We can improve over a legacy archiver, can't we?" "If it isn't broken, then don't fix it," replied Madam O tersely. "He is working on it," said Brad, almost shouting. "What about all that work he has already finished?"
>"Evan, how much of the work have you completed so far?" asked O, rather impatiently. "About 12 percent," he replied looking defensive. Everyone looked at him incredulously. "What? That was the hardest 12 percent," he added.
>O continued the rest of the meeting in the same pattern. Everybody's work was reprioritized and shoehorned to fit the new deadline. As she picked up her papers, readying to leave, she paused and removed her glasses.
>"I know what all of you are thinking... literally. But you need to know that we had no choice about the deadline. All I can tell you now is that the world is counting on you to meet that date, somehow or other." Putting her glasses back on, she left the room.
>"I am definitely going to bring my tinfoil hat," said Evan loudly to himself.
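## Finding the Django version
Ideally, every project will have a requirements.txt or setup.py file in its root directory, and that file will state the exact version of Django used for the project. Look for a line similar to this: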
```python
Django==1.5.9
```
Note that the version number is stated exactly (rather than as Django>=1.5.9), which is called pinning. Pinning every package is considered a good practice since it reduces surprises and makes your build more deterministic.
Unfortunately, there are real-world codebases where the requirements.txt file was not updated or is even completely missing. In such cases, you will need to probe for various telltale signs to find out the exact version.
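When a requirements file does exist, a few lines of Python can report whether Django is pinned in it. The following is a minimal sketch rather than a full requirements parser: the function name `find_django_requirement` and the sample package list are made up for illustration, and extras, environment markers, and URL requirements are deliberately ignored.

```python
import re

# Matches a line such as "Django==1.5.9" or "django>=1.5,<1.6  # comment".
# The trailing "$" keeps packages like "django-extensions" from matching.
DJANGO_LINE_RE = re.compile(
    r"^\s*django\s*(?P<spec>[=<>!~][^#;]*)?\s*(?:[#;].*)?$",
    re.IGNORECASE,
)


def find_django_requirement(requirements_text):
    """Return (specifier, is_pinned) for the Django line, or None if absent."""
    for line in requirements_text.splitlines():
        match = DJANGO_LINE_RE.match(line)
        if match is None:
            continue
        spec = (match.group("spec") or "").strip()
        # Treat a single exact "==" constraint without wildcards as a pin.
        is_pinned = spec.startswith("==") and "," not in spec and "*" not in spec
        return spec, is_pinned
    return None


# Hypothetical file contents, used only to exercise the function.
sample = "South==1.0\ndjango-extensions==1.5.0\nDjango==1.5.9\n"
print(find_django_requirement(sample))
```

A `None` result, or a result whose second element is `False`, is the cue to fall back on the other techniques in this section.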
## Activating the virtual environment
In most cases, a Django project would be deployed within a virtual environment. Once you locate the virtual environment for the project, you can activate it by jumping to that directory and running the activate script for your OS. For Linux, the command is as follows:
```shell
$ source venv_path/bin/activate
```
Once the virtual environment is active, start a Python shell and query the Django version as follows:
```python
$ python
>>> import django
>>> print(django.get_version())
1.5.9
```
The Django version used in this case is 1.5.9.
Alternatively, you can run the manage.py script in the project to get a similar output:
```shell
$ python manage.py --version
1.5.9
```
However, this option would not be available if the legacy project source snapshot was sent to you in an undeployed form. If the virtual environment (and its packages) was also included, then you can easily locate the version number (in the form of a tuple) in the `__init__.py` file of the Django directory. For example:
```shell
$ cd envs/foo_env/lib/python2.7/site-packages/django
$ cat __init__.py
VERSION = (1, 5, 9, 'final', 0)
...
```
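If you would rather not import a package from an unfamiliar snapshot, the tuple can be read statically. This is a sketch under a few assumptions: the helper name `read_django_version` is invented here, the file is expected to assign `VERSION` at module level as in the listing above, and the file must be parseable by the Python version running the script (a very old `__init__.py` might not be).

```python
import ast


def read_django_version(init_source):
    """Return the VERSION tuple assigned in the given __init__.py source text."""
    tree = ast.parse(init_source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "VERSION":
                # literal_eval only evaluates literals, so no project code runs.
                return ast.literal_eval(node.value)
    return None


# In practice the text would come from open(".../django/__init__.py").read().
sample_init = "VERSION = (1, 5, 9, 'final', 0)\n"
version = read_django_version(sample_init)
print(".".join(str(part) for part in version[:3]))
```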
If all these methods fail, then you will need to go through the release notes of the past Django versions to determine the identifiable changes (for example, the AUTH_PROFILE_MODULE setting was deprecated since Version 1.5) and match them to your legacy code. Once you pinpoint the correct Django version, then you can move on to analyzing the code.
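Matching release-note changes against the code can be partly mechanized. The sketch below assumes a hand-built hint table: `VERSION_HINTS` and `collect_version_hints` are hypothetical names, and the single entry is the AUTH_PROFILE_MODULE example mentioned above. Any further entries would have to be taken from the release notes themselves.

```python
import re

# Hypothetical hint table: regex pattern -> what its presence suggests.
# The only entry is the example from the text; extend it from the release notes.
VERSION_HINTS = {
    r"\bAUTH_PROFILE_MODULE\b": "setting deprecated since Django 1.5",
}


def collect_version_hints(source_text):
    """Return the sorted notes for every hint pattern found in source_text."""
    return sorted(
        note
        for pattern, note in VERSION_HINTS.items()
        if re.search(pattern, source_text)
    )


# Made-up settings fragment, used only to exercise the function.
legacy_settings = "DEBUG = True\nAUTH_PROFILE_MODULE = 'accounts.UserProfile'\n"
print(collect_version_hints(legacy_settings))
```

A hit only narrows the range of plausible versions; it does not prove which release the project last ran on.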
## Where are the files? This is not PHP
One of the most difficult ideas to get used to, especially if you come from the PHP or ASP.NET world, is that the source files are not located in your web server's document root directory, which is usually named `wwwroot` or `public_html`. Additionally, there is no direct relationship between the code's directory structure and the website's URL structure.
In fact, you will find that your Django website's source code is stored in an obscure path such as /opt/webapps/my-django-app. Why is this? Among many good reasons, it is often more secure to move your confidential data outside your public webroot. This way, a web crawler would not be able to accidentally stumble into your source code directory.
As you will read in Chapter 11, Production-ready, the location of the source code can be found by examining your web server's configuration file. Here, you will find either the environment variable DJANGO_SETTINGS_MODULE being set to the module's path, or it will pass on the request to a WSGI server that will be configured to point to your project.wsgi file.
## Starting with urls.py
Even if you have access to the entire source code of a Django site, figuring out how it works across various apps can be daunting. It is often best to start from the root urls.py URLconf file since it is literally a map that ties every request to the respective views.
With normal Python programs, I often start reading from the start of execution, say, from the top-level main module or wherever the `__main__` check idiom starts. In the case of Django applications, I usually start with urls.py since it is easier to follow the flow of execution based on the various URL patterns a site has.
In Linux, you can use the following find command to locate the settings.py file and the corresponding line specifying the root urls.py:
```shell
$ find . -iname settings.py -exec grep -H 'ROOT_URLCONF' {} \;
./projectname/settings.py:ROOT_URLCONF = 'projectname.urls'
$ ls projectname/urls.py
projectname/urls.py
```
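On systems without `find` and `grep`, roughly the same search can be done in Python. This is a sketch, not a drop-in replacement: the function names are invented for illustration, only simple quoted assignments to `ROOT_URLCONF` are recognized, and `urlconf_to_path` assumes the project root is the directory on the import path.

```python
import os
import re

# Matches a simple assignment such as: ROOT_URLCONF = 'projectname.urls'
ROOT_URLCONF_RE = re.compile(
    r"^\s*ROOT_URLCONF\s*=\s*['\"]([\w.]+)['\"]", re.MULTILINE
)


def find_root_urlconfs(project_root):
    """Yield (settings_path, urlconf_module) pairs found under project_root."""
    for dirpath, _dirnames, filenames in os.walk(project_root):
        for filename in filenames:
            # Case-insensitive match, mirroring find's -iname option.
            if filename.lower() != "settings.py":
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                match = ROOT_URLCONF_RE.search(handle.read())
            if match:
                yield path, match.group(1)


def urlconf_to_path(project_root, urlconf_module):
    """Map 'projectname.urls' to '<project_root>/projectname/urls.py'."""
    return os.path.join(project_root, *urlconf_module.split(".")) + ".py"


if __name__ == "__main__":
    for settings_path, module in find_root_urlconfs("."):
        print(settings_path, "->", urlconf_to_path(".", module))
```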
## Jumping around the code
Reading code sometimes feels like browsing the web without the hyperlinks. When you encounter a function or variable defined elsewhere, you will need to jump to the file that contains that definition. Some IDEs can do this automatically for you as long as you tell them which files to track as part of the project.
If you use Emacs or Vim instead, then you can create a TAGS file to quickly navigate between files. Go to the project root and run a tool called *Exuberant Ctags* as follows:
```shell
find . -iname "*.py" -print | etags -
```
This creates a file called TAGS that records the locations where syntactic units such as classes and functions are defined. In Emacs, you can jump to the definition of the tag under your cursor (or point, as it is called in Emacs) using the `M-.` command.
While using a tag file is extremely fast for large code bases, it is quite basic and is not aware of a virtual environment (where most definitions might be located). An excellent alternative is to use the elpy package in Emacs. It can be configured to detect a virtual environment. Jumping to the definition of a syntactic element uses the same `M-.` command. However, the search is not restricted to the tag file. So, you can even jump to a class definition within the Django source code seamlessly.
## Understanding the code base
It is quite rare to find legacy code with good documentation. Even if you do, the documentation might be out of sync with the code in subtle ways that can lead to further issues. Often, the best guide to understand the application's functionality is the executable test cases and the code itself.
The official Django documentation has been organized by versions at https://docs.djangoproject.com. On any page, you can quickly switch to the corresponding page in the previous versions of Django with a selector on the bottom right-hand section of the page:
(Image omitted)
In the same way, documentation for any Django package hosted on readthedocs.org can also be traced back to its previous versions. For example, you can select the documentation of django-braces all the way back to v1.0.0 by clicking on the selector on the bottom left-hand section of the page:
(Image omitted)
## Creating the big picture
Most people find it easier to understand an application if you show them a high-level diagram. While this is ideally created by someone who understands the workings of the application, there are tools that can create a very helpful high-level depiction of a Django application.
A graphical overview of all models in your apps can be generated by the graph_models management command, which is provided by the django-extensions package (formerly django-command-extensions). As shown in the following diagram, the model classes and their relationships can be understood at a glance:
(Image omitted)
Model classes used in the SuperBook project connected by arrows indicating their relationships
This visualization is actually created using PyGraphviz. It can get really large for projects of even medium complexity. Hence, it might be easier if the applications are logically grouped and visualized separately.
>##### PyGraphviz Installation and Usage
>If you find the installation of PyGraphviz challenging, then don't worry, you are not alone. Recently, I faced numerous issues while installing on Ubuntu, starting from Python 3 incompatibility to incomplete documentation. To save your time, I have listed the steps that worked for me to reach a working setup.
>On Ubuntu, you will need the following packages installed to install PyGraphviz:
```shell
$ sudo apt-get install python3.4-dev graphviz libgraphviz-dev pkg-config
```
>Now activate your virtual environment and run pip to install the development version of PyGraphviz directly from GitHub, which supports Python 3:
```shell
$ pip install git+http://github.com/pygraphviz/pygraphviz.git#egg=pygraphviz
```
>Next, install django-extensions and add it to your INSTALLED_APPS. Now, you are all set.
>Here is a sample usage to create a Graphviz dot file for just two apps and to convert it to a PNG image for viewing:
```shell
$ python manage.py graph_models app1 app2 > models.dot
$ dot -Tpng models.dot -o models.png
```
## Incremental change or a full rewrite?
Often, you would be handed over legacy code by the application owners in the earnest hope that most of it can be used right away or after a couple of minor tweaks. However, reading and understanding a huge and often outdated code base is not an easy job. Unsurprisingly, most programmers prefer to work on greenfield development.
In the best case, the legacy code ought to be easily testable, well documented, and flexible enough to work in modern environments so that you can start making incremental changes in no time. In the worst case, you might recommend discarding the existing code and going for a full rewrite. Or, as is commonly decided, the short-term approach would be to keep making incremental changes, while a parallel long-term effort is underway for a complete reimplementation.
A general rule of thumb to follow while taking such decisions is this: if the cost of rewriting the application and maintaining the new application is lower than the cost of maintaining the old application over time, then it is recommended to go for a rewrite. Care must be taken to account for all the factors, such as time taken to get new programmers up to speed, the cost of maintaining outdated hardware, and so on.
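The rule of thumb is simple arithmetic once the costs are estimated. The sketch below is purely illustrative: `should_rewrite` is a made-up helper, the figures are invented, and real estimates would also have to fold in the factors listed above.

```python
def should_rewrite(rewrite_cost, new_upkeep_per_year, old_upkeep_per_year, years):
    """Apply the rule of thumb over a fixed planning horizon (one currency unit)."""
    cost_of_rewriting = rewrite_cost + new_upkeep_per_year * years
    cost_of_keeping = old_upkeep_per_year * years
    return cost_of_rewriting < cost_of_keeping


# Invented figures: the same project can fall on either side of the line
# depending on how far ahead you plan.
print(should_rewrite(300, 50, 120, years=3))  # 450 versus 360
print(should_rewrite(300, 50, 120, years=5))  # 550 versus 600
```

With these numbers the rewrite does not pay off over three years but does over five, which is why the planning horizon deserves as much scrutiny as the cost estimates.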
Sometimes, the complexity of the application domain becomes a huge barrier against a rewrite, since a lot of knowledge learnt in the process of building the older code gets lost. Often, this dependency on the legacy code is a sign of poor design in the application like failing to externalize the business rules from the application logic.
The worst form of a rewrite you can probably undertake is a conversion, or a mechanical translation from one language to another without taking any advantage of the existing best practices. In other words, you lose the opportunity to modernize the code base by removing years of cruft.
Code should be seen as a liability, not an asset. As counter-intuitive as it might sound, if you can achieve your business goals with less code, you have dramatically increased your productivity. Having less code to test, debug, and maintain can not only reduce ongoing costs but also make your organization more agile and flexible to change.
>Code is a liability not an asset. Less code is more maintainable.
Irrespective of whether you are adding features or trimming your code, you must not touch your working legacy code without tests in place.
## Write tests before making any changes
In the book *Working Effectively with Legacy Code*, Michael Feathers defines legacy code as, simply, code without tests. He elaborates that with tests, one can modify the behavior of the code quickly and verifiably. In the absence of tests, it is impossible to gauge if the change made the code better or worse.
Often, we do not know enough about legacy code to confidently write a test. Michael recommends writing tests that preserve and document the existing behavior, which are called characterization tests.
Unlike the usual approach of writing tests, while writing a characterization test, you will first write a failing test with a dummy output, say X, because you don't know what to expect. When the test harness fails with an error, such as "Expected output X but got Y", then you will change your test to expect Y. So, now the test will pass, and it becomes a record of the code's existing behavior.
Note that we might record buggy behavior as well. After all, this is unfamiliar code. Nevertheless, writing such tests is necessary before we start changing the code. Later, when we know the specifications and code better, we can fix these bugs and update our tests (not necessarily in that order).
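A characterization test can be as small as the following sketch. Everything in it is hypothetical: `legacy_discount` stands in for whatever unfamiliar function you have inherited, and the recorded values are simply what that stand-in returns. The suite is run explicitly here to keep the example self-contained; in a project, your usual test runner would collect the class.

```python
import unittest


def legacy_discount(order_total):
    """Stand-in for an unfamiliar legacy function with undocumented rules."""
    if order_total > 100:
        return round(order_total * 0.9, 2)
    return order_total


class LegacyDiscountCharacterizationTest(unittest.TestCase):
    def test_large_order(self):
        # The first draft asserted a dummy value (0) and failed with
        # "180.0 != 0"; the observed value was then copied into the test.
        self.assertEqual(legacy_discount(200), 180.0)

    def test_boundary_order(self):
        # Recorded as observed: an order of exactly 100 gets no discount.
        # That may be a bug, but the test documents current behavior.
        self.assertEqual(legacy_discount(100), 100)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    LegacyDiscountCharacterizationTest
)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If a later change makes `test_boundary_order` fail, that failure is a prompt to decide whether the old behavior was a requirement or an accident.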
## Step-by-step process to writing tests
Writing tests before changing the code is similar to erecting scaffolding before the restoration of an old building. It provides a structural framework that helps you confidently undertake repairs.
You might want to approach this process in a stepwise manner as follows:
1. Identify the area you need to make changes to. Write characterization tests focusing on this area until you have satisfactorily captured its behavior.
2. Look at the changes you need to make and write specific test cases for those. Prefer smaller unit tests to larger and slower integration tests.
3. Introduce incremental changes and test in lockstep. If tests break, then try to analyze whether it was expected. Don't be afraid to break even the characterization tests if that behavior is something that was intended to change.
If you have a good set of tests around your code, then you can quickly find the effect of changing your code.
On the other hand, if you decide to rewrite by discarding your code but not your data, then Django can help you considerably.
## Legacy databases
There is an entire section on legacy databases in the Django documentation, and rightly so, as you will run into them many times. Data is more important than code, and databases are the repositories of data in most enterprises.
You can modernize a legacy application written in other languages or frameworks by importing its database structure into Django. As an immediate advantage, you can use the Django admin interface to view and change your legacy data.
Django makes this easy with the inspectdb management command, which looks as follows:
```shell
$ python manage.py inspectdb > models.py
```
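If run while your settings are configured to use the legacy database, this command automatically generates the Python code that would go into your models file.
Here are some best practices if you are using this approach to integrate with a legacy database:
- Know the limitations of the Django ORM beforehand. At the time of writing, multicolumn (composite) primary keys and NoSQL databases are not supported.
- Don't forget to manually clean up the generated models, for example, by removing the redundant ID fields, since Django creates them automatically.
- Foreign key relationships may have to be defined manually. In some databases, the auto-generated models will have them as integer fields (suffixed with `_id`).
- Organize your models into separate apps. Later, it will be easier to add the views, forms, and tests in the appropriate folders.
- Remember that running the migrations will create Django's administrative tables (`django_*` and `auth_*`) in the legacy database.
In an ideal world, your auto-generated models would start working immediately, but in practice, it takes a lot of trial and error. Sometimes, the data type that Django inferred might not match your expectations. In other cases, you might want to add meta information such as `unique_together` to your model.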
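Part of that trial and error can be narrowed down with a quick scan of the generated file. The following sketch lists integer fields whose names end in `_id`, which are candidates for conversion to foreign keys. It rests on assumptions: `foreign_key_candidates` is an invented helper, the sample model is made up rather than real inspectdb output, and a name ending in `_id` is only a hint, so each hit still needs a manual check against the schema.

```python
import re

# Matches lines such as "    author_id = models.IntegerField(...)" and the
# other *IntegerField variants.
FK_CANDIDATE_RE = re.compile(
    r"^\s*(?P<field>\w+_id)\s*=\s*models\.\w*IntegerField\(",
    re.MULTILINE,
)


def foreign_key_candidates(models_source):
    """Return field names that look like foreign keys stored as plain integers."""
    return [match.group("field") for match in FK_CANDIDATE_RE.finditer(models_source)]


# Made-up excerpt in the style of a generated models.py.
sample_models = '''
class Book(models.Model):
    id = models.IntegerField(primary_key=True)
    title = models.CharField(max_length=200)
    author_id = models.IntegerField()
    publisher_id = models.BigIntegerField(blank=True, null=True)

    class Meta:
        db_table = 'book'
'''
print(foreign_key_candidates(sample_models))
```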
Eventually, you should be able to see all the data that was locked inside that aging PHP application in your familiar Django admin interface. I am sure this will bring a smile to your face.
## Summary
In this chapter, we looked at various techniques to understand legacy code. Reading code is often an underrated skill. But rather than reinventing the wheel, we need to judiciously reuse good working code whenever possible. In this chapter and the rest of the book, we emphasize the importance of writing test cases as an integral part of coding.
In the next chapter, we will talk about writing test cases and the often frustrating task of debugging that follows.