Opened 5 years ago
Last modified 14 months ago
#13169 new defect
GitNode.get_content() read entire of a file into memory even if it is huge
Reported by: | Jun Omae | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | next-stable-1.6.x |
Component: | plugin/git | Version: | |
Severity: | minor | Keywords: | |
Cc: | Branch: | ||
Release Notes: | |||
API Changes: | |||
Internal Changes: |
Description
I think we should use TemporaryFile
/SpooledTemporaryFile
rather than cStringIO
.
See also:
- GitNode.get_content()
- get_file() -
cStringIO
is used - cat_file() - entire of the content into memory by
Popen.stdout.read(size)
Attachments (0)
Change History (10)
comment:1 by , 5 years ago
Severity: | normal → minor |
---|
follow-up: 3 comment:2 by , 5 years ago
Two minor suggestions in [fd8e27b47/rjollos.git]. We can skip setting default parameter stdout=PIPE
, like in r15074. Also, I noticed that changes will need some rebasing on trunk due removal of _close_proc_pipes
in r15077. The function could be restored to avoid duplicate code.
Looking at introduction of GitProcStdout
and thinking about r15077, would it make sense to introduce a wrapper for Popen
that cleans up on __exit__
and __del__
? GitProcStdout
does that while providing a file-like interface for stdout
.
I experimented with changes in [f3cab6f42/rjollos.git]. I had to keep GitProcStdout
because we need to hold a reference to the process when returning from get_file
to avoid I/O on a closed file. After the changes, do we need to keep reusing the same process in Storage.cat_file
, or could we also implement [38381686f/rjollos.git]?
comment:3 by , 5 years ago
Replying to Ryan J Ollos:
Two minor suggestions in [fd8e27b47/rjollos.git]. We can skip setting default parameter
stdout=PIPE
, like in r15074. Also, I noticed that changes will need some rebasing on trunk due removal of_close_proc_pipes
in r15077. The function could be restored to avoid duplicate code.
The changes looks good to me. In addition, GitProcStdout.close
method is restored in [632e91a44/jomae.git]. The close
method is invoked by content_closing()
after r16591 (#12977).
Looking at introduction of
GitProcStdout
and thinking about r15077, would it make sense to introduce a wrapper forPopen
that cleans up on__exit__
and__del__
?GitProcStdout
does that while providing a file-like interface forstdout
.
Backporting r15077 to 1.0-stable sounds good, however it is not recommended if we will release 1.0.18 soon.
I experimented with changes in [f3cab6f42/rjollos.git]. I had to keep
GitProcStdout
because we need to hold a reference to the process when returning fromget_file
to avoid I/O on a closed file. After the changes, do we need to keep reusing the same process inStorage.cat_file
, or could we also implement [38381686f/rjollos.git]?
After the changes, git process is executed when each GitNode.get_content_length()
is invoked, for example, browser view would be very slow when a directory has many files….
comment:4 by , 5 years ago
Thanks for giving feedback on those changes. I agree with deferring my additional changes. I think it would be good to commit for 1.0.18: your original changes with [fd8e27b47/rjollos.git] and the class rename in [632e91a44/jomae.git]. I'll make a new ticket for the GitCmd
class introduction in milestone 1.0.19.
follow-up: 6 comment:5 by , 5 years ago
It occurred to me that we could just implement a file interface for GitCmd
: [22f5cf676/rjollos.git].
Looking at GitCore.__execute
, we could have GitCmd
log errors: [ee852cfea/rjollos.git]. The documentation suggests it is better to use communicate
rather than stdout.read
, stderr.read
, stdin.write
. However, I'm unsure of the best way to log stderr
when size
is not None
.
comment:6 by , 5 years ago
Looking at
GitCore.__execute
, we could haveGitCmd
log errors: [ee852cfea/rjollos.git]. The documentation suggests it is better to usecommunicate
rather thanstdout.read
,stderr.read
,stdin.write
. However, I'm unsure of the best way to logstderr
whensize
is notNone
.
Two solutions:
- Use non-blocking read both stdout and stderr to avoid deadlocks. However, the programming is too complex.
- Use temporary file for stderr to write output without read operations. Log content of the temporary file when the command exits if needed: [d18870314/jomae.git]
comment:7 by , 5 years ago
Milestone: | 1.0.18 → 1.0.19 |
---|
comment:8 by , 5 years ago
Milestone: | 1.0.19 → 1.0.20 |
---|
comment:9 by , 5 years ago
Milestone: | 1.0.20 → next-stable-1.4.x |
---|
Reconsidering,
GitNode.get_content()
cannot return until the entire file is written to a temporary file. If the file is large, I think the behavior is slow. In the following changes, the method uses stdout ofgit cat-file blob {sha}
without reading the content when size of the content is large than 32 MB.Proposed changes in [8ec7797e6/jomae.git] (jomae.git@t13169+1.0).