(base) server@Server:~/llama.cpp$ ./quantize ./models/Phi-3-mini-4k-instruct/ggml-model-f16.gguf ./models/Phi-3-mini-4k-instruct/ggml-model-Q4_K_M.gguf Q4_K_M
main: build = 913 (eb542d3)
main: quantizing './models/Phi-3-mini-4k-instruct/ggml-model-f16.gguf' to './models/Phi-3-mini-4k-instruct/ggml-model--Q4_K_M.gguf' as Q4_K_M
llama.cpp: loading model from ./models/Phi-3-mini-4k-instruct/ggml-model-f16.gguf
llama_model_quantize: failed to quantize: unknown (magic, version) combination: 46554747, 00000003; is this really a GGML file?
main: failed to quantize model from './models/Phi-3-mini-4k-instruct/ggml-model-f16.gguf'
I'm following a quantization walkthrough, but this error comes up and the quantization fails.
Converting the base model to f16 GGUF works fine (INFO:hf-to-gguf:Model successfully exported to 'models/Phi-3-mini-4k-instruct/ggml-model-f16.gguf') using:
python3 convert.py ./models/Phi-3-mini-4k-instruct
but the error appears at the quantization step.
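A side note on the error message: the "unknown (magic, version) combination: 46554747, 00000003" can be decoded. 0x46554747 is exactly the ASCII bytes "GGUF" read as a little-endian uint32, and 3 is the GGUF file-format version, so the converted file itself looks like a valid GGUF v3 file; the quantize binary (build 913) just doesn't recognize that format. A minimal sketch to verify this decoding (the path is hypothetical):

```python
import struct

def read_gguf_header(path):
    """Read the 4-byte magic and 4-byte little-endian version from a file header."""
    with open(path, "rb") as f:
        magic = f.read(4)                             # b"GGUF" for GGUF files
        (version,) = struct.unpack("<I", f.read(4))   # little-endian uint32
    return magic, version

# The error prints the magic as a little-endian uint32; b"GGUF" decodes to:
(as_u32,) = struct.unpack("<I", b"GGUF")
print(hex(as_u32))  # 0x46554747 -- the exact value in the error message
```

So the failure is a format-recognition problem in the quantize tool, not a corrupted conversion.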